Indexing inconsistency troubleshooting

JIRA 9 JIRA DC 

In Jira Data Center, nodes share their indexes via the shared home directory. What triggers the creation of an index snapshot, and how that snapshot is used, has changed across Jira versions.

Legacy mode

Until Jira 8.19, every starting node would request an index snapshot from any existing node in the cluster. In this mode, a new node can join the cluster only if all existing nodes have a proper index at the time it joins. Many things can go wrong in this scenario, for example:

  • the state of the cluster is not up to date and there is no other node which can provide the index
  • the node which handles the request of delivering the index has a faulty index
  • the node which handles the request of delivering the index fails to create the index snapshot
  • the node which handles the request of delivering the index fails to inform the starting node that the index snapshot was created

and other potential problems, which can result in JRASERVER-72125.

Index snapshot - ready on start

JIRA 8.20

In Jira 8.19 we introduced a new way of getting an index snapshot for new nodes. When a new node starts, it looks for an index snapshot in shared home. If this snapshot is fresh enough, the node restores its index from it. Since Jira 8.19, a random node produces an index snapshot every 24 hours (by default).

If the starting node fails to get the snapshot from shared home (no snapshot or the snapshot is not fresh enough) it falls back to legacy mode.

With this change, the chance of running into JRASERVER-72125 was greatly reduced. However, it is still possible that the index snapshot created by the scheduler is inconsistent (for example, when the scheduler runs on a node whose index is not consistent at that moment).

Index snapshot - quality guaranteed

JIRA 9.0

In Jira 9.0 we made a couple of changes to guarantee the quality of the index on shared home.

Index snapshot - location

All index snapshots now use the same file naming scheme regardless of their location:

IndexSnapshot_<unique_number>_<yyMMdd-HHmmss>.<tar.sz|tar|zip>
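To illustrate the naming scheme, here is a minimal sketch of a shell function that extracts the timestamp part from a snapshot file name. The function name and the sample file name are made up for this example, and the sketch assumes that <unique_number> consists of digits only, which the scheme above does not state explicitly.

```shell
#!/bin/sh
# Hypothetical helper: print the <yyMMdd-HHmmss> part of a snapshot file name.
# Assumption: <unique_number> is made of digits only.
# Prints nothing when the name does not follow the naming scheme.
snapshot_timestamp() {
    name=$(basename "$1")
    printf '%s\n' "$name" |
        sed -n -E 's/^IndexSnapshot_[0-9]+_([0-9]{6}-[0-9]{6})\.(tar\.sz|tar|zip)$/\1/p'
}

# Example with an invented file name
snapshot_timestamp "IndexSnapshot_10200_240321-101500.tar.sz"
```

A helper like this can be useful when sorting or comparing snapshots by the time encoded in their names rather than by file modification time.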

The index file and snapshot locations have also changed:

  • <local_home_directory>/caches/indexesV2  stores index files
  • <shared_home_directory>/caches/indexesV2/snapshots  stores index snapshots that were:
    • created by scheduled index backups
    • retrieved by nodes joining the cluster
    • used for snapshot recovery
    • replicated to the secondary home directory
  • <shared_home_directory>/caches/indexesV2/snapshots  stores index snapshots created:
    • on the completion of a full reindex and retrieved by other nodes on reindex detection
    • when a new node joined the cluster
    • on administrator request
    • on data import

Index snapshot - quality

Before creating (and sending) an index snapshot to shared home, the node always checks whether its index is consistent. If the index is not consistent, the operation is not performed, and this is visible only in the logs of the node that was requested to create the index snapshot:

Example log message: any time a node is requested to create an index snapshot and fails the index consistency check

ERROR Index backup failed. Index backup can be done only on consistent index.

Example log message: node1 requested an index snapshot from node2

ERROR Note that node: [node1] is waiting for an index and failed to restore the index from shared and from this node
      This state require admin action, Both nodes: [node1] and [node2], must obtain a consistent index.
      Please check KB: https://confluence.atlassian.com/x/OYNyQg to find out how can you solve this problem.

How to make sure there is a consistent index snapshot on shared home

Full reindex

Running a full reindex on any node triggers the creation of an index snapshot, which is then sent to shared home.

Index copy

If there is a node in the cluster which contains a consistent index, copying this index to any other node via the admin panel (Admin/System/Indexing/Copy the Search Index from another node) will result in creating an index snapshot on shared home.

With the 9.0 changes, the chance of running into JRASERVER-72125 should be even lower.

Please make sure that in the process of starting new nodes you include a check that an index snapshot is available in shared home:

  • make sure that index snapshots are created by the scheduler
  • any operation triggering large indexing (example: project import) should be followed by creating an index snapshot
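
The check recommended above could be scripted along the following lines. This is only a sketch: the function name, the SHARED_HOME variable and its placeholder value, and the 24-hour threshold (taken from the default snapshot schedule) are assumptions, and the threshold Jira itself uses to decide whether a snapshot is "fresh enough" may differ. The -maxdepth and -mmin options are available in GNU and BSD find but are not part of POSIX.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if DIR contains at least one IndexSnapshot_*
# file modified within the last MAX_AGE_MIN minutes.
has_fresh_snapshot() {
    dir=$1
    max_age_min=${2:-1440}   # assumption: 24 hours, like the default schedule
    [ -d "$dir" ] || return 1
    found=$(find "$dir" -maxdepth 1 -type f -name 'IndexSnapshot_*' \
                 -mmin "-$max_age_min" | head -n 1)
    [ -n "$found" ]
}

# Usage; SHARED_HOME is a placeholder to be set to the real shared home path
SHARED_HOME=${SHARED_HOME:-/path/to/shared_home}
if has_fresh_snapshot "$SHARED_HOME/caches/indexesV2/snapshots"; then
    echo "fresh index snapshot found"
else
    echo "no fresh index snapshot found"
fi
```

Running such a check before starting a new node makes a missing or stale snapshot visible up front, instead of being discovered from the node's startup logs.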

Index Analyzer

When only a small number of issues are affected, Jira's index analyzer can list those issues and fix them on a specific node. See "How to use Jira's index analyzer to fix index inconsistencies".

Troubleshooting

Please use the following grep across the logs of all nodes to see log messages related to indexing and index management:

grep 'IndexUtils\|ArchiveUtils\|DefaultIssueIndexer\|DefaultClusterManager\|DefaultIndexCopyService\|DefaultNodeReindexService\|SnapshotDeletionPolicyContributionStrategy\|DefaultIndexManager' atlassian-jira.log
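
When the logs of several nodes have been collected in one place, the same search can be wrapped in a small function that prefixes every match with the file it came from. This is a sketch only: the function name and the example paths are made up, and the pattern is the same list of class names written as an extended regular expression.

```shell
#!/bin/sh
# Same class names as in the grep above, as an extended regular expression.
INDEX_LOG_PATTERN='IndexUtils|ArchiveUtils|DefaultIssueIndexer|DefaultClusterManager|DefaultIndexCopyService|DefaultNodeReindexService|SnapshotDeletionPolicyContributionStrategy|DefaultIndexManager'

# Hypothetical helper: print matching lines from every log file given.
# Adding /dev/null as an extra file makes grep print the file name even
# when only one log file is passed.
index_log_lines() {
    grep -E "$INDEX_LOG_PATTERN" /dev/null "$@"
}

# Usage (placeholder paths for logs copied from each node):
# index_log_lines node1/atlassian-jira.log node2/atlassian-jira.log
```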

Q&A

How does Jira update the index with changes made after the index snapshot was created?

Every time an index snapshot is restored (whether it is a few hours old or was just created by another node), we run an "index-fixer" after restoring it. This does not block users from accessing the node (/status may already report that the node is running), so it may happen in the background.

In Jira 8.20 we still run two index fixers:

  • legacy-fixer: compares the max issue update time from the DB with the max issue update time from the restored index, and reindexes all issues in this time range

  • new version-based fixer: tries to use the version table to determine which issues (and related entities) need to be re-indexed (or deleted from the index)

In JIRA 9.3 we removed the legacy fixer as it is not needed anymore since all entities have versions.

To see all logs related to fixing the index after restoring it from snapshot please grep the log with: [INDEX-FIXER]
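
Because the marker contains square brackets, grep would interpret it as a bracket expression unless told otherwise. A minimal sketch, with a made-up function name, that searches for the marker as a literal string:

```shell
#!/bin/sh
# Hypothetical helper: print log lines containing the literal marker
# [INDEX-FIXER]. The -F option makes grep treat the pattern as a fixed
# string rather than a regular expression.
index_fixer_lines() {
    grep -F '[INDEX-FIXER]' "$@"
}

# Usage: index_fixer_lines atlassian-jira.log
```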

How do we calculate the time range on which the fixer should run?

If the index has meta information with a timestamp, we use it as the start of the time range (only snapshots created with a full foreground reindex have this timestamp) and take the max issue update time from the DB as the end of the time range.

If the index has no meta information with a timestamp, we use the max issue update time from the index as the start of the time range and the max issue update time from the DB as the end of the time range.
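
The two cases can be summarised in a few lines of shell. This only illustrates the selection rule described here and is not Jira's implementation; the function name, the argument order, and the use of plain numbers for timestamps are assumptions made for the example.

```shell
#!/bin/sh
# Illustration only: choose the time range as described above.
#   $1  timestamp from the index meta information (empty string if absent)
#   $2  max issue update time found in the restored index
#   $3  max issue update time found in the DB
# Prints "<range start> <range end>".
fixer_time_range() {
    meta_ts=$1
    index_max=$2
    db_max=$3
    if [ -n "$meta_ts" ]; then
        start=$meta_ts      # snapshot carries a timestamp: use it as the start
    else
        start=$index_max    # otherwise fall back to the index's own maximum
    fi
    printf '%s %s\n' "$start" "$db_max"
}

# Example with invented values
fixer_time_range "" 1700000000 1700003600
```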




Last modified on Mar 21, 2024
