Jira Health Check shows the message Index replication for cluster node "node" is behind by "number" seconds
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Jira 9.1 update
The described mechanism of requesting a snapshot of indexes from another node has been turned off in Jira 9.1.
Summary
The Cluster Index Replication health check shows the following message:
Index replication for cluster node <<node>> is behind by <<number>> seconds.
Diagnosis
1) Health Check shows that index replication is behind on one or more nodes:
Name: Cluster Index Replication
NodeId: null
Is healthy: false
Failure reason: Index replication for cluster node 'node05' is behind by 763 seconds.
Severity: CRITICAL
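If you collect health check output from several nodes, the affected node and its lag can be pulled out of the failure reason with a small script. The following is a minimal sketch that assumes the message wording shown above; the function name `parse_replication_lag` is hypothetical and not part of Jira.

```python
import re

# Matches the failure reason text of the Cluster Index Replication health check,
# using the wording shown in the example above (an assumption about the format).
LAG_PATTERN = re.compile(
    r"Index replication for cluster node '(?P<node>[^']+)' "
    r"is behind by (?P<seconds>\d+) seconds"
)


def parse_replication_lag(text):
    """Return a list of (node_id, lag_seconds) tuples found in the text."""
    return [
        (match.group("node"), int(match.group("seconds")))
        for match in LAG_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "Failure reason: Index replication for cluster node 'node05' "
        "is behind by 763 seconds."
    )
    for node, seconds in parse_replication_lag(sample):
        print(f"{node} is behind by {seconds} seconds")
```

The script only reads text that you pass to it; it does not query Jira.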
2) We can observe the following sequence of events when analyzing atlassian-jira.log:
a) Just after starting the node, it asks for a fresh snapshot from another node:
2021-04-01 12:26:21,314-0500 localhost-startStop-1 INFO [c.a.jira.startup.ClusteringLauncher] Checking local index on node start
2021-04-01 12:26:21,317-0500 localhost-startStop-1 INFO [c.a.jira.cluster.DefaultClusterManager] Current node: node05 index can't be rebuilt. Requesting an index from any other node. Current list of other nodes: [node01, node04, node03, node02]
b) While waiting for a snapshot from another node, the index service is paused:
2021-04-01 12:26:21,317-0500 localhost-startStop-1 INFO [c.a.j.index.ha.DefaultNodeReindexService] [INDEX-REPLAY] Pausing node re-index service
java.lang.Exception
at com.atlassian.jira.index.ha.DefaultNodeReindexService.pause(DefaultNodeReindexService.java:213)
at com.atlassian.jira.cluster.DefaultClusterManager.requestCurrentIndexFromNode(DefaultClusterManager.java:138)
2021-04-01 12:26:21,323-0500 localhost-startStop-1 INFO [c.a.jira.cluster.DefaultClusterManager] Sending message: "Backup Index" - request to create index snapshot from node: ANY on current node: node05
c) However, the sending node fails to provide an index snapshot for some reason (e.g. due to JRASERVER-62669 - Automatic restore of indexes will fail if the node that registered the latest index operation is unavailable), and the service remains paused:
2021-04-15 13:35:28,475-0500 NodeReindexServiceThread:thread-0 INFO [c.a.j.index.ha.DefaultNodeReindexService] [INDEX-REPLAY] Node re-index service is not running: currentNode.isClustered=true, notRunningCounter=242748, paused=true, lastPausedStacktrace=java.lang.Throwable
at com.atlassian.jira.index.ha.DefaultNodeReindexService.pause(DefaultNodeReindexService.java:215)
at com.atlassian.jira.cluster.DefaultClusterManager.requestCurrentIndexFromNode(DefaultClusterManager.java:138)
at com.atlassian.jira.cluster.DefaultClusterManager.checkIndex(DefaultClusterManager.java:131)
at com.atlassian.jira.startup.ClusteringLauncher.start(ClusteringLauncher.java:37)
at com.atlassian.jira.startup.DefaultJiraLauncher.postDBActivated(DefaultJiraLauncher.java:168)
at com.atlassian.jira.startup.DefaultJiraLauncher.lambda$postDbLaunch$2(DefaultJiraLauncher.java:146)
at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrEnqueue(DatabaseConfigurationManagerImpl.java:301)
at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrWhenDatabaseActivated(DatabaseConfigurationManagerImpl.java:196)
at com.atlassian.jira.startup.DefaultJiraLauncher.postDbLaunch(DefaultJiraLauncher.java:137)
at com.atlassian.jira.startup.DefaultJiraLauncher.lambda$start$0(DefaultJiraLauncher.java:104)
at com.atlassian.jira.util.devspeed.JiraDevSpeedTimer.run(JiraDevSpeedTimer.java:31)
at com.atlassian.jira.startup.DefaultJiraLauncher.start(DefaultJiraLauncher.java:102)
at com.atlassian.jira.startup.LauncherContextListener.initSlowStuff(LauncherContextListener.java:154)
at com.atlassian.jira.startup.LauncherContextListener.initSlowStuffInBackground(LauncherContextListener.java:139)
at com.atlassian.jira.startup.LauncherContextListener.contextInitialized(LauncherContextListener.java:101)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4689)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5155)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1412)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1402)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
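To confirm that the service is still paused without reading the whole log by hand, you can search atlassian-jira.log for the "[INDEX-REPLAY] Node re-index service is not running" entries. The following is a minimal sketch that assumes the log line format shown above; the function names and the command-line usage are hypothetical.

```python
import re
import sys

# Matches the status line logged while the node re-index service is not running.
# The field names (notRunningCounter, paused) are taken from the log excerpt above.
NOT_RUNNING = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) .*"
    r"\[INDEX-REPLAY\] Node re-index service is not running: .*"
    r"notRunningCounter=(?P<counter>\d+), paused=(?P<paused>true|false)"
)


def find_paused_entries(lines):
    """Return (timestamp, notRunningCounter) for every line reporting paused=true."""
    entries = []
    for line in lines:
        match = NOT_RUNNING.search(line)
        if match and match.group("paused") == "true":
            entries.append((match.group("timestamp"), int(match.group("counter"))))
    return entries


def scan_log(path):
    """Scan a log file and return the paused entries it contains."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_paused_entries(handle)


if __name__ == "__main__":
    # Pass the path to atlassian-jira.log as the first argument.
    if len(sys.argv) > 1:
        for timestamp, counter in scan_log(sys.argv[1]):
            print(f"{timestamp} paused=true notRunningCounter={counter}")
```

If the most recent entries still report paused=true long after the node started, the node is likely in the state described in this article.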
Cause
Jira pauses the cluster index replication service when requesting an index snapshot from another node. If the sending node fails to provide a snapshot for any reason, the cluster index replication service will remain paused indefinitely.
Workaround
You can use one of the following options:
Option 1) Restart the node again
On startup, the node sends a new request for an index snapshot to another node.
Option 2) Manually copy the index snapshot from another node
- Sign in to the affected node.
- In Jira administration, go to System > Indexing (under Advanced).
- At the bottom of the page, choose the source node and copy the index.
Option 3) Restore an index snapshot from a backup
- If the index backup is enabled, the index snapshots will be at the <yourJirahome>/exports/export/indexsnapshots directory.
- Navigate to Administration > System.
- Select Advanced > Indexing to open the Indexing page.
- Enter the name of the previously saved index in the File name field and click Recover.
- Jira will not be available during the recovery of the index.
- If configuration changes that require a re-index were made after the snapshot was taken, you will need to run a background re-index after the recovery. Note that Jira will be available again once the recovery completes.
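To find the name of the most recent snapshot to enter in the File name field, you can list the files in the snapshot directory, newest first. The following is a minimal sketch; the directory is passed as an argument rather than hard-coded, and the function name `list_snapshots` is hypothetical.

```python
import sys
from pathlib import Path


def list_snapshots(snapshot_dir):
    """Return the files in snapshot_dir sorted by modification time, newest first."""
    directory = Path(snapshot_dir)
    files = [entry for entry in directory.iterdir() if entry.is_file()]
    return sorted(files, key=lambda entry: entry.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Pass the index snapshot directory of your Jira home as the first argument.
    if len(sys.argv) > 1:
        for snapshot in list_snapshots(sys.argv[1]):
            print(snapshot.name)
```

The first name printed is the most recently modified file in that directory.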
Note:
Background re-index is very slow on recent Jira versions due to the bug JRASERVER-72045 - IndexException: Wait attempt timed out - waited 30000 milliseconds caused by background indexing tasks. The problem is also documented in the knowledge base article "Background reindex is slow after upgrading to Jira 8.10 and later". Therefore, if you know that multiple changes may have been made since the previous backup, it is preferable to copy the index from another node or to run a full re-index on this node.
References:
JRASERVER-72125 - Index replication service is paused indefinitely after failing to obtain an index snapshot from another node (Closed)
JRASERVER-66970 - /status should indicate when indexes are broken on a node (Closed)
JRASERVER-62669 - Automatic restore of indexes will fail if the node that registered the latest index operation is unavailable (Closed)