Jira Health Check shows the message Index replication for cluster node "node" is behind by "number" seconds

For Atlassian eyes only

This article is Not Validated and cannot be shared with customers.

Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Jira 9.1 update

The described mechanism of requesting a snapshot of indexes from another node has been turned off in Jira 9.1.

Summary

The Cluster Index Replication health check shows the following message:

Index replication for cluster node <<node>> is behind by <<number>> seconds.

Diagnosis

1) Health Check shows that index replication is behind on one or more nodes:

Name: Cluster Index Replication
NodeId: null
Is healthy: false
Failure reason: Index replication for cluster node 'node05' is behind by 763 seconds.
Severity: CRITICAL
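
When many nodes or repeated health check reports need to be compared, it can help to extract the node name and the lag from the failure reason. The following is a minimal sketch that assumes the message keeps the format shown above; the function name `parse_replication_lag` is made up for this example and is not part of Jira.

```python
import re

# Pattern derived from the failure reason quoted above. The node name may
# appear in single or double quotes, so both are accepted (assumption).
FAILURE_RE = re.compile(
    r"Index replication for cluster node ['\"]?(?P<node>[^'\"\s]+)['\"]? "
    r"is behind by (?P<seconds>\d+) seconds"
)


def parse_replication_lag(reason):
    """Return (node, seconds) from a failure reason, or None if it does not match."""
    match = FAILURE_RE.search(reason)
    if match is None:
        return None
    return match.group("node"), int(match.group("seconds"))


reason = "Index replication for cluster node 'node05' is behind by 763 seconds."
print(parse_replication_lag(reason))  # ('node05', 763)
```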


2) We can observe the following sequence of events when analyzing atlassian-jira.log:

a) Just after starting the node, it asks for a fresh snapshot from another node:

2021-04-01 12:26:21,314-0500 localhost-startStop-1 INFO [c.a.jira.startup.ClusteringLauncher] Checking local index on node start
2021-04-01 12:26:21,317-0500 localhost-startStop-1 INFO [c.a.jira.cluster.DefaultClusterManager] Current node: node05 index can't be rebuilt. Requesting an index from any other node. Current list of other nodes: [node01, node04, node03, node02]

b) While waiting for a snapshot from another node, the index service is paused:

2021-04-01 12:26:21,317-0500 localhost-startStop-1 INFO [c.a.j.index.ha.DefaultNodeReindexService] [INDEX-REPLAY] Pausing node re-index service
java.lang.Exception
at com.atlassian.jira.index.ha.DefaultNodeReindexService.pause(DefaultNodeReindexService.java:213)
at com.atlassian.jira.cluster.DefaultClusterManager.requestCurrentIndexFromNode(DefaultClusterManager.java:138)
2021-04-01 12:26:21,323-0500 localhost-startStop-1 INFO [c.a.jira.cluster.DefaultClusterManager] Sending message: "Backup Index" - request to create index snapshot from node: ANY on current node: node05

c) However, the sending node fails to provide an index snapshot (for example, because of JRASERVER-62669), and the service remains paused:

2021-04-15 13:35:28,475-0500 NodeReindexServiceThread:thread-0 INFO      [c.a.j.index.ha.DefaultNodeReindexService] [INDEX-REPLAY] Node re-index service is not running: currentNode.isClustered=true, notRunningCounter=242748, paused=true, lastPausedStacktrace=java.lang.Throwable
    	at com.atlassian.jira.index.ha.DefaultNodeReindexService.pause(DefaultNodeReindexService.java:215)
    	at com.atlassian.jira.cluster.DefaultClusterManager.requestCurrentIndexFromNode(DefaultClusterManager.java:138)
    	at com.atlassian.jira.cluster.DefaultClusterManager.checkIndex(DefaultClusterManager.java:131)
    	at com.atlassian.jira.startup.ClusteringLauncher.start(ClusteringLauncher.java:37)
    	at com.atlassian.jira.startup.DefaultJiraLauncher.postDBActivated(DefaultJiraLauncher.java:168)
    	at com.atlassian.jira.startup.DefaultJiraLauncher.lambda$postDbLaunch$2(DefaultJiraLauncher.java:146)
    	at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrEnqueue(DatabaseConfigurationManagerImpl.java:301)
    	at com.atlassian.jira.config.database.DatabaseConfigurationManagerImpl.doNowOrWhenDatabaseActivated(DatabaseConfigurationManagerImpl.java:196)
    	at com.atlassian.jira.startup.DefaultJiraLauncher.postDbLaunch(DefaultJiraLauncher.java:137)
    	at com.atlassian.jira.startup.DefaultJiraLauncher.lambda$start$0(DefaultJiraLauncher.java:104)
    	at com.atlassian.jira.util.devspeed.JiraDevSpeedTimer.run(JiraDevSpeedTimer.java:31)
    	at com.atlassian.jira.startup.DefaultJiraLauncher.start(DefaultJiraLauncher.java:102)
    	at com.atlassian.jira.startup.LauncherContextListener.initSlowStuff(LauncherContextListener.java:154)
    	at com.atlassian.jira.startup.LauncherContextListener.initSlowStuffInBackground(LauncherContextListener.java:139)
    	at com.atlassian.jira.startup.LauncherContextListener.contextInitialized(LauncherContextListener.java:101)
    	at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4689)
    	at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5155)
    	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
    	at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1412)
    	at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1402)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:748)
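
To confirm step c) on a large log file, the "[INDEX-REPLAY] Node re-index service is not running" entries can be filtered for paused=true. This is a rough sketch that assumes the log lines keep the layout shown above; the helper name `find_paused_reports` is hypothetical.

```python
import re

# Matches the "[INDEX-REPLAY] ... is not running" entries shown above and
# captures the timestamp, notRunningCounter, and the paused flag.
PAUSED_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) .*"
    r"\[INDEX-REPLAY\] Node re-index service is not running: .*"
    r"notRunningCounter=(?P<counter>\d+), paused=(?P<paused>true|false)"
)


def find_paused_reports(lines):
    """Yield (timestamp, notRunningCounter) for entries that report paused=true."""
    for line in lines:
        match = PAUSED_RE.search(line)
        if match and match.group("paused") == "true":
            yield match.group("ts"), int(match.group("counter"))


sample = [
    "2021-04-15 13:35:28,475-0500 NodeReindexServiceThread:thread-0 INFO      "
    "[c.a.j.index.ha.DefaultNodeReindexService] [INDEX-REPLAY] Node re-index "
    "service is not running: currentNode.isClustered=true, "
    "notRunningCounter=242748, paused=true, "
    "lastPausedStacktrace=java.lang.Throwable",
    "    at com.atlassian.jira.index.ha.DefaultNodeReindexService.pause"
    "(DefaultNodeReindexService.java:215)",
]
for timestamp, counter in find_paused_reports(sample):
    print(timestamp, counter)

# To scan a real log, pass an open file instead of the sample list:
# with open("atlassian-jira.log", encoding="utf-8", errors="replace") as log:
#     for timestamp, counter in find_paused_reports(log):
#         print(timestamp, counter)
```

A notRunningCounter that keeps growing across entries while paused stays true suggests the node is still waiting for a snapshot that never arrived.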


Cause

Jira pauses the cluster index replication service when requesting an index snapshot from another node. If the sending node fails to provide a snapshot for any reason, the cluster index replication service will remain paused indefinitely.
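
The effect can be illustrated with a toy model. This is not Jira's code: the class `ToyReplayService` and its methods are invented purely to show why a pause with no matching resume makes the backlog, and therefore the reported lag, grow without bound.

```python
class ToyReplayService:
    """Toy stand-in for an index replay service (illustrative only, not Jira code)."""

    def __init__(self):
        self.paused = False
        self.pending = []   # operations received from other nodes
        self.applied = []   # operations applied to the local index

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def enqueue(self, operation):
        self.pending.append(operation)

    def tick(self):
        """Apply all pending operations unless paused; return how many were applied."""
        if self.paused:
            return 0
        count = len(self.pending)
        self.applied.extend(self.pending)
        self.pending.clear()
        return count


service = ToyReplayService()
service.pause()                      # a snapshot was requested at startup
for issue_key in ("A-1", "A-2", "A-3"):
    service.enqueue(issue_key)       # other nodes keep producing changes
print(service.tick(), len(service.pending))  # 0 3: nothing applied, backlog grows

service.resume()                     # only happens if the snapshot arrives
print(service.tick(), len(service.pending))  # 3 0
```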


Workaround

You can use one of the following options:

Option 1) Restart the node again

On startup, the node sends a new request for an index snapshot to another node.


Option 2) Manually copy the index snapshot from another node

  • Log in to the affected node.
  • Go to Administration > System, then select Indexing (under Advanced).
  • At the bottom of the page, choose the source node and copy the index.


Option 3) Restore an index snapshot from a backup

  • If index backup is enabled, the index snapshots are stored in the <yourJirahome>/exports/export/indexsnapshots directory.
  • Navigate to Administration > System.
  • Select Advanced > Indexing to open the Indexing page.
  • Enter the name of the previously saved index snapshot in the File name field and click Recover.
  • Jira will not be available while the index is being recovered.
  • If configuration changes that require a re-index were made after the snapshot was taken, run a background re-index after the recovery. Note that Jira is available again once the recovery completes.

Note:

Background re-index is very slow on recent Jira versions because of the bug JRASERVER-72045 (IndexException: Wait attempt timed out - waited 30000 milliseconds caused by background indexing tasks), as documented in the knowledge base article "Background reindex is slow after upgrading to Jira 8.10 and later". If you know that many changes were made since the snapshot was taken, it is better to copy the index from another node (Option 2) or to run a full re-index on this node.
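
For Option 3, it helps to know which snapshot is the most recent before entering its name on the Indexing page. The sketch below lists the files in a snapshot directory, newest first. The directory is passed in as an argument because its location depends on your installation, and the function name `list_snapshots` is made up for this example.

```python
from pathlib import Path


def list_snapshots(snapshot_dir):
    """Return (file name, size in bytes) for each file in snapshot_dir, newest first."""
    files = [path for path in Path(snapshot_dir).iterdir() if path.is_file()]
    files.sort(key=lambda path: path.stat().st_mtime, reverse=True)
    return [(path.name, path.stat().st_size) for path in files]


# Example call; replace the placeholder with the indexsnapshots directory
# of your own Jira home:
# for name, size in list_snapshots("/path/to/indexsnapshots"):
#     print(name, size)
```

The first entry is the newest snapshot. Changes made after that file's modification time are not covered by it.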



References:

JRASERVER-72125

JRASERVER-66970

JRASERVER-62669

Last modified on Nov 10, 2022
