Cluster Index Replication health check fails in Jira Data Center due to Jira Charting Plugin
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
Jira Data Center raises a warning that the Cluster Index Replication health check is failing. Indexes on different nodes may fall out of sync with each other, resulting in inconsistent Issue Navigator search results, gadget results, and other issue-related symptoms.
The Cluster Index Replication health check may report delays like the following:
Name: Cluster Index Replication
Is healthy: false
Failure reason: ["Index replication for cluster node 'node2' is behind by 26,067 seconds.","Index replication for cluster node 'node3' is behind by 33,220 seconds.","Index replication for cluster node 'node4' is behind by 9,658 seconds."]
Severity: WARNING
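To compare lag across nodes, the failure reason can be parsed into per-node numbers. The following is a minimal sketch that assumes the failure reason has the same JSON-array shape as the sample above; the function name `parse_replication_lag` is hypothetical and not part of any Jira API.

```python
import json
import re

# Matches one failure message, as worded in the sample health check output above.
LAG_PATTERN = re.compile(
    r"Index replication for cluster node '(?P<node>[^']+)' "
    r"is behind by (?P<seconds>[\d,]+) seconds"
)


def parse_replication_lag(failure_reason: str) -> dict[str, int]:
    """Map each node ID to its reported replication lag in seconds."""
    lags = {}
    for message in json.loads(failure_reason):
        match = LAG_PATTERN.search(message)
        if match:
            # Strip the thousands separators before converting ("26,067" -> 26067).
            lags[match.group("node")] = int(match.group("seconds").replace(",", ""))
    return lags


# The failure reason copied from the sample output above.
sample = (
    '["Index replication for cluster node \'node2\' is behind by 26,067 seconds.",'
    '"Index replication for cluster node \'node3\' is behind by 33,220 seconds.",'
    '"Index replication for cluster node \'node4\' is behind by 9,658 seconds."]'
)

# Print the nodes, most delayed first.
for node, seconds in sorted(parse_replication_lag(sample).items(), key=lambda kv: -kv[1]):
    print(f"{node}: {seconds} seconds behind")
```

Lag that keeps growing between successive checks suggests the reindex thread on that node is stuck rather than merely slow.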
Diagnosis
Environment
Jira is configured as a multi-node Data Center
The Jira Charting Plugin - Server is installed
Diagnostic Steps
Capture thread dumps from affected nodes (for example: Troubleshooting Performance Issues with thread dumps)
Verify whether the NodeReindexServiceThread thread shows a stack trace similar to the following:
"NodeReindexServiceThread:thread-1" prio=5 tid=0x000000000000015f nid=0 runnable
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    ...
    at com.sun.proxy.$Proxy390.updateValues(Unknown Source)
    at com.atlassian.jira.ext.charting.field.TimeInStatusCFType.storeDatabaseValue(TimeInStatusCFType.java:98)
    at com.atlassian.jira.ext.charting.field.TimeInStatusCFType.getValueFromIssue(TimeInStatusCFType.java:77)
    at com.atlassian.jira.issue.fields.ImmutableCustomField.getValue(ImmutableCustomField.java:350)
    ...
    at com.sun.proxy.$Proxy41.reIndexIssueObjects(Unknown Source)
    at com.atlassian.jira.index.ha.DefaultNodeReindexService.updateIssueIndex(DefaultNodeReindexService.java:453)
    at com.atlassian.jira.index.ha.DefaultNodeReindexService.updateAffectedIndexes(DefaultNodeReindexService.java:341)
    at com.atlassian.jira.index.ha.DefaultNodeReindexService.applyIndexOperations(DefaultNodeReindexService.java:279)
    at com.atlassian.jira.index.ha.DefaultNodeReindexService.reIndex(DefaultNodeReindexService.java:265)
    at com.atlassian.jira.index.ha.DefaultNodeReindexService$$Lambda$352/669392084.run(Unknown Source)
    ...
    at java.lang.Thread.run(Thread.java:748)
- The key indicator in the stack trace is a frame from the "com.atlassian.jira.ext.charting" package, which shows that the reindex thread is executing Jira Charting Plugin code.
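When many thread dumps have been captured, this check can be scripted. The sketch below assumes a plain-text JVM thread dump in which each thread entry begins with its quoted name at the start of a line; the function name `charting_blocked_threads` and the first thread in the sample dump are hypothetical and for illustration only.

```python
import re

CHARTING_PACKAGE = "com.atlassian.jira.ext.charting"
REINDEX_THREAD = "NodeReindexServiceThread"

# Each thread entry starts with a double-quoted thread name at the beginning of a line.
ENTRY_START = re.compile(r'^(?=")', re.MULTILINE)


def charting_blocked_threads(thread_dump: str) -> list[str]:
    """Return names of reindex threads whose stack contains Charting Plugin frames."""
    hits = []
    for entry in ENTRY_START.split(thread_dump):
        name_match = re.match(r'"([^"]+)"', entry)
        if not name_match:
            continue  # text before the first thread entry, or an empty chunk
        name = name_match.group(1)
        if REINDEX_THREAD in name and CHARTING_PACKAGE in entry:
            hits.append(name)
    return hits


# Abbreviated sample dump: the first thread is made up, the second is taken
# from the stack trace shown above.
sample_dump = '''"http-nio-8080-exec-1" prio=5 tid=0x01 nid=0 waiting
   java.lang.Thread.State: WAITING
    at java.lang.Object.wait(Native Method)

"NodeReindexServiceThread:thread-1" prio=5 tid=0x000000000000015f nid=0 runnable
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at com.atlassian.jira.ext.charting.field.TimeInStatusCFType.storeDatabaseValue(TimeInStatusCFType.java:98)
    at com.atlassian.jira.index.ha.DefaultNodeReindexService.reIndex(DefaultNodeReindexService.java:265)
'''

print(charting_blocked_threads(sample_dump))
```

A thread that appears in this list across several dumps taken a few seconds apart is a stronger sign of a stuck reindex than a single match.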
Cause
The Jira Charting Plugin - Server is an experimental plugin developed by Atlassian that is no longer maintained or supported. It is not classified as Data Center Compatible and is not recommended for any production environment (see JCHART-479 - Jira Charting Plugin is unsafe for use in Data Center environments as it may cause a deadlock in the database when multiple nodes attempt perform CF updates).
The plugin is unsafe for use in Data Center environments as it may cause a deadlock in the database when multiple nodes attempt to perform the same functionality at the same time. The issue manifests when the following scenario occurs:
- An issue operation is performed on an issue on Node A
- Node A replicates the index operation to Node B and Node C
- Node B and Node C attempt to reindex the issue simultaneously
- Reindexing an issue while the Jira Charting Plugin is installed also reindexes the Time in Status custom field. The field is recalculated so that the new value can be written to the node's index, and this recalculation begins by deleting the custom field's value in the database
- The same delete statement is issued against the database's customfieldvalue table from multiple nodes, resulting in a deadlock
A newer, Data Center-specific version of the plugin, the Jira Charting Plugin - Data Center, is available. You can install it after following the workaround below to unclog the index replication.
Workaround
Resolving the immediate index replication delays requires clearing the database deadlock so that the NodeReindexServiceThread can proceed with replicating the remaining index operations. Use either of the following approaches:
Consult your Database Administrators for assistance in identifying the deadlocked queries and terminating them. Typically, these appear as long-running delete queries against the customfieldvalue table.
Shut down all but one node in the cluster, because the blocked queries are released when the node that owns them is shut down. The nodes can then be brought back up.
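For the Database Administrator approach, a list of active queries can be narrowed down to the likely culprits. The sketch below is database-agnostic and rests on several assumptions: the active queries have already been exported from your database's session or activity view, the `ActiveQuery` fields and the 300-second threshold are hypothetical, and the SQL text in the sample is illustrative rather than the plugin's actual statement.

```python
from dataclasses import dataclass


@dataclass
class ActiveQuery:
    session_id: int         # hypothetical: session/process ID from the database's activity view
    runtime_seconds: float  # hypothetical: how long the statement has been running
    sql: str                # the statement text


def suspect_deletes(queries, min_runtime_seconds=300.0):
    """Return long-running DELETE statements that target the customfieldvalue table."""
    suspects = []
    for query in queries:
        # Normalise case and whitespace so the match does not depend on formatting.
        statement = " ".join(query.sql.lower().split())
        if (
            statement.startswith("delete")
            and "customfieldvalue" in statement
            and query.runtime_seconds >= min_runtime_seconds
        ):
            suspects.append(query)
    # Longest-running first, as these are the most likely deadlock participants.
    return sorted(suspects, key=lambda q: q.runtime_seconds, reverse=True)


# Illustrative rows only; real values come from your database.
sample_queries = [
    ActiveQuery(101, 5400.0, "DELETE FROM customfieldvalue WHERE issue = 10001 AND customfield = 10200"),
    ActiveQuery(102, 2.5, "SELECT id FROM jiraissue WHERE project = 10000"),
    ActiveQuery(103, 7200.0, "delete from customfieldvalue where issue = 10001 and customfield = 10200"),
    ActiveQuery(104, 12.0, "DELETE FROM customfieldvalue WHERE issue = 10002 AND customfield = 10200"),
]

for query in suspect_deletes(sample_queries):
    print(query.session_id, query.runtime_seconds, query.sql)
```

The output is only a shortlist for review. Whether and how to terminate a session remains a decision for your Database Administrators.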
Resolution
Follow either of the Workaround steps to unclog the index replication, and then permanently remove the Jira Charting Plugin - Server. After this, you can install the appropriate version of the Jira Charting Plugin - Data Center.