Cluster Cache Replication health check fails in Jira server

Platform Notice: Data Center Only - This article only applies to Atlassian products on the data center platform.

Overview

Jira Data Center cluster replication relies on each node being recorded in the database and on each node sending and receiving cache updates. The health check confirms that replication is working across the entire cluster. If an active node is not responding, the other nodes report warnings and the unresponsive node reports a critical result.


Understanding the Results

Result: The health check passed successfully.
What this means: Node replication within the cluster is working.

Result: The node <node> is not in the database.
What this means: The node does not appear to be in the database but does exist within the replication cache, or the node is unresponsive.

Result: The node <node> is not replicating.
What this means: The node is not replicating information to the cluster. It exists in the database but not in the replication cache.

Inconsistent state across the cluster

Jira keeps some data in memory local to each node, especially frequently used data such as permissions. Cache synchronization is asynchronous (in Jira 7.9 and later) but is expected to be fast and to keep nodes consistent. Cache changes are replicated between nodes over the network, so a replication problem can leave nodes with different views of the same data.

Symptoms:

  • Users exist on some nodes but not all.
  • Users may have permissions on some nodes but not all.
  • User field dropdowns show results on some nodes but not all.
  • Filters and gadgets show up on one node but not on others after a permission update.
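
The symptoms above all amount to the same query returning different answers depending on which node serves it. As a minimal sketch of how such a difference could be summarized, the Python snippet below compares per-node results that you have already collected (for example, the usernames each node returns for the same search). The function name, node names, and sample data are hypothetical, and how the per-node results are gathered is deliberately left out because it depends on your environment.

```python
from typing import Dict, Iterable, Set


def find_missing_per_node(results: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Return, for each node, the items seen on some other node but not on it.

    `results` maps a node name to the items (e.g. usernames) that node returned
    for the same query. Nodes with nothing missing are omitted from the output.
    """
    per_node = {node: set(items) for node, items in results.items()}
    # The union is everything that at least one node knows about.
    everything = set().union(*per_node.values())
    missing = {}
    for node, items in per_node.items():
        gap = everything - items
        if gap:
            missing[node] = gap
    return missing


if __name__ == "__main__":
    # Hypothetical sample data: node2 has not yet seen the user "carol".
    sample = {
        "node1": ["alice", "bob", "carol"],
        "node2": ["alice", "bob"],
    }
    for node, gap in sorted(find_missing_per_node(sample).items()):
        print(f"{node} is missing: {sorted(gap)}")
```

An empty result means the nodes agree for that query. A difference that persists well after the change was made suggests that replication is not keeping up and that the troubleshooting steps in this article apply.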

Troubleshooting

Problem: The node is not in the database.
Suggestion: Restart the affected node. Before doing so, collect some thread dumps as described in Generating a Thread Dump, as these can be sent to support together with the information listed below.

Problem: The node is not replicating due to a network condition.
Suggestion:
  • Each server needs to be able to resolve its own host name correctly. Set the ehcache.listener.hostName parameter in cluster.properties to the node's hostname, following the instructions in the Install Guide.
  • There may be a firewall or network condition blocking communication between the nodes. Check the logs on each of the nodes for exceptions relating to network connectivity.
  • Jira Data Center can be configured to use multicast to communicate between the nodes. Check the configuration in the cluster.properties file and review the Install Guide for more information.
  • If you are using multicast, check the current multicast address in use by running the following command on each of the nodes:
    • LINUX: netstat -g
    • WINDOWS: netsh interface ip show joins
  • It is not expected that a firewall would be used between Jira Data Center nodes.

A request, JRA-43380, has been raised to have these configuration options documented.

Problem: The node is not replicating due to nodes being offline.
Suggestion: Check the status of each of the other nodes, specifically whether they are online and responsive.
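
The network-related suggestions above can be partly checked with a short script. The following Python sketch reads the ehcache.* entries from a cluster.properties file, confirms that a hostname resolves, and tests whether a TCP port on another node accepts connections. It is an illustration rather than an Atlassian tool: the function names, the file path, the peer hostname, and the port number in the usage comment are placeholders, so take the real values from your own cluster.properties and the Install Guide.

```python
import socket
from typing import Dict, List


def read_ehcache_settings(path: str) -> Dict[str, str]:
    """Return the ehcache.* key/value pairs from a Java-style properties file.

    Simplified parser: handles 'key=value' and 'key: value' lines and skips
    comments. It does not handle line continuations or escape sequences.
    """
    settings = {}
    with open(path, encoding="utf-8", errors="replace") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith(("#", "!")):
                continue
            # Split on whichever separator ('=' or ':') appears first.
            positions = [i for i in (line.find("="), line.find(":")) if i != -1]
            if not positions:
                continue
            cut = min(positions)
            key, value = line[:cut].strip(), line[cut + 1:].strip()
            if key.startswith("ehcache."):
                settings[key] = value
    return settings


def resolve(hostname: str) -> List[str]:
    """Return the IP addresses the hostname resolves to (empty list if none)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (placeholder path, hostname, and port; substitute your own):
#   props = read_ehcache_settings("/path/to/jira-home/cluster.properties")
#   own_host = props.get("ehcache.listener.hostName", socket.gethostname())
#   print(own_host, "resolves to", resolve(own_host))
#   print("peer reachable:", can_connect("jira-node2.example.com", 40001))
```

Run it on each node in turn. If resolve returns an empty list for the node's own hostname, or can_connect returns False for a port another node should be listening on, that points to a name resolution or firewall problem, although the exceptions in the node logs remain the authoritative source.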

You can monitor cache replication by reviewing the statistics that are written to the log file. They show the size of the local queues and whether cache modifications are replicated successfully or remain in the queues for too long. In most cases, monitoring just a few parameters will tell you whether replication is working properly. For more information, see Monitoring the cache replication.

Providing Information to Support

If you are unable to troubleshoot and fix the problem yourself, create a support ticket at support.atlassian.com and attach the following information to the ticket:

  • A screenshot of the health check results.
  • A Support ZIP from each of the Data Center nodes.
  • Any information collected while following the suggestions in this document.


Last modified on Aug 4, 2021
