Cluster cache replication to a node fails with "Retry replication to node <node_id> failed, node still unreachable"
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
One or more nodes in Jira Data Center (source) fail to replicate their cache to a given node (target).
The following appears in atlassian-jira.log:
[c.a.jira.cluster.CuttingOffExecutorImpl] Retry replication to node QYVRH73JIRAP03 failed, node still unreachable. This was 8 attempt. Backing off for 300000 milliseconds.
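When the log contains many of these messages, it can help to summarize which target nodes are affected and how far the retries have progressed. The following is a minimal sketch, not an Atlassian tool: the regular expression is derived only from the sample message above (other Jira versions may word it differently), and the function names parse_retry_line and unreachable_nodes are hypothetical.

```python
import re

# Pattern inferred from the sample log message above (an assumption,
# not an official log format specification).
RETRY_PATTERN = re.compile(
    r"Retry replication to node (?P<node_id>\S+) failed, node still unreachable\. "
    r"This was (?P<attempt>\d+) attempt\. "
    r"Backing off for (?P<backoff_ms>\d+) milliseconds\."
)


def parse_retry_line(line):
    """Return (node_id, attempt, backoff_ms) if the line matches, else None."""
    match = RETRY_PATTERN.search(line)
    if match is None:
        return None
    return match["node_id"], int(match["attempt"]), int(match["backoff_ms"])


def unreachable_nodes(lines):
    """Map each unreachable node ID to the highest attempt number seen."""
    highest = {}
    for line in lines:
        parsed = parse_retry_line(line)
        if parsed is not None:
            node_id, attempt, _backoff_ms = parsed
            highest[node_id] = max(attempt, highest.get(node_id, 0))
    return highest
```

To use it, pass an open file handle for atlassian-jira.log (or any iterable of log lines) to unreachable_nodes. The node IDs it returns are the targets to check in the Diagnosis steps.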
Diagnosis
Review the content of the clusternode table:
SELECT * FROM clusternode;
Jira uses this table as its reference for Data Center cluster operations.
- Get the ip column value for the target node. This column may contain an actual IP address or a hostname (DNS).
- Get the cache_listener_port value for the target node. By default, this is 40001.
From each source node that fails to replicate its cache:
Look up the IP or hostname of the target node:
nslookup <target_node_ip_or_hostname>
Test the connectivity to the target on the cache port:
telnet <target_node_ip_or_hostname> <target_node_cache_listener_port>
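If nslookup or telnet is not installed on the source node, the same two checks can be approximated with a short Python script. This is a minimal sketch using only the standard library. The function names resolve and can_connect are hypothetical, and the 5-second timeout is an arbitrary assumption. A successful TCP connection only shows that the port is reachable, not that cache replication itself is healthy.

```python
import socket


def resolve(host):
    """Return the sorted IP addresses that host resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name resolution failed (for example, a missing or wrong DNS entry).
        return []
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution errors.
        return False
```

For example, run resolve("<target_node_ip_or_hostname>") and then can_connect("<target_node_ip_or_hostname>", 40001) from the source node, replacing the placeholder and the port with the ip and cache_listener_port values from the clusternode table. An empty list from resolve points to a DNS problem. A False result from can_connect points to a blocked or closed port.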
Cause
This issue is generally caused by a network or DNS misconfiguration.
Resolution
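Work with your network team to restore connectivity between the Data Center nodes. Each source node must be able to resolve the target node's hostname and reach it on its cache listener port.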