Cluster cache replication to a node fails with "Retry replication to node <node_id> failed, node still unreachable"

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible


Problem

One or more nodes in Jira Data Center (the source nodes) fail to replicate their cache to a given node (the target node).

A message similar to the following appears in atlassian-jira.log on the source node(s), where the node ID is that of the target node:

[c.a.jira.cluster.CuttingOffExecutorImpl] Retry replication to node QYVRH73JIRAP03 failed, node still unreachable. This was 8 attempt. Backing off for 300000 milliseconds.
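In this message, 300000 milliseconds is a 5-minute back-off before the next retry. When several nodes log this error, it helps to know which target node IDs are affected before starting the diagnosis. The following is a minimal Python sketch, not an Atlassian tool, that scans atlassian-jira.log for the message. The regular expression is modeled on the single sample line above and is an assumption: the exact wording may differ between Jira versions. The names parse_retry_failure, unreachable_nodes and the script name find_unreachable.py are hypothetical.

```python
import re
import sys
from typing import Dict, Iterable, NamedTuple, Optional

# Modeled on the sample log message in this article (assumption: other Jira
# versions may word it slightly differently). "attempts?" tolerates a plural.
RETRY_RE = re.compile(
    r"Retry replication to node (?P<node>\S+) failed, node still unreachable\. "
    r"This was (?P<attempt>\d+) attempts?\. "
    r"Backing off for (?P<backoff_ms>\d+) milliseconds"
)


class RetryFailure(NamedTuple):
    node_id: str      # target node that could not be reached
    attempt: int      # retry attempt number reported by Jira
    backoff_ms: int   # delay before the next retry, in milliseconds


def parse_retry_failure(line: str) -> Optional[RetryFailure]:
    """Return the target node, attempt number and back-off, or None if no match."""
    match = RETRY_RE.search(line)
    if match is None:
        return None
    return RetryFailure(match["node"], int(match["attempt"]), int(match["backoff_ms"]))


def unreachable_nodes(lines: Iterable[str]) -> Dict[str, int]:
    """Map each unreachable target node ID to the highest attempt number seen."""
    result: Dict[str, int] = {}
    for line in lines:
        failure = parse_retry_failure(line)
        if failure is not None:
            result[failure.node_id] = max(result.get(failure.node_id, 0), failure.attempt)
    return result


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_unreachable.py atlassian-jira.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for node_id, attempt in unreachable_nodes(log).items():
                print(f"{node_id}: still unreachable, last retry attempt {attempt}")
```

Each node ID it reports is a candidate target node to look up in the clusternode table.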

Diagnosis

  1. Review the content of the clusternode table: 

    SELECT * FROM clusternode;

    Jira uses this table as its reference for Data Center operations.

  2. Get the ip column value for the target node (for example, a newly added node). This column may contain an actual IP address or a hostname (DNS).
  3. Get the cache_listener_port value for the target node. By default, this is 40001.
  4. From each source node that fails to replicate its cache:

    1. Look up the IP address or hostname of the target node:

      nslookup <target_node_ip_or_hostname>

    2. Test connectivity to the target node on its cache listener port:

      telnet <target_node_ip_or_hostname> <target_node_cache_listener_port>
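The nslookup and telnet checks above can also be scripted, which is convenient when several source nodes have to be tested or when telnet is not installed. The following is a minimal Python sketch, not an Atlassian tool: the function name check_cache_port, its return labels and the script name check_cache_port.py are hypothetical, and it assumes the default cache listener port 40001 unless you pass the value from the clusternode table. It resolves the name with socket.getaddrinfo, which uses the operating system resolver and therefore, unlike nslookup, also consults files such as /etc/hosts. It then attempts a plain TCP connection, as telnet does. A successful connection only shows that the port is reachable; it does not exercise the replication protocol itself.

```python
import socket
import sys


def check_cache_port(host: str, port: int = 40001, timeout: float = 5.0) -> str:
    """Approximate the nslookup + telnet checks from one source node.

    Returns "ok", "dns_failure" (name does not resolve) or
    "network_failure" (name resolves but no address accepts a TCP connection).
    """
    # Name resolution, comparable to the nslookup step.
    try:
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # TCP connection to the cache listener port, comparable to the telnet step.
    # Try every resolved address (for example IPv4 and IPv6) before giving up.
    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "network_failure"


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python check_cache_port.py <target_node_ip_or_hostname> [cache_listener_port]
    if len(sys.argv) > 1:
        target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 40001
        print(check_cache_port(sys.argv[1], target_port))
```

Run it on each source node with the ip and cache_listener_port values of the target node. A "dns_failure" result points to the DNS side of the problem, and "network_failure" points to routing or firewall rules.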

Cause

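This issue is generally caused by a network or DNS misconfiguration: either the source node cannot resolve the target node's ip value (the nslookup step fails), or it cannot open a connection to the target node's cache listener port, for example because a firewall blocks it (the telnet step fails).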

Resolution

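Work with your network team to allow connectivity between the Data Center nodes: each node must be able to resolve the ip value of the other nodes listed in the clusternode table and connect to them on their cache_listener_port (40001 by default).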


Last modified on Oct 12, 2018