Cluster Cache replication failure due to Unresolved Host

Summary


The Jira Data Center instance health check reports a "Cluster Cache Replication" failure. This failure suggests that nodes in the cluster cannot communicate with each other.

Name: Cluster Cache Replication
NodeId: null
Is healthy: false
Failure reason: The node XXXXX is not replicating
Severity: CRITICAL
Additional links: []


Environment

Data Center instances with more than one node in the cluster.

Diagnosis

  • Review atlassian-jira.log on the affected node that shows the “Cluster Cache Replication” health check warning. Traces like the following appear in the log:
2023-06-01 11:58:15,055+0200 main WARN      [c.a.jira.util.JiraUtils] IP/Hostname address cannot be calculated for this host. Please fix this.
.
.
2023-06-01 11:58:15,180+0200 main ERROR      [n.sf.ehcache.Cache] Unable to set localhost. This prevents creation of a GUID. Cause was: XXXXX: XXXXX: Name or service not known
java.net.UnknownHostException: XXXXX: XXXXX: Name or service not known
.
.
.
2023-06-01 11:58:15,971+0200 main WARN      [n.sf.ehcache.CacheManager] Cache com.atlassian.jira.task.TaskManagerImpl.taskMaprequested bootstrap but a CacheException occured. Error bootstrapping from remote peer. Message was: java.lang.reflect.InvocationTargetException
net.sf.ehcache.distribution.RemoteCacheException: Error bootstrapping from remote peer. Message was: java.lang.reflect.InvocationTargetException
    at net.sf.ehcache.distribution.RMIBootstrapCacheLoader.doLoad(RMIBootstrapCacheLoader.java:176)
.
.
Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.1.1; nested exception is: 
    java.net.ConnectException: Connection refused (Connection refused)
    at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(Unknown Source)
    at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(Unknown Source)
    at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(Unknown Source)
    at java.rmi/sun.rmi.server.UnicastRef.invoke(Unknown Source)
    at java.rmi/java.rmi.server.RemoteObjectInvocationHandler.invoke(Unknown Source)
    at com.sun.proxy.$Proxy40.getKeys(Unknown Source)
    ... 64 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
  • Check /etc/hosts on the affected node for an entry like the following:
127.0.1.1 XXXXX
  • Confirm that /etc/hosts has no entry mapping the node’s IP address to the jira.node.id value configured in the node’s cluster.properties file.
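
The /etc/hosts checks above can be scripted. The following is a minimal sketch, not an Atlassian tool: the function names and the sample host name `node1` are hypothetical, and the script only inspects hosts-file text that you pass to it.

```python
import ipaddress


def hosts_entries_for(hosts_text, name):
    """Return the IP addresses that /etc/hosts-style text maps to `name`."""
    matches = []
    for line in hosts_text.splitlines():
        # Ignore comments and blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if name in names:
            matches.append(ip)
    return matches


def loopback_entries(hosts_text, name):
    """Return the mapped addresses that are loopback (127.0.0.0/8 or ::1)."""
    result = []
    for ip in hosts_entries_for(hosts_text, name):
        try:
            if ipaddress.ip_address(ip).is_loopback:
                result.append(ip)
        except ValueError:
            # Not a parseable IP address; skip it.
            pass
    return result


if __name__ == "__main__":
    # Hypothetical sample content; "node1" stands in for the jira.node.id value.
    sample = "127.0.0.1 localhost\n127.0.1.1 node1\n"
    print(loopback_entries(sample, "node1"))
```

On a real node you would pass the contents of /etc/hosts and the jira.node.id value from cluster.properties. A non-empty result from `loopback_entries`, or an empty result from `hosts_entries_for`, corresponds to the two conditions listed above.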

If the logs instead contain traces like the following, refer to "Cluster Cache replication health check fails with error SocketException: Broken pipe exception".

2021-06-04 18:26:25,830+0000 localq-reader-12 ERROR      [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Abandoning sending: LocalQCacheOp{cacheName='com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat', action=PUT, key=node2, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1622831185825} from cache replication queue: [queueId=queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put, queuePath=/var/atlassian/application-data/jira-home/localq/queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put], failuresCount: 1/1. Removing from queue. Error: java.rmi.MarshalException: error marshalling arguments; nested exception is:
        java.net.SocketException: Broken pipe (Write failed)

Cause

These errors suggest that the /etc/hosts entries are misconfigured. The node name XXXXX points to 127.0.1.1, a loopback address that does not identify the node to the rest of the cluster, so connection attempts to that address fail with the Connection refused error.
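
To see how a host name currently resolves on a node, a small check like the one below can help. It is a rough sketch under assumptions: it uses Python's `socket.getaddrinfo`, which only approximates what the JVM sees, and the function names and return labels (`unresolved`, `loopback`, `ok`) are hypothetical.

```python
import ipaddress
import socket


def resolve_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the sorted addresses `hostname` resolves to, or [] if it does not resolve."""
    try:
        infos = resolver(hostname, None)
    except socket.gaierror:
        # Comparable to the "Name or service not known" case in the log.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def is_loopback(address):
    """True if `address` parses as a loopback IP (127.0.0.0/8 or ::1)."""
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        return False


def diagnose(hostname, resolver=socket.getaddrinfo):
    """Classify resolution of `hostname` as 'unresolved', 'loopback', or 'ok'."""
    addresses = resolve_addresses(hostname, resolver)
    if not addresses:
        return "unresolved"
    if any(is_loopback(a) for a in addresses):
        return "loopback"
    return "ok"


if __name__ == "__main__":
    # Checks the local machine's own host name.
    print(diagnose(socket.gethostname()))
```

A result of `loopback` matches the 127.0.1.1 situation described above, and `unresolved` matches the UnknownHostException in the log. The `resolver` parameter exists only so the logic can be exercised without real DNS or hosts-file lookups.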

Solution

  • Comment out the following entry in the /etc/hosts file by adding a '#' at the start of the line:
#127.0.1.1 XXXXX
  • Update /etc/hosts to map the node's IP address to the jira.node.id configured in cluster.properties.
  • Once these changes are made, run the health check again.

(warning) These changes require a complete restart of the application node to take effect.
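
The two /etc/hosts edits in the solution can be sketched as a text transformation. This is an illustrative example only: `rewrite_hosts`, the node name `node1`, and the address `10.0.0.5` are hypothetical, and the function returns new text instead of writing to /etc/hosts so the result can be reviewed first. Loopback lines that also name `localhost` are left untouched and need manual review.

```python
import ipaddress


def _is_loopback(address):
    """True if `address` parses as a loopback IP."""
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        return False


def rewrite_hosts(hosts_text, node_id, node_ip):
    """Comment out loopback entries for `node_id` and ensure `node_ip node_id` exists."""
    out = []
    has_mapping = False
    for line in hosts_text.splitlines():
        # Fields of the line with any trailing comment removed.
        fields = line.split("#", 1)[0].split()
        if fields and node_id in fields[1:]:
            ip, names = fields[0], fields[1:]
            if _is_loopback(ip) and "localhost" not in names:
                # Same effect as adding '#' in front of the line by hand.
                out.append("#" + line)
                continue
            if ip == node_ip:
                has_mapping = True
        out.append(line)
    if not has_mapping:
        out.append(f"{node_ip} {node_id}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    before = "127.0.0.1 localhost\n127.0.1.1 node1\n"
    print(rewrite_hosts(before, "node1", "10.0.0.5"), end="")
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to re-apply while checking the result.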



Last modified on Sep 11, 2024
