Last node to start takes over the system and answers all the requests
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
The second node to start always takes over from the first one; only the most recently started node serves requests.
Start node A and everything is fine. Then start node B: once it is up, node B takes over the cluster and node A drops out of it.
The same happens in reverse: start node B first, then node A, and node A takes over the cluster.
Environment
Confluence Data Center using a clustered setup.
Diagnosis
- Start node A
- Go to the Cog icon → General Configuration → Clustering. Node A will appear there as the only node in the cluster
- Start node B
- After a while, check the Clustering page again: only node B is listed, even though node A is still running fine at the OS level
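Assuming shell access to node A, the last diagnosis step can be sketched as follows. This is a sketch, not an official Atlassian tool: CONF_HOME below is a placeholder for your Confluence home directory, and the log path assumes the default atlassian-confluence.log location.

```shell
# Hypothetical path: adjust CONF_HOME to your Confluence home directory.
CONF_HOME="/var/atlassian/application-data/confluence"

# Is the Confluence JVM still running on this node?
# The [c] trick keeps grep from matching its own process.
ps -ef | grep -i '[c]onfluence' || echo "no Confluence process found"

# Last cluster membership events this node logged (if the log exists):
LOG="${CONF_HOME}/logs/atlassian-confluence.log"
[ -f "$LOG" ] && grep 'HazelcastMembershipListener' "$LOG" | tail -n 5 \
  || echo "log not found at $LOG"
```

If the process is running but the node is missing from the Clustering page, the problem is cluster communication, not the node itself.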
The logs will not show any errors, but comparing both nodes reveals the following behaviour:
Start-up example: first node
This is the start-up of node A. In the third log line, both nodes' IPs are present in the TCP/IP member list, and node A then starts up just fine:
2020-08-20 14:25:36,891 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence XXX [build XXXX based on commit hash ] - synchrony version XXXXXXX
2020-08-20 14:25:40,710 INFO [Catalina-utility-1] [atlassian.confluence.cluster.DefaultClusterConfigurationHelper] lambda$populateExistingClusterSetupConfig$5 Populating setup configuration if running with Cluster mode...
2020-08-20 14:25:41,002 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] configure Configuring Hazelcast with instanceName [confluence], join configuration TCP/IP member addresses: NODE_A_IP|NODE_B_IP, network interfaces [NODE_A_IP] and local port XXXXXX
2020-08-20 14:25:41,002 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Starting the cluster.
2020-08-20 14:27:52,859 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Confluence cluster node identifier is [XXXXXXXXX]
2020-08-20 14:27:52,860 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Confluence cluster node name is [NODE_A_NAME]
2020-08-20 14:27:52,933 INFO [Catalina-utility-1] [springframework.web.context.ContextLoader] initWebApplicationContext Root WebApplicationContext: initialization started
2020-08-20 14:27:56,713 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] <init> Loading EhCache cache manager
2020-08-20 14:28:04,860 INFO [Catalina-utility-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] init init: cluster ClusterService{address=[NODE_A_IP]:XXXX}
2020-08-20 14:28:04,862 INFO [Catalina-utility-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] init init: cluster contains Member [NODE_A_IP]:XXXXX - XXXXXXXXXXXXXXXXXXXXXXX this
The last line of the log above shows node A as the only node in the cluster, which is the correct state at this point.
Start-up example: second node
Then, once you start node B, a similar pattern appears:
2020-08-20 14:32:43,165 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] contextInitialized Starting Confluence XXXX [build XXXX based on commit hash XXXXXX] - synchrony version XXXXXXX
2020-08-20 14:32:46,310 INFO [Catalina-utility-1] [atlassian.confluence.cluster.DefaultClusterConfigurationHelper] lambda$populateExistingClusterSetupConfig$5 Populating setup configuration if running with Cluster mode...
2020-08-20 14:32:46,666 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] configure Configuring Hazelcast with instanceName [confluence], join configuration TCP/IP member addresses: NODE_A_IP|NODE_B_IP, network interfaces [10.31.199.79] and local port 5801
2020-08-20 14:32:46,667 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Starting the cluster.
2020-08-20 14:34:58,581 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Confluence cluster node identifier is [XXXXXXXX]
2020-08-20 14:34:58,581 INFO [Catalina-utility-1] [confluence.cluster.hazelcast.HazelcastClusterManager] startCluster Confluence cluster node name is [NODE_B_NAME]
2020-08-20 14:34:58,673 INFO [Catalina-utility-1] [springframework.web.context.ContextLoader] initWebApplicationContext Root WebApplicationContext: initialization started
2020-08-20 14:35:02,293 INFO [Catalina-utility-1] [com.atlassian.confluence.lifecycle] <init> Loading EhCache cache manager
2020-08-20 14:35:09,931 INFO [Catalina-utility-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] init init: cluster ClusterService{address=[NODE_B_IP]:XXXX}
2020-08-20 14:35:09,933 INFO [Catalina-utility-1] [cluster.hazelcast.monitoring.HazelcastMembershipListener] init init: cluster contains Member [NODE_B_IP]:XXXXX - XXXXXXXXXXXXXXXXXXXXXXX this
The important point in these logs is the last line: node B has joined the cluster, but node A is no longer a member. Meanwhile, nothing looks wrong on the node A side.
Cause
This issue happens when the nodes cannot communicate with each other. The most common cause is the Hazelcast communication port being blocked on one or more of the nodes.
Solution
Ensure that all hosts in the cluster can reach each other on port 5701 (the default Hazelcast internal communication port). If the Hazelcast start-up log line shows a non-default local port, verify that port instead.
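The reachability check above can be sketched with a small shell helper. This is a sketch, not an official Atlassian tool; NODE_B_IP in the usage comment is a placeholder for the peer node's address.

```shell
# check_port: report whether a TCP port on a remote host is reachable.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are required.
check_port() {
  host="$1"
  port="${2:-5701}"   # default Hazelcast port; pass the configured one if different
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "OPEN ${host}:${port}"
  else
    echo "BLOCKED ${host}:${port}"
  fi
}

# Run from node A against node B, and again from node B against node A:
# check_port NODE_B_IP 5701
```

A BLOCKED result in either direction matches the symptom described in this article: each node starts its own single-member cluster because it cannot see its peer.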