Secondary Node in Bamboo Data Center Cluster Fails to Start

Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15th, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

After configuring a Bamboo Data Center cluster with a warm standby, the secondary node fails to start. This article provides troubleshooting steps and a solution to resolve the issue and ensure the secondary node comes online successfully.

Environment

This issue has been observed in Bamboo version 10.2.1 but may also apply to other Bamboo versions that support warm standby (Bamboo 9.5 and later).

Diagnosis

The following errors can be observed in the primary and secondary node logs, located at <bamboo-home>/logs/atlassian-bamboo.log:

Error Message in Secondary Node Logs
Failed to notify primary node 34931c93-a335-4262-9c73-0ba6e61c49b0 about current node e39d1e92-a908-4f44-a643-c7a75c4ce576 being alive; this might indicate the nodes' connection issues. Please make sure all nodes are reachable from each other and restart the current Bamboo node. If all other Bamboo nodes are down and you still cannot launch a new node to become a fresh primary, you may try to restart the old primary node or wait up to 300 seconds to make sure the old primary is seen as offline to all other nodes.
Detailed Logs in Secondary Node
2025-03-13 04:28:58,028 WARN [CompletableFutureDelayScheduler] [ClusterNodesCommandsExecutorImpl] Timeout occurred when waiting to get a response from a node 34931c93-a335-4262-9c73-0ba6e61c49b0 after notifying about this node e39d1e92-a908-4f44-a643-c7a75c4ce576 being alive
java.util.concurrent.TimeoutException: null
	at java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2874) [?:?]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
...
2025-03-13 04:28:58,051 INFO [main] [PeerToPeerClient] Peer to peer gRPC client stopped
2025-03-13 04:28:58,052 FATAL [main] [BambooContainer] Cannot start Bamboo
com.atlassian.bamboo.exception.StartupException: Failed to notify primary node 34931c93-a335-4262-9c73-0ba6e61c49b0 about current node e39d1e92-a908-4f44-a643-c7a75c4ce576 being alive; this might indicate the nodes' connection issues. Please make sure all nodes are reachable from each other and restart the current Bamboo node. If all other Bamboo nodes are down and you still cannot launch a new node to become a fresh primary, you may try to restart the old primary node or wait up to 300 seconds to make sure the old primary is seen as offline to all other nodes.
	at com.atlassian.bamboo.cluster.peertopeer.ClusterNodesCommandsExecutorImpl.handlePrimaryNodeNotificationScenarios(ClusterNodesCommandsExecutorImpl.java:89) ~[atlassian-bamboo-core-10.2.1.jar:?]
	at com.atlassian.bamboo.cluster.peertopeer.ClusterNodesCommandsExecutorImpl.ensureNodeVisibilityOrWait(ClusterNodesCommandsExecutorImpl.java:63) ~[atlassian-bamboo-core-10.2.1.jar:?]
	at com.atlassian.bamboo.container.BambooContainer.waitUntilInternalNodesCommunicationIsReady(BambooContainer.java:460) ~[atlassian-bamboo-core-10.2.1.jar:?]
	at com.atlassian.bamboo.container.BambooContainer.start(BambooContainer.java:425) [atlassian-bamboo-core-10.2.1.jar:?]
	at com.atlassian.bamboo.upgrade.UpgradeLauncher.upgradeAndStartBamboo(UpgradeLauncher.java:182) [atlassian-bamboo-web-10.2.1.jar:?]
	at com.atlassian.bamboo.upgrade.UpgradeLauncher.contextInitialized(UpgradeLauncher.java:56) [atlassian-bamboo-web-10.2.1.jar:?]
...
2025-03-13 04:28:58,054 INFO [main] [BambooClusterNodeHeartbeatServiceImpl] Current node [e39d1e92-a908-4f44-a643-c7a75c4ce576] giving up its primary status
2025-03-13 04:28:58,054 INFO [main] [PrimaryNodeServiceImpl] Primary lock permanently lost. Shutting down scheduler responsible for primary lock acquiring.
Primary Node Logs
2025-03-13 04:28:43,936 INFO [nodeHeartbeat.quartz_Worker-1] [BambooClusterNodeHeartbeatServiceImpl] Node e39d1e92-a908-4f44-a643-c7a75c4ce576 became live
2025-03-13 04:28:44,015 INFO [AtlassianEvent::0-BAM::EVENTS:pool-1-thread-46] [TapePerNodeLocalQueue] [LOCALQ] Created persistent replication queue for node: e39d1e92-a908-4f44-a643-c7a75c4ce576 with id: queue_e39d1e92a9084f44a643c7a75c4ce576_0_56853c5abfe409265ba7e5fde67d6d4e in : /home/ubuntu/bamboo-home/localq/queue_e39d1e92a9084f44a643c7a75c4ce576_0_56853c5abfe409265ba7e5fde67d6d4e
2025-03-13 04:28:44,186 INFO [AtlassianEvent::0-BAM::EVENTS:pool-1-thread-46] [PerNodeLocalQueueManager] [LOCALQ] Created cache replication queue: [queueId=queue_e39d1e92a9084f44a643c7a75c4ce576_0_56853c5abfe409265ba7e5fde67d6d4e, queuePath=/home/ubuntu/bamboo-home/localq/queue_e39d1e92a9084f44a643c7a75c4ce576_0_56853c5abfe409265ba7e5fde67d6d4e] with queue dispatcher running: PerNodeLocalQueueDispatcher{queue=com.atlassian.bamboo.cluster.tape.TapePerNodeLocalQueueWithStats@48ea33d3}
2025-03-13 04:29:13,935 INFO [nodeHeartbeat.quartz_Worker-1] [BambooClusterNodeHeartbeatServiceImpl] Node e39d1e92-a908-4f44-a643-c7a75c4ce576 became offline
2025-03-13 04:29:13,935 INFO [AtlassianEvent::0-BAM::EVENTS:pool-1-thread-51] [PerNodeLocalQueueManager] [LOCALQ] Closing cache replication queue: queue_e39d1e92a9084f44a643c7a75c4ce576_0_56853c5abfe409265ba7e5fde67d6d4e

Cause

Starting from Bamboo 9.5, the clustering architecture was improved to support a warm standby setup. In this setup, Bamboo nodes communicate with each other using gRPC, a high-performance communication protocol. This communication relies on the hostname and port configured in each node's cluster-node.properties configuration file.
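As an illustration, the relevant entry in cluster-node.properties might look like the following excerpt. The property name node.internal.communication.port is the one referenced later in this article; the value shown is the default and may differ in your environment:

```properties
# <bamboo-home>/cluster-node.properties (excerpt, illustrative)
# Port used for internal gRPC communication between cluster nodes.
# 9090 is the default; it may have been changed for your deployment.
node.internal.communication.port=9090
```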

If anything prevents the Bamboo nodes from connecting to each other, such as network problems, firewall restrictions, or incorrect configuration settings, the secondary node can fail when trying to join the cluster and will not start correctly.

Solution

  1. Identify the hostname and port configured in the <bamboo-home>/cluster-node.properties configuration file on both the primary and secondary nodes. The default port is 9090, but it may have been modified based on your specific requirements. Check the property node.internal.communication.port to determine which port each node listens on.
  2. Validate that the Bamboo nodes are reachable from one another over the port identified in Step 1. This can be tested using the following commands:
    # From the primary node:
    telnet <secondary_node_hostname> <secondary_node_grpc_port>
    
    # From the secondary node:
    telnet <primary_node_hostname> <primary_node_grpc_port>
  3. If the nodes are not reachable, configure the necessary network and firewall settings so that both Bamboo nodes can communicate over the port identified in Step 1 (default 9090). Make sure no firewall rules or network misconfigurations block communication on this port.
  4. Once both nodes are successfully reachable over this port, the secondary node should be able to start and join the cluster without any issues.
  5. If the secondary node still fails to start even after the ports are reachable, please contact the Support Team and provide the log file located at <bamboo-home>/logs/atlassian-bamboo.log.
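If telnet is unavailable, the reachability check in Step 2 can also be sketched with a short script. The hostname and port below are illustrative; substitute the other node's hostname and the node.internal.communication.port value from its cluster-node.properties file:

```python
"""Reachability check for a Bamboo node's internal gRPC port.

A minimal sketch; run it from one node against the other node's
hostname and gRPC port (default 9090).
"""
import socket


def is_port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Example: from the primary node, check the secondary node's gRPC port.
    # Both values are placeholders, not real Bamboo defaults for hostnames.
    host, port = "bamboo-secondary.example.com", 9090
    status = "reachable" if is_port_reachable(host, port) else "NOT reachable"
    print(f"{host}:{port} is {status}")
```

A successful TCP connection only proves network reachability; if the secondary node still fails to start, continue with Step 5.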

Note

The first node to be started becomes the primary node, while all subsequent nodes become secondary nodes. If there are more than two nodes, the steps mentioned above should be repeated for each secondary node.


Last modified on Mar 17, 2025
