Bamboo Data Center with Cold Standby allows multiple nodes to fully start



Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

When setting up Bamboo Data Center in a cluster with cold standby nodes, the expectation is that one node runs Bamboo Data Center while the other node(s) pause during startup. If the cluster-node.properties file is replicated from the primary server to the cold standby nodes along with the bamboo.cfg.xml file, the nodes will share an ID and each node will believe it holds the exclusive cluster lock.

Environment

Bamboo Data Center 8.0, 8.1, 8.2, 9.0, 9.1, 9.2, or 9.3 with Cold Standby node(s)

Diagnosis

With the intended primary node running Bamboo DC, start the Bamboo DC process on the cold standby node and monitor the application logs (<bamboo-home>/logs/atlassian-bamboo.log). The expected behavior is for startup to suspend until the primary lock is released or the primary node stops updating the lock:


2023-06-08 22:04:45,912 INFO [main] [QuartzScheduler] JobFactory set to: com.atlassian.scheduler.quartz2.Quartz2JobFactory@5da8d90f
2023-06-08 22:04:45,913 INFO [main] [QuartzScheduler] Scheduler nodeHeartbeat.quartz_$_NON_CLUSTERED started.
2023-06-08 22:04:45,913 INFO [main] [QuartzScheduler] JobFactory set to: com.atlassian.scheduler.quartz2.Quartz2JobFactory@40f7e6ae
2023-06-08 22:04:45,913 INFO [main] [QuartzScheduler] Scheduler nodeHeartbeat.quartz_$_NON_CLUSTERED started.
2023-06-08 22:04:45,962 INFO [main] [ClusterLockBootstrapServiceImpl] Primary lock is held by another instance, suspending....
2023-06-08 22:05:45,965 INFO [main] [ClusterLockBootstrapServiceImpl] Primary lock is held by another instance, suspending....
2023-06-08 22:06:45,967 INFO [main] [ClusterLockBootstrapServiceImpl] Primary lock is held by another instance, suspending....


Instead, both nodes log messages indicating they have acquired the primary lock:

2023-06-08 22:07:39,002 INFO [main] [QuartzScheduler] JobFactory set to: com.atlassian.scheduler.quartz2.Quartz2JobFactory@43561951
2023-06-08 22:07:39,003 INFO [main] [QuartzScheduler] Scheduler nodeHeartbeat.quartz_$_NON_CLUSTERED started.
2023-06-08 22:07:39,003 INFO [main] [QuartzScheduler] JobFactory set to: com.atlassian.scheduler.quartz2.Quartz2JobFactory@23337367
2023-06-08 22:07:39,003 INFO [main] [QuartzScheduler] Scheduler nodeHeartbeat.quartz_$_NON_CLUSTERED started.
2023-06-08 22:07:39,045 INFO [main] [ClusterLockBootstrapServiceImpl] Primary lock acquired with node id 7272e04b-f5d2-4867-9ab2-20beef4a47d9, proceeding with startup...

Both nodes fully start up and present a web UI. This state is dangerous for the integrity of Bamboo, and the cold standby node(s) should be shut down immediately.
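
To watch for these messages while the cold standby node is starting, the application log can be followed directly. The following is a minimal sketch, assuming a Linux node and the default log location named above; replace <bamboo-home> with the node's Bamboo home directory:

# Follow the Bamboo application log and show only the primary-lock messages
# emitted by ClusterLockBootstrapServiceImpl during startup.
tail -f <bamboo-home>/logs/atlassian-bamboo.log | grep ClusterLockBootstrapServiceImpl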


Cause

The <bamboo-home>/cluster-node.properties file has been replicated from the primary node to the cold standby node. Because the cluster node IDs are identical, Bamboo is unaware that two nodes are running and could enter an inconsistent state.
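
To confirm this is the cause, display the file on the primary node and on each cold standby node and compare the node IDs. A minimal sketch, assuming a Linux node; replace <bamboo-home> with the node's Bamboo home directory:

# Run on the primary node and again on each cold standby node, then compare
# the output. Identical node IDs confirm the file was replicated.
cat <bamboo-home>/cluster-node.properties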

Solution

Delete the <bamboo-home>/cluster-node.properties file on all cold standby nodes. The next time the Bamboo process is started, the cluster-node.properties file will be regenerated with a new UUID.
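
For example, on a Linux cold standby node this can be done as follows (a minimal sketch; replace <bamboo-home> with the node's Bamboo home directory, and run it only on cold standby nodes, never on the active primary node):

# Stop Bamboo on the cold standby node if it is running, then remove the
# replicated file. A new cluster-node.properties with a fresh UUID will be
# generated the next time Bamboo starts on this node.
rm <bamboo-home>/cluster-node.properties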

