Bamboo Data Center NodeAliveWatchdog shuts down Bamboo during DB scheduled backups

Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

Bamboo Data Center shuts down with a message in <bamboo-home>/logs/atlassian-bamboo.log stating it could not refresh the state in the DB.

Environment

Bamboo Data Center 8.0 and later.

Diagnosis

The <bamboo-home>/logs/atlassian-bamboo.log file contains a message similar to:

2023-03-23 06:17:46,556 ERROR [scheduler_Worker-6] [NodeAliveWatchdog] Current node failed to refresh its state in DB within last 3 minutes. This node will now go down
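
To confirm how often this has happened, you can search the log for the message. The sketch below is a minimal helper, not an Atlassian-provided tool: the function name count_watchdog_shutdowns is hypothetical, and it assumes the message wording matches the example above.

```shell
# Hypothetical helper: count NodeAliveWatchdog shutdown messages in a Bamboo log file.
# $1 is the path to atlassian-bamboo.log.
count_watchdog_shutdowns() {
  # grep -c prints the number of matching lines (0 if there are none).
  grep -c 'NodeAliveWatchdog.*failed to refresh its state in DB' "$1"
}
```

Call it with the path to your log file, for example count_watchdog_shutdowns "/path/to/bamboo-home/logs/atlassian-bamboo.log", where the path is a placeholder for your own <bamboo-home>. Comparing the timestamps of the matching lines with your database backup schedule can help confirm the cause.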

Cause

The Bamboo NodeAliveWatchdog monitors the database for read and write ability. If the database is unavailable or read-only for more than 3 minutes, for example during a scheduled database backup, the node shuts down so that the cold standby node, if one is available, can take over.

Solution

Prior to Bamboo 9.5

If you expect your database to be unavailable for more than 3 minutes, you can increase the NodeAliveWatchdog timeout, or disable the check, by adding a Bamboo system property. For example, the property below sets the timeout to 5 minutes.

-Dbamboo.node.alive.watchdog.timeout=5

A value greater than 0 sets the timeout in minutes. A value of 0 disables the check, which stops the node from shutting down while it cannot get database connections. Disabling the check is not recommended, because it masks a potentially serious underlying issue instead of resolving it.
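
As an illustration of where such a property might go, the fragment below appends it to the JVM arguments in a setenv.sh-style script. This is a sketch under an assumption: that your installation passes extra JVM options through a variable named JVM_SUPPORT_RECOMMENDED_ARGS. Check your own startup scripts before relying on that name.

```shell
# Sketch of a setenv.sh fragment. The variable name is an assumption about
# your installation, not something stated in this article.
# Keep any existing arguments and append the watchdog timeout (in minutes).
JVM_SUPPORT_RECOMMENDED_ARGS="${JVM_SUPPORT_RECOMMENDED_ARGS:-} -Dbamboo.node.alive.watchdog.timeout=5"
export JVM_SUPPORT_RECOMMENDED_ARGS
```

Bamboo needs to be restarted for a changed system property to take effect.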

Bamboo 9.5 and later

You can disable the health check that causes the instance to shut down, and increase the node lock and cluster heartbeat timeouts, with the properties below:

-Dbamboo.node.alive.watchdog.enabled=false -Dbamboo.primary.node.lock.timeout.seconds=600 -Dbamboo.cluster.heartbeat.alive.timeout.seconds=600

With these settings, the nodes and the cluster remain active for up to 10 minutes (600 seconds), after which Bamboo shuts down if the database is still unavailable. You can set the timeouts to a higher value if you expect the database to be down for longer. This stops Bamboo from shutting down while it cannot get database connections, but it is not a recommended approach, because it masks a potentially serious underlying issue instead of resolving it.

-Dbamboo.node.alive.watchdog.enabled: when enabled, monitors the database for read and write ability, checking whether the database is unavailable or read-only.

-Dbamboo.cluster.heartbeat.alive.timeout.seconds: the duration (in seconds) after which a node is considered dead if no heartbeat is received. Default: 300 seconds.

-Dbamboo.primary.node.lock.timeout.seconds: how long (in seconds) the secondary nodes wait before they take over the primary role. Default: 120 seconds. A high value is not recommended in a warm standby setup, because it delays secondary nodes from taking over.
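
Because the two timeout properties are expressed in seconds, it can help to derive them from the expected outage length in minutes. The sketch below does that conversion. The variable names BACKUP_WINDOW_MINUTES, TIMEOUT_SECONDS, and BAMBOO_WATCHDOG_ARGS are hypothetical, and the 10-minute window is just the example value used in this article.

```shell
# Hypothetical variable: how long the database is expected to be unavailable.
BACKUP_WINDOW_MINUTES=10

# The lock and heartbeat timeouts are set in seconds.
TIMEOUT_SECONDS=$((BACKUP_WINDOW_MINUTES * 60))

# Assemble the three properties described above into one argument string.
BAMBOO_WATCHDOG_ARGS="-Dbamboo.node.alive.watchdog.enabled=false -Dbamboo.primary.node.lock.timeout.seconds=${TIMEOUT_SECONDS} -Dbamboo.cluster.heartbeat.alive.timeout.seconds=${TIMEOUT_SECONDS}"

echo "$BAMBOO_WATCHDOG_ARGS"
```

Using the same value for both timeouts mirrors the example in this article. In a warm standby setup, weigh a longer lock timeout against the slower failover it causes.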




Last modified on Mar 7, 2025
