Build resiliency in Bamboo Data Center

BAMBOO 8.0 EARLY ACCESS PROGRAM

In Bamboo versions earlier than 8.0, if the server’s work was interrupted or the server went down for more than 5 minutes, builds would fail because the building agent lost its connection to the server. Bamboo agents were designed to shut down when they couldn't connect to a server for longer than 5 minutes.

With Bamboo Data Center, the agent will continue its work and finish building even if the connection with the server is lost. Once the agent’s building work is done, it tries to connect to the server. If the server is already online, the agent will send build results, logs, and artifacts to the server, and pick up the next tasks from the server. If the server is still down, the agent will try to reconnect with the server after some time.

How many times does the agent try to reconnect?

If the agent is started with the agent wrapper, it tries to connect to the server 1440 times by default, or until it succeeds. You can change this value by editing $BAMBOO_AGENT_HOME/conf/wrapper.conf and modifying the wrapper.max_failed_invocations property.
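As a rough illustration of the setting described above, the relevant line in wrapper.conf might look like the fragment below. The value 2880 is an arbitrary example chosen for this sketch, not a recommendation. The wrapper most likely reads the file at startup, so assume the agent must be restarted before a change takes effect.

```
# $BAMBOO_AGENT_HOME/conf/wrapper.conf
# Number of reconnection attempts before the agent gives up (default per the text above: 1440).
# 2880 is an illustrative value only.
wrapper.max_failed_invocations=2880
```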

If the agent isn't started with the agent wrapper, it tries to transmit the results 10 times at 5-minute intervals and then terminates if unsuccessful. However, if you restart the agent manually, it enters the ‘retry’ loop again, provided the result hasn't been removed from the disk.
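The retry behavior of an agent without the wrapper can be pictured with the minimal Python sketch below. It is not Bamboo's implementation: the function name transmit_with_retries and its parameters are hypothetical, and the sketch assumes the agent waits only between attempts, not after the final one. The defaults mirror the figures above, 10 attempts at 5-minute (300-second) intervals.

```python
import time
from typing import Callable


def transmit_with_retries(
    send: Callable[[], bool],
    max_attempts: int = 10,
    interval_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it reports success or max_attempts is reached.

    Hypothetical helper for illustration only; send() stands in for
    whatever delivers build results, logs, and artifacts to the server.
    Returns True on success and False when every attempt failed, at which
    point a real agent without the wrapper would terminate.
    """
    for attempt in range(1, max_attempts + 1):
        if send():
            return True
        # Assumption: wait only between attempts, not after the last failure.
        if attempt < max_attempts:
            sleep(interval_seconds)
    return False
```

Under these assumptions, an agent whose server never comes back waits 9 intervals of 5 minutes, roughly 45 minutes in total, before giving up. The sleep parameter is injectable so the loop can be exercised without actually waiting.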

If the transmission problems are caused by a network failure, the effective timeout is considerably shorter, because in that case the server recognizes that the agent is offline and terminates the build on its end. This behavior is controlled by the heartbeat timeouts. For more information, see Changing the remote agent heartbeat interval.


Recovery is guaranteed only if the build process is able to finish its work. Bamboo will not be able to finish the build if:

  • a child process fails or is stopped

  • an agent's process is stopped while the build is running

  • a resource required for the build process is unavailable (this includes resources provided by the Bamboo server, like REST endpoints and artifacts from other builds)

  • a build is failing because of intermittent infrastructure problems


Elastic agents

The same logic applies to agents started in an EC2 environment. To achieve this, the Bamboo agent is started using the Tanuki wrapper, which is also used by the remote agent. The wrapper restarts the Bamboo agent when the Java process is interrupted by a connection timeout error.

If you’re using elastic images provided by Bamboo 8.0 (or based on them), elastic agents use the agent wrapper and can fully benefit from improved build resiliency. Old images are still functional but will work with the ‘short’ timeout only.

After a server restart, elastic agents that use the agent wrapper can fully resume their operation. Agents without the wrapper are allowed to return the result they were working on, but then they terminate.

Disabling the elastic tunnel is no longer a prerequisite for seamless restarts and improved build resiliency.


Last modified on Jul 19, 2021
