Bamboo agent goes offline with message "Quit and let the Wrapper resynch"
Platform Notice: Server and Data Center Only - This article only applies to Atlassian products on the server and data center platforms.
A Bamboo remote agent goes offline unexpectedly, and its logs show the message "Quit and let the Wrapper resynch".
Observed mainly, although not exclusively, on Windows-hosted agents.
The Bamboo agent goes offline and then restarts automatically. The agent logs show entries similar to:
INFO | jvm 86 | 2020/12/03 06:50:51 | Read Timed out. (Last Ping was XXXXX milliseconds ago)
INFO | wrapper | 2020/12/03 06:50:51 | Wrapper Process has not received any CPU time for XX seconds. Extending timeouts.
...
INFO | jvm 86 | 2020/12/03 06:52:07 | Read Timed out. (Last Ping was XXXXX milliseconds ago)
INFO | jvm 86 | 2020/12/03 06:52:07 | Wrapper Manager: The Wrapper code did not ping the JVM for XX seconds. Quit and let the Wrapper resynch.
INFO | jvm 86 | 2020/12/03 06:52:07 | Send a packet RESTART : restart
The timeout messages in the logs mean that neither the Wrapper nor the JVM had access to the CPU for the number of seconds given in the log entry. One way this can happen is when the Wrapper competes for system resources with another process that consumes 100% of the CPU for extended periods without yielding to other processes. Most modern operating systems manage multitasking fairly well, but there are still cases where this fails. One example on Windows is a machine that is very low on memory, which leads to heavy disk swapping. If the total memory is not large enough, the entire system can freeze for as long as a minute before any application is given CPU cycles again.
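To gauge how often and for how long an agent is starved of CPU, the wrapper log can be scanned for the messages shown in the excerpt above. The following is a minimal sketch in Python, assuming the log lines keep the wording from that excerpt with real numbers in place of XX. The function name `summarize_starvation` and the sample lines are illustrative and not part of Bamboo or the Wrapper.

```python
import re

# Patterns based on the wording in the log excerpt above. The exact wording
# may differ between Wrapper versions (assumption).
CPU_STARVED = re.compile(r"has not received any CPU time for (\d+) seconds")
PING_MISSED = re.compile(r"did not ping the JVM for (\d+) seconds")


def summarize_starvation(lines):
    """Return (starvation_events, longest_gap_seconds, missed_ping_restarts)."""
    events = 0
    longest = 0
    restarts = 0
    for line in lines:
        match = CPU_STARVED.search(line)
        if match:
            events += 1
            longest = max(longest, int(match.group(1)))
        if PING_MISSED.search(line):
            restarts += 1
    return events, longest, restarts


if __name__ == "__main__":
    # Hypothetical sample lines; the second counts are made up for illustration.
    sample = [
        "INFO | wrapper | 2020/12/03 06:50:51 | Wrapper Process has not "
        "received any CPU time for 12 seconds. Extending timeouts.",
        "INFO | jvm 86 | 2020/12/03 06:52:07 | Wrapper Manager: The Wrapper "
        "code did not ping the JVM for 45 seconds. Quit and let the Wrapper resynch.",
    ]
    print(summarize_starvation(sample))
```

Frequent or long gaps in the summary point towards a resource shortage on the agent host rather than a problem in Bamboo itself.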
Note that this is not a Bamboo problem but a hardware limitation given the load placed on the machine. Our official recommendation is to address that resource shortage.
The Wrapper also offers a way to adjust this timeout through the wrapper.cpu.timeout property.
This property is ignored unless wrapper.use_system_time=TRUE.
wrapper.cpu.timeout sets the number of seconds without CPU before the JVM will issue a warning and extend timeouts. For it to have any effect, its value must be less than the values of the following properties:
wrapper.startup.timeout
wrapper.ping.timeout
wrapper.shutdown.timeout
The default value for wrapper.cpu.timeout is 10 (seconds). Setting this property to "0" (zero) means timeouts are never extended.
While the ability is there, be aware that setting this property to "0" (zero), which disables the timeout extension, or to a value larger than another timeout could cause that timeout to be falsely triggered under heavy load. This can lead to the JVM being restarted when a restart is not really necessary.
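The constraints described above can be checked before a changed configuration is deployed. The following is a minimal sketch in Python, assuming the Wrapper configuration is a plain key=value file with # comments. The helper names and the sample values are hypothetical, and only the default of 10 seconds for wrapper.cpu.timeout is taken from this article. Timeouts that are missing from the file are skipped rather than guessed.

```python
# Timeouts that wrapper.cpu.timeout must stay below, as listed above.
OTHER_TIMEOUTS = (
    "wrapper.startup.timeout",
    "wrapper.ping.timeout",
    "wrapper.shutdown.timeout",
)
CPU_TIMEOUT_DEFAULT = 10  # seconds, as stated above


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_cpu_timeout(text):
    """Return a list of warnings about wrapper.cpu.timeout in the config text."""
    props = parse_properties(text)
    warnings = []
    cpu = int(props.get("wrapper.cpu.timeout", CPU_TIMEOUT_DEFAULT))
    if props.get("wrapper.use_system_time", "").upper() != "TRUE":
        warnings.append(
            "wrapper.use_system_time is not TRUE; wrapper.cpu.timeout is ignored"
        )
    if cpu == 0:
        warnings.append("wrapper.cpu.timeout=0; timeouts are never extended")
        return warnings
    # Simplification: special values of the other timeouts are not handled.
    for name in OTHER_TIMEOUTS:
        if name in props and cpu >= int(props[name]):
            warnings.append(
                f"wrapper.cpu.timeout ({cpu}) is not less than {name} ({props[name]})"
            )
    return warnings


if __name__ == "__main__":
    # Hypothetical configuration; the values are illustrative only.
    sample = """
    wrapper.use_system_time=TRUE
    wrapper.cpu.timeout=20
    wrapper.startup.timeout=300
    wrapper.ping.timeout=15
    wrapper.shutdown.timeout=60
    """
    for warning in check_cpu_timeout(sample):
        print(warning)
```

In the sample, wrapper.cpu.timeout is larger than wrapper.ping.timeout, so the check reports that combination as one that would leave the ping timeout free to trigger under heavy load.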