Troubleshooting Bitbucket service failure due to JFR logging.
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Bitbucket experiences periodic downtime. The logs give no specific indication of the cause; the only related entries show nodes being removed from the cluster:
2024-04-23 13:33:57,802 INFO [hz.hazelcast.event-5] c.a.s.i.c.HazelcastClusterService Node '/10.x.x.x:5701' was REMOVED from the cluster. Updated cluster:
2024-04-23 12:30:53,558 INFO [hz.hazelcast.event-5] c.a.s.i.c.HazelcastClusterService Node '/10.x.x.x:5701' was REMOVED from the cluster. Updated cluster:
2024-04-23 14:00:36,387 INFO [hz.hazelcast.event-4] c.a.s.i.c.HazelcastClusterService Node '/10.x.x.x:5701' was REMOVED from the cluster. Updated cluster:
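The excerpt above is not in chronological order. When correlating outages across nodes, it can help to pull every node-removal entry out of atlassian-bitbucket.log and sort the entries by time. The following is a minimal Python sketch that assumes the log line format shown above. The helper name node_removals is hypothetical.

```python
import re
from datetime import datetime

# Matches Hazelcast node-removal entries such as:
#   2024-04-23 13:33:57,802 INFO [hz.hazelcast.event-5] c.a.s.i.c.HazelcastClusterService Node '/10.x.x.x:5701' was REMOVED from the cluster.
REMOVED_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r".*HazelcastClusterService Node '([^']+)' was REMOVED from the cluster"
)


def node_removals(lines):
    """Return (timestamp, node) tuples for every node-removal entry, oldest first."""
    events = []
    for line in lines:
        match = REMOVED_RE.search(line)
        if match:
            # %f right-pads the ",802" millisecond field to 802000 microseconds
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, match.group(2)))
    return sorted(events)
```

Pass it an open file object for each node's atlassian-bitbucket.log. You can then line up the resulting timestamps with what that node's launcher.log reports.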
Environment
- Bitbucket 8.14.1 and above.
Diagnosis
While diagnosing this issue, we found:
- No evidence that the service was killed externally, for example by the OOM killer.
- No evidence that the service was shut down gracefully.
- The atlassian-bitbucket.log on the affected nodes only shows them leaving the cluster, with no further logging:
2024-04-22 15:08:41,425 WARN [hz.hazelcast.cached.thread-1] c.h.n.t.TcpIpConnectionErrorHandler [10.x.x.x]:5701 [GTE-bitbucket-cluster] [3.12.13] Removing connection to endpoint [10.x.x.x]:5701 Cause => java.net.SocketException {Connection refused to address /10.x.x.x:5701}, Error-Count: 5
2024-04-22 15:08:41,433 INFO [hz.hazelcast.event-4] c.a.s.i.c.HazelcastClusterService Node '/10.X.X.X' was REMOVED from the cluster. Updated cluster:
- The launcher.log, which records the system stdout/stderr output, does however show the events leading up to the JVM shutdown:
00:43:18.352 [main] INFO com.atlassian.security.java8.serialfilter.DeserializationFilterConfigurator - Global serial filter set to JDK 8 DeserializationFilter
ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6
ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6
ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6
ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6
2024-02-08 15:26:05,572 analyticsEventProcessor:thread-1 ERROR Unable to write to stream /var/atlassian/application-data/bitbucket/analytics-logs/4bf3c3d2538000f525a05d096b011ad0.11a7eaa7c55bcb6a00e072075c02dd06.atlassian-analytics.log for appender rolling
org.apache.logging.log4j.core.appender.AppenderLoggingException: Error writing to stream /var/atlassian/application-data/bitbucket/analytics-logs/4bf3c3d2538000f525a05d096b011ad0.11a7eaa7c55bcb6a00e072075c02dd06.atlassian-analytics.log
at org.apache.logging.log4j.core.appender.OutputStreamManager.writeToDestination(OutputStreamManager.java:252)
...
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at org.apache.logging.log4j.cor
...
20:07:30.372 [main] INFO com.atlassian.security.serialfilter.DeserializationFilterConfigurator - Global serial filter set to JDK 11 DeserializationFilter
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6
ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6
ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6
ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6
[194741.694s][error][jfr,system] Failed to write to jfr stream because no space left on device
[194741.695s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM...
The launcher.log does not consistently record the date and time of events, but its final entries generally describe the most recent events.
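The JVM's own JFR messages carry only an uptime decoration, for example [194741.694s], rather than a wall-clock time. Even so, a quick scan of launcher.log can show how long the JVM had been running when JFR failed. The following is a minimal Python sketch that assumes the JVM log line format shown above. The helper name jfr_errors is hypothetical.

```python
import re

# JVM unified-logging lines look like:
#   [194741.694s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM...
# The "[<seconds>s]" decoration is JVM uptime, not a wall-clock timestamp.
# search() is used instead of match() because in launcher.log these lines can be
# fused onto the end of unrelated output.
JFR_ERROR_RE = re.compile(r"\[(\d+(?:\.\d+)?)s\]\[error\]\[jfr[^\]]*\]\s*(.*)")


def jfr_errors(lines):
    """Yield (uptime_hours, message) for each JFR error entry in launcher.log lines."""
    for line in lines:
        match = JFR_ERROR_RE.search(line)
        if match:
            uptime_seconds = float(match.group(1))
            yield uptime_seconds / 3600.0, match.group(2).strip()
```

In the excerpt above, 194741.694 s is about 54 hours (roughly 2.25 days) of uptime. That figure helps place the failure relative to the last JVM start.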
Cause
- Java Flight Recorder (JFR) files fill the disk that holds the default JFR location, <BITBUCKET_HOME>/log/jfr. Once JFR can no longer write to its stream, it treats this as an irrecoverable error and shuts down the JVM.
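To confirm this cause on a node, compare the space the recordings occupy with the free space left on the filesystem that holds them. The following is a minimal Python sketch. The helper name jfr_usage is hypothetical, and you pass it your node's actual <BITBUCKET_HOME>/log/jfr path.

```python
import os
import shutil


def jfr_usage(jfr_dir):
    """Return (bytes used by files under jfr_dir, bytes free on its filesystem)."""
    used = 0
    for root, _dirs, files in os.walk(jfr_dir):
        for name in files:
            try:
                used += os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file may have been removed while the directory was being walked
                pass
    free = shutil.disk_usage(jfr_dir).free
    return used, free
```

If the free space is near zero while the directory is large, the recordings are the likely culprit.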
Solution
- The long-term solution is to analyse the JFR files with a tool such as JDK Mission Control to understand why so much data is being recorded.
Workarounds
- Ensuring that there is sufficient disk space for the JFR files. The space required for the recordings is calculated as jfr.recording.max_size * jfr.recording.files_to_remain.
- Moving the JFR files to a larger data store, such as NFS storage.
- Changing the size limit, the number of JFR files kept, or their retention period as described in the JFR diagnostics guide.
- As a temporary workaround, disabling JFR logging in the diagnostic settings view of the troubleshooting and support tools.
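The sizing formula from the first workaround can be checked with a short calculation. This Python sketch is illustrative only. The function names are hypothetical, and both inputs are assumed to have been converted by hand from the configured jfr.recording.max_size (to bytes) and jfr.recording.files_to_remain values.

```python
import shutil

MIB = 1024 * 1024


def required_jfr_space(max_size_bytes: int, files_to_remain: int) -> int:
    """Bytes required by the formula jfr.recording.max_size * jfr.recording.files_to_remain."""
    if max_size_bytes < 0 or files_to_remain < 0:
        raise ValueError("max size and file count must be non-negative")
    return max_size_bytes * files_to_remain


def has_room_for_jfr(path: str, max_size_bytes: int, files_to_remain: int) -> bool:
    """True if the filesystem holding `path` has at least the required free space.

    Conservative check: it does not subtract space already used by existing recordings.
    """
    return shutil.disk_usage(path).free >= required_jfr_space(max_size_bytes, files_to_remain)
```

For example, with hypothetical values of 100 MiB per recording and 5 retained files, the recordings need 500 MiB.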