Unexpected Bitbucket Data Center crashes with Linux virtual memory overcommit turned off
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
In some situations, Bitbucket may crash with memory-related errors for no obvious reason. Even with a large amount of RAM and a huge Java heap, Java may still log "OutOfMemoryError". On other occasions, the whole Java virtual machine (JVM) crashes, leaving behind "hs_err" files.
A real-life example was a Bitbucket setup running on a server with 64 GB of RAM, where the Java heap size had been raised repeatedly over the years in response to various memory-related errors. The heap went from "-Xms1G -Xmx8G" through "-Xms1G -Xmx24G" to "-Xms24G -Xmx32G". Even with those huge sizes, heap-related errors were still logged, and there was no obvious explanation of what was going on. There were also reports of "git repack" commands crashing with "fatal: Out of memory, realloc failed" errors. Increasing RAM from 64 GB to 96 GB didn't help.
Environment
The solution has been validated in Bitbucket 8.19.15 but may be applicable to other versions.
Diagnosis
The following symptoms point to this issue:
- When Xms and Xmx are not set to the same value and the JVM tries to grow the heap, the allocation fails and the JVM crashes, leaving an "hs_err" log behind.
- The Linux operating system does not log errors like "kernel: Out of memory: Kill process ..." and there is no trace of processes being killed by the kernel. Yet "git" processes may still die, which Bitbucket registers in its logs, for example:
    CommandFailedException: [git repack -a -d -l -n --keep-unreachable] exited with code 128 saying: fatal: Out of memory, realloc failed
- The maximum number of memory maps, the Linux kernel parameter vm.max_map_count, has been increased from its default value (65530 on most kernels), but the problems are still present.
- The Linux kernel's virtual memory overcommit parameter has been set to vm.overcommit_memory = 2. Use "sysctl -a | grep overcommit" to check the value of this parameter.
- The "hs_err" logs with JVM crash reports contain output similar to this:
    #
    # There is insufficient memory for the Java Runtime Environment to continue.
    # Native memory allocation (mmap) failed to map 7549747200 bytes. Error detail: committing reserved memory.
    # Possible reasons:
    #   The system is out of physical RAM or swap space
    # Possible solutions:
    #   Reduce memory load on the system
    #   Increase physical memory or swap space
    #   Check if swap backing store is full
    #   Decrease Java heap size (-Xmx/-Xms)
    #   Decrease number of Java threads
    #   Decrease Java thread stack sizes (-Xss)
    #   Set larger code cache with -XX:ReservedCodeCacheSize=
Cause
The problem is the Linux kernel's vm.overcommit_memory = 2 setting, which turns off Linux virtual memory overcommit. Relevant excerpts from the linked sources are given below:
- Linux kernel overcommit accounting: vm.overcommit_memory=2: Don't overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable amount (default is 50%) of physical RAM. Depending on the amount you use, in most situations this means a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate.
- https://unix.stackexchange.com/a/441410: The reason for overcommitting is to avoid underutilization of physical RAM. [...] I don't deny that overcommitting memory has its dangers, and can lead to out-of-memory situations that are messy to deal with. It's all about finding the right compromise.
- https://unix.stackexchange.com/a/441508: Real cases scenario where overcommitting makes a lot of sense is when a program which uses a lot of virtual memory does a fork followed by an exec. Let's say you have 4 GB of RAM from which 3 GB are available for virtual memory and 4 GB of swap. There is a process reserving 4 GB but only using 1 GB out of it. There is no pagination so the system performs well. On a non-overcommiting OS, that process cannot fork because just after the fork, 4 GB more of virtual memory need to be reserved, and there is only 3 GB left.
- The consequence in Bitbucket's case is that the Mesh sidecar, which runs with a JVM heap of 768 MB, needs an additional 768 MB of virtual memory for every Git operation it starts, because it uses "fork+exec" to launch Git operations. If the Mesh sidecar starts 10 Git operations in a short period, it needs about 7.5 GB of committable virtual memory just to start them!
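The arithmetic behind these excerpts can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the commit limit uses the simplified "swap + ratio% of RAM" rule quoted above (huge pages are ignored), and the swap size of 0 is an assumption because the example setup's swap size is not stated.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def commit_limit_bytes(ram_bytes, swap_bytes, overcommit_ratio=50):
    """Approximate commit limit with vm.overcommit_memory = 2.

    Simplified rule from the kernel documentation excerpt:
    swap + overcommit_ratio percent of physical RAM (huge pages ignored).
    """
    return swap_bytes + ram_bytes * overcommit_ratio // 100


def fork_reservation_bytes(parent_bytes, concurrent_forks):
    """Address space charged if every fork must reserve a full copy of the parent."""
    return parent_bytes * concurrent_forks


# Assumed host: 64 GiB RAM, no swap (swap size is an assumption), default ratio of 50.
limit = commit_limit_bytes(64 * GIB, swap_bytes=0)

# Mesh sidecar example from the text: 768 MiB heap, 10 Git operations started at once.
forks = fork_reservation_bytes(768 * MIB, 10)

print(limit // GIB, "GiB commit limit")
print(forks / GIB, "GiB reserved by 10 forks")
```

Under these assumptions, the commit limit is 32 GiB, so a 32 GB Java heap alone would consume the whole budget before PostgreSQL, Mesh, or any Git process has reserved anything.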
In the example Bitbucket setup, with its huge JVM heap and a co-located PostgreSQL database, overcommit had been disabled because the Linux host was tuned for PostgreSQL:
- To help with memory handling, PostgreSQL recommends disabling memory overcommit: PostgreSQL - 18.4.4. Linux Memory Overcommit.
- The possible consequences of mixing PostgreSQL (no overcommit) and Java (likes overcommit) on the same Linux system are described on this Stack Overflow page https://stackoverflow.com/questions/77357897/native-memory-allocation-mmap-failed-despite-having-enough-available-memory: There is insufficient memory for the Java Runtime Environment to continue.
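One way to check whether a host is running into this limit is to compare the size of the failed allocation reported in an "hs_err" log with the commit headroom the kernel reports in /proc/meminfo (the CommitLimit and Committed_AS fields). The following is a minimal sketch with hypothetical helper names. It parses text passed in as strings, and the regular expression only covers the "mmap ... failed to map" wording seen in the hs_err excerpt in the Diagnosis section.

```python
import re

# Matches the hs_err line for a failed native mmap, as in the Diagnosis excerpt.
MMAP_FAILURE = re.compile(
    r"Native memory allocation \(mmap\) failed to map (?P<bytes>\d+) bytes"
)


def failed_allocation_bytes(hs_err_text):
    """Return the size in bytes of the failed mmap, or None if no such line exists."""
    match = MMAP_FAILURE.search(hs_err_text)
    return int(match.group("bytes")) if match else None


def commit_headroom_kb(meminfo_text):
    """Return CommitLimit minus Committed_AS (in kB) from /proc/meminfo content."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The first token is the numeric value; a trailing "kB" unit is ignored.
            fields[name.strip()] = int(parts[0])
    return fields["CommitLimit"] - fields["Committed_AS"]
```

On a Linux host, `commit_headroom_kb(open("/proc/meminfo").read())` gives the current headroom. If vm.overcommit_memory is 2 and the headroom is smaller than the failed allocation (7549747200 bytes is 7200 MiB in the excerpt), the crash is consistent with the cause described here.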
Solution
The solution is to revert the Linux kernel's virtual memory overcommit setting to its default value, which enables overcommit:
vm.overcommit_memory = 0
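After changing the setting, the active value can be confirmed by reading /proc/sys/vm/overcommit_memory, which exposes the same parameter as sysctl. The sketch below is illustrative only. The helper names and the mode descriptions are this example's own wording, not taken from Bitbucket or kernel tooling.

```python
# Short descriptions of the three documented vm.overcommit_memory modes.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(raw_value):
    """Turn the raw file content (for example "2\\n") into (mode, description)."""
    mode = int(raw_value.strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown mode")


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Read the active mode on a Linux host; the path is the standard procfs location."""
    with open(path) as handle:
        return describe_overcommit(handle.read())
```

Calling `read_overcommit_mode()` on the Bitbucket host should report mode 0 once the change is active. A value applied only at runtime does not survive a reboot unless it is also stored in the system's sysctl configuration.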
If the Java heap is large, it should be reduced. In the example Bitbucket setup, it was reduced from 32 GB to 4 GB without issues.
- To change Java heap for Bitbucket Data Center, follow the guide on the page Set heap size for Java on Bitbucket Data Center