"git repack" uses a lot of memory producing "Out of memory: Killed process .... " Linux OOM messages with Bitbucket Data Center
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
"Git" processes are regularly started by Bitbucket's Mesh sidecar process or by remote Mesh nodes. These external "git" processes execute all git-related repository operations, and some of them can allocate large amounts of RAM. The allocated memory can sometimes be so large that Linux's out-of-memory (OOM) killer kills one of the running processes. Which process is killed depends mainly on how much memory each process has allocated - Linux usually selects the process occupying the largest part of the RAM. Sometimes that is Bitbucket's Java process, sometimes it is one of the "git" processes. When a Linux OOM condition occurs, the Linux system logs and the output of the dmesg command will show messages like "Out of memory: Killed process .... (...)".
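A quick way to check whether the OOM killer has been active is to search the kernel logs. A minimal sketch - exact log locations vary by distribution:
# kernel ring buffer with human-readable timestamps
dmesg -T | grep -iE "out of memory|oom-kill"
# or the systemd journal, kernel messages only
journalctl -k | grep -iE "out of memory|oom-kill"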
This article describes the case where large amounts of memory are allocated by one or more running git repack processes and explains how to limit the memory that git repack will use.
Environment
Bitbucket 8.19.0, but this is also applicable to other versions.
Diagnosis
There can be multiple reasons why a Bitbucket setup allocates more RAM than anticipated.
To identify whether git repack is the process using huge amounts of memory, look for the following signs; not all of them will necessarily be present:
1. Linux out-of-memory events were recorded in system log files, and processes were killed. The killed processes may vary - Java, git, or something else.
2. The list of the processes that were running just before the Linux OOM killer acted shows at least one git process that allocated a huge amount of RAM - from several GB to several tens of GB.
Take the Linux system logs and the dmesg command output and look for the strings "Tasks state (memory values in pages)" and "oom-kill". Between those two strings you will find the list of running processes. Usually, the process list mentions both "git" and "pack-objects" processes. The "RSS" column shows allocated memory; the numbers are given in allocation pages, which are 4 KB in size, so an "RSS" of 5803369 pages is around 22.14 GB. Right below the process list is the information about the killed process.
For example:
[Mon Aug 5 10:58:44 2024] Tasks state (memory values in pages):
[Mon Aug 5 10:58:44 2024] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
...
...
[Mon Aug 5 10:58:44 2024] [1187772] 23029 1187772 3548 0 69632 54 0 git
[Mon Aug 5 10:58:44 2024] [1187773] 23029 1187773 7664473 5803369 58511360 1354319 0 git
[Mon Aug 5 10:58:44 2024] [1193095] 23029 1193095 3548 2 69632 42 0 git
[Mon Aug 5 10:58:44 2024] [1193096] 23029 1193096 3192 1 69632 58 0 git-http-backen
[Mon Aug 5 10:58:44 2024] [1193099] 23029 1193099 3985 12 73728 193 0 git
[Mon Aug 5 10:58:44 2024] [1193110] 23029 1193110 184 4 36864 26 0 pack-objects
[Mon Aug 5 10:58:44 2024] [1193111] 23029 1193111 151791 588 139264 157 0 git
[Mon Aug 5 10:58:44 2024] [1194031] 23029 1194031 65470 181 81920 4 0 git
[Mon Aug 5 10:58:44 2024] [1194034] 23029 1194034 184 27 40960 0 0 pack-objects
[Mon Aug 5 10:58:44 2024] [1194035] 23029 1194035 816704 27502 3485696 191 0 git
[Mon Aug 5 10:58:44 2024] [1194707] 23029 1194707 509776 131 487424 0 0 git
[Mon Aug 5 10:58:44 2024] [1194934] 23029 1194934 3548 44 73728 0 0 git
[Mon Aug 5 10:58:44 2024] [1194936] 23029 1194936 3192 59 73728 0 0 git-http-backen
[Mon Aug 5 10:58:44 2024] [1194937] 23029 1194937 4353 224 81920 0 0 git
[Mon Aug 5 10:58:44 2024] [1194993] 23029 1194993 509776 132 483328 0 0 git
[Mon Aug 5 10:58:44 2024] [1195085] 23029 1195085 509776 131 471040 0 0 git
[Mon Aug 5 10:58:44 2024] [1195120] 23029 1195120 184 27 36864 0 0 pack-objects
[Mon Aug 5 10:58:44 2024] [1195121] 23029 1195121 151962 662 147456 0 0 git
[Mon Aug 5 10:58:44 2024] [1195436] 23029 1195436 3548 44 77824 0 0 git
[Mon Aug 5 10:58:44 2024] [1195441] 23029 1195441 3212 75 65536 0 0 git-http-backen
[Mon Aug 5 10:58:44 2024] [1195444] 23029 1195444 5823 332 81920 0 0 git
[Mon Aug 5 10:58:44 2024] [1195464] 23029 1195464 3548 45 73728 0 0 git
[Mon Aug 5 10:58:44 2024] [1195484] 23029 1195484 3192 60 65536 0 0 git-http-backen
[Mon Aug 5 10:58:44 2024] [1195486] 23029 1195486 300616 265 335872 0 0 git
[Mon Aug 5 10:58:44 2024] [1195488] 23029 1195488 184 26 40960 0 0 pack-objects
...
[Mon Aug 5 10:58:44 2024] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-23029.slice/user@23029.service,task=git,pid=1187773,uid=23029
[Mon Aug 5 10:58:44 2024] Out of memory: Killed process 1187773 (git) total-vm:30657892kB, anon-rss:23213476kB, file-rss:0kB, shmem-rss:0kB, UID:23029 pgtables:57140kB oom_score_adj:0
[Mon Aug 5 10:58:47 2024] oom_reaper: reaped process 1187773 (git), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[Mon Aug 5 11:01:59 2024] ENFORCEMENT WARNING: [port_table] failed to malloc a new entry
[Mon Aug 5 11:05:09 2024] agent-linux-amd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
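As a cross-check of the numbers above, an RSS value from the process table can be converted from pages to gigabytes in the shell (assuming the usual 4 KB page size):
# 5803369 is the RSS, in 4 KB pages, of the git process with PID 1187773 in the example
awk 'BEGIN { printf "%.2f GiB\n", 5803369 * 4 / 1024 / 1024 }'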
In this example, the "git" process with PID 1187773 was killed ("Killed process 1187773"), and at the time it was killed it had around 22 GB of RAM allocated ("anon-rss:23213476kB"). The existence of a "git" process with such a massive amount of RAM allocated, irrespective of whether it was killed or not, is an indication of "git repack" using excessive amounts of memory.
3. Even if there are no Linux OOM killings, you may observe unusually large RAM usage, either in the output of the top command or of the ps xuaOr command - the latter shows the list of running processes sorted by allocated memory ("RSS").
4. Bitbucket's Mesh sidecar log, the file BITBUCKET_HOME/mesh/log/atlassian-mesh.log, contains information on failing "git repack" commands.
If the Linux out-of-memory killer from point 2 above decided to kill the "git repack" process because it allocated too much RAM, the result of this killing will be recorded in the Mesh sidecar's log file as "died of signal 9", for example:
2024-08-04 12:41:41,000 WARN [git-gc:thread-2:ds/0/h/ac9f3261b51bc3c92ae1/r/41725] - c.a.b.m.g.g.DefaultGarbageCollectionManager [ac9f3261b51bc3c92ae1-41725] Abandoning garbage collection after 3 failed attempts
com.atlassian.bitbucket.mesh.git.exception.CommandFailedException: [git repack -A -d -l -n --unpack-unreachable=72.hours.ago] exited with code 137 saying: error: pack-objects died of signal 9
    at com.atlassian.bitbucket.mesh.git.GitProcessCompletionHandler.onError(GitProcessCompletionHandler.java:222)
    at com.atlassian.bitbucket.mesh.git.GitProcessCompletionHandler.onComplete(GitProcessCompletionHandler.java:58)
    at com.atlassian.bitbucket.mesh.process.nu.StdioNuProcessHandler.callCompletionHandler(StdioNuProcessHandler.java:286)
    at com.atlassian.bitbucket.mesh.process.nu.StdioNuProcessHandler.finish(StdioNuProcessHandler.java:308)
    at com.atlassian.bitbucket.mesh.process.nu.StdioNuProcessHandler.onExit(StdioNuProcessHandler.java:100)
    ...
Cause
When the server's memory starts to run out, Linux first tries to use swap space as an extension of virtual memory. Once all virtual memory (physical RAM plus swap space) is exhausted and running processes still try to allocate more RAM, the Linux out-of-memory killer is triggered. It chooses one of the processes to kill in order to reclaim memory, selecting the process whose termination will yield the largest memory gain - usually the process that has allocated the largest part of the RAM.
Bitbucket uses external "git" processes for all git-repository-related operations; one of those is the git repack command, used to combine loose Git objects into more space-efficient, large "pack" files. For large Git repositories, the "git repack" command can allocate huge amounts of RAM, forcing the Linux OOM killer into action. The result is an unstable system with poor performance.
Usually, the Linux system-log OOM events and the large memory allocations of Git processes correlate with entries in the Mesh sidecar logs, which may record multiple cases of the "git repack" command crashing while trying to repack loose Git objects.
Having adequate swap space may help to avoid Linux OOM killings.
However, writing data to swap greatly impacts the server's performance and is one of the causes of slow-downs and high CPU utilization - while Linux is swapping parts of RAM to disk, almost all processes are stopped. To identify swapping, apart from watching the swap space fill up, you can look at the list of running processes with the "top" command - there will be "kswap" or similar processes in the "D" state, and CPU I/O utilization will be high.
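A rough sketch of how to confirm that the server is actively swapping:
# overall RAM and swap usage
free -h
# the si/so columns show pages swapped in/out; sustained non-zero values indicate active swapping
vmstat 5 5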
Solution
The internal "git repack" algorithm may allocate large amounts of RAM when dealing with large Git repositories, but luckily, we can limit the amount of resources it is allowed to use.
There are git per-repository configuration changes that we can apply to the repositories for which "git repack" uses excessive resources.
Before altering a production system, make sure you have a consistent, up-to-date backup of Git repositories and SQL database data.
Information on backing up Bitbucket is available on the page Data recovery and backups.
First, we need to identify the repositories for which "git repack" uses huge amounts of RAM:
- In the case of crashed "git repack" processes, we can find this information in the BITBUCKET_HOME/mesh/log/atlassian-mesh.log file (a grep sketch is shown after this list). In this example the repository ID is 41725:
2024-08-04 12:41:41,000 WARN [git-gc:thread-2:ds/0/h/ac9f3261b51bc3c92ae1/r/41725] - c.a.b.m.g.g.DefaultGarbageCollectionManager [ac9f3261b51bc3c92ae1-41725] Abandoning garbage collection after 3 failed attempts
com.atlassian.bitbucket.mesh.git.exception.CommandFailedException: [git repack -A -d -l -n --unpack-unreachable=72.hours.ago] exited with code 137 saying: error: pack-objects died of signal 9
- In the case of still-running "git repack" operations, we can use the top and ps xuaOr commands to get the list of running processes and look for the "git" processes among them. We can check either the command line itself, or look into the /proc filesystem to see the git process' working directory and opened files; the paths to check are /proc/<GIT_PROCESS_PID>/cwd and /proc/<GIT_PROCESS_PID>/fd/ (see the example after this list).
These should be sufficient to determine the path of the Git repository on which "git repack" is operating.
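For example, assuming BITBUCKET_HOME is set in the shell, the following commands sketch both approaches; the log strings and /proc paths are taken from the examples above:
# repositories whose repack keeps failing - the repository ID is the number after .../r/ in the log line
grep -E "Abandoning garbage collection|died of signal 9" $BITBUCKET_HOME/mesh/log/atlassian-mesh.log
# git processes currently running, sorted by resident memory (RSS)
ps -C git,pack-objects -o pid,rss,etime,args --sort=-rss
# working directory and open files of a suspicious git process
readlink /proc/<GIT_PROCESS_PID>/cwd
ls -l /proc/<GIT_PROCESS_PID>/fd/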
Step 1: Test the solution
First, it is essential to test the configuration we want to implement and verify that the "git repack" process can complete its task. Here is what we recommend doing directly on one of the Bitbucket nodes. Test runs of "git repack" on copies of repositories will add some load to the server, so conduct them during off-peak hours! If you need to alter the configuration of more than one Git repository, it is best to copy and test one repository at a time, to limit the additional disk space usage.
Navigate to the repositories directory on disk:
cd $BITBUCKET_HOME/shared/data/repositories
Make a copy of the repository within the same parent location - otherwise, due to the Git configuration files stored there, the "git repack" process won't be the same as the one Bitbucket will run. The repository ID in this example is 41725.
cp -r 41725 41725.copy
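Optionally, check how much additional disk space the copy consumes (a quick check, not part of the procedure itself):
# compare the size of the original repository and its copy
du -sh 41725 41725.copy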
Apply the recommended configuration changes:
cd 41725.copy
git config pack.threads 8
git config pack.windowMemory 1g
- pack.threads limits the number of threads the repack process is able to use (the default is 14).
- pack.windowMemory limits how much memory each thread is able to use (the default is "unlimited").
- Lowering both limits will slow down the "git repack" process for that repository, but it will allow the process to complete without invoking the Linux OOM killer.
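To confirm that the settings were applied to the copy, you can list all pack.* values defined for it (an optional check):
# show all pack.* settings for the current repository copy
git config --get-regexp '^pack\.'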
- Remove any tmp_pack files older than a few hours from the objects/pack directory. These files may exist as a result of earlier repacks being killed before they could clean up their temporary files.
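A possible way to locate and then remove those leftovers, assuming the usual tmp_pack_* file naming and treating "a few hours" as three hours (adjust as needed):
# list tmp_pack files older than ~3 hours first, to review what would be removed
find objects/pack -name 'tmp_pack_*' -mmin +180 -ls
# once reviewed, delete them
find objects/pack -name 'tmp_pack_*' -mmin +180 -delete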
Enable Git verbose output in the terminal window:
export GIT_TRACE_PACKET=1
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1
Run the "git repack" command and monitor the memory usage, for example using the
top
andps xuaOr
commands:git repack -adfln --keep-unreachable --depth=20 --window=200
Make sure you collect the output generated by this command; it will be useful if further diagnostics are needed.
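One possible way to both capture the output and keep an eye on memory usage while the repack runs (a sketch; the log file name is only an example):
# save the repack output, including the verbose trace, to a file
git repack -adfln --keep-unreachable --depth=20 --window=200 2>&1 | tee /tmp/repack-41725-test.log
# in a second terminal, refresh the largest git-related processes every 10 seconds
watch -n 10 'ps -C git,pack-objects -o pid,rss,etime,args --sort=-rss | head -n 10'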
- Since this operation will be long-running, you may wish to run it inside a screen or tmux session.
- If this operation completes successfully - without invoking the OOM Killer - you may proceed in the same way with the copies of other repositories that may need tuning.
- After completing all tests, you may remove the copies of the canonical repositories - the directories we used for testing.
Ensure you are about to remove the copies, not the original directories!
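For the example repository above, removing the test copy could look like this - double-check the path before running rm:
cd $BITBUCKET_HOME/shared/data/repositories
# this must point at the .copy directory created for testing, never at the canonical repository
rm -rf 41725.copy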
Step 2: Apply the solution
If the test above completes the "git repack" successfully and with reasonable memory consumption, it is safe to apply the configuration changes to the canonical repositories.
The repository ID in this example is the same, 41725.
Navigate to the repository on disk:
cd $BITBUCKET_HOME/shared/data/repositories/41725
Apply configuration changes:
git config pack.threads 8
git config pack.windowMemory 1g
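If several repositories need the same tuning, a small shell loop can apply it; the repository IDs below are examples, and 41726 is purely hypothetical:
for id in 41725 41726; do
    # apply the same limits to each repository directory
    git -C "$BITBUCKET_HOME/shared/data/repositories/$id" config pack.threads 8
    git -C "$BITBUCKET_HOME/shared/data/repositories/$id" config pack.windowMemory 1g
done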
- Remove any old tmp_pack files from the objects/pack subdirectory.
- Do not manually invoke the "git repack" command on the canonical repositories!
Allow Bitbucket to do this instead. The next "git push" to the Git repository will trigger the "git repack" process.
After this modification, "git repack" commands run by Bitbucket on configured Git repositories will be resource-limited and should no longer trigger the Linux out-of-memory killer.