How to perform manual garbage collection on a repository in Bitbucket Server

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Purpose

Bitbucket runs garbage collection automatically when needed. To avoid data loss, garbage collection should never be performed manually on its repositories; the best course of action is to let Bitbucket run it.

This page covers the steps required to allow Bitbucket to run garbage collection successfully.

Solution

For Bitbucket 5+

Bitbucket implements its own garbage collection logic and no longer relies on git gc. Git's automatic gc is disabled by setting auto = 0 in the [gc] section (gc.auto = 0) of every repository's configuration. When a fork is created, pruneexpire = never is added to the [gc] section of the repository's git configuration; this setting is removed when the last fork is deleted.

For Bitbucket < 5.0

Bitbucket Server relies on git's automatic garbage collection (git gc --auto), which git runs after a push. That does not mean git gc actually runs every time: git uses a heuristic to decide whether gc is necessary, and runs it only when the repository has more than approximately 6700 loose objects or more than 50 pack files. The number of loose objects is estimated by counting the objects in the objects/17 directory.
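The heuristic above can be illustrated with a small shell sketch. It is an approximation written for this page, not git's own code: it assumes git's default thresholds (gc.auto = 6700 and gc.autoPackLimit = 50) and a bare repository with objects/ at its top level, and the function names (count_sample_loose, count_packs, would_auto_gc) are hypothetical. Because loose objects are spread over 256 directories (00 to ff), git compares the number of loose objects in objects/17 against gc.auto / 256, rounded up; packs that have a matching .keep file are not counted.

```shell
#!/bin/sh
# Sketch: approximate the "git gc --auto" heuristic for one repository.
# Assumes git's default thresholds; the repository, global, or system
# git config may override gc.auto and gc.autoPackLimit.
GC_AUTO=6700
GC_AUTO_PACK_LIMIT=50

# Count loose objects in the objects/17 sample directory
# (file names made only of lowercase hex digits).
count_sample_loose() {
    dir="$1/objects/17"
    if [ ! -d "$dir" ]; then
        echo 0
        return
    fi
    ls "$dir" | grep -cE '^[0-9a-f]+$'
}

# Count pack files that are not marked with a matching .keep file.
count_packs() {
    n=0
    for p in "$1"/objects/pack/*.pack; do
        [ -e "$p" ] || continue               # glob matched nothing
        [ -e "${p%.pack}.keep" ] && continue  # .keep packs are ignored
        n=$((n + 1))
    done
    echo "$n"
}

# Print "yes" if "git gc --auto" would likely decide to run, else "no".
would_auto_gc() {
    # The objects/17 sample is compared against gc.auto / 256, rounded up.
    threshold=$(( (GC_AUTO + 255) / 256 ))
    loose=$(count_sample_loose "$1")
    packs=$(count_packs "$1")
    if [ "$loose" -gt "$threshold" ] || [ "$packs" -gt "$GC_AUTO_PACK_LIMIT" ]; then
        echo yes
    else
        echo no
    fi
}
```

For example, would_auto_gc <repository path> prints yes or no. It only reads the object directory and never modifies the repository.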

For repositories with forks, git's automatic garbage collection is disabled by setting the gc.auto configuration option to 0 as soon as the first fork is created. The setting is removed, re-enabling automatic gc, as soon as the last fork is deleted.
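To see which of these settings are currently present on a repository, you can read its config file directly. The following sketch assumes a bare repository whose config file sits at the top level of the repository directory (in Bitbucket this is typically under $BITBUCKET_HOME/shared/data/repositories/<repository id>); show_gc_settings is a hypothetical helper name. git config --file reads the given file without needing to be inside a repository, and key names are case-insensitive, so gc.pruneexpire also matches pruneExpire.

```shell
#!/bin/sh
# Sketch: print the gc settings described above for one repository.
# Usage: show_gc_settings <repository path>
show_gc_settings() {
    cfg="$1/config"
    # "git config --get" prints nothing (and exits 1) if the key is absent.
    auto=$(git config --file "$cfg" --get gc.auto)
    prune=$(git config --file "$cfg" --get gc.pruneexpire)
    echo "gc.auto=${auto:-<unset>}"
    echo "gc.pruneexpire=${prune:-<unset>}"
}
```

On Bitbucket 5+, gc.auto=0 is expected on every repository and gc.pruneexpire=never indicates that forks exist; on earlier versions, gc.auto=0 alone indicates that forks exist. The function only reads the file and does not change any setting.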

More information on this subject

Check for the existence of forks

Because garbage collection can be performed safely only when a repository has no forks (otherwise, objects that the forks still need could be removed), the first step is to check whether the repository has any forks.

This can be checked by calling the following REST API endpoint (documented in the Bitbucket Server REST API reference):

curl --user <username>:<password> -H "Content-Type: application/json" -X GET <bitbucket_url>/rest/api/1.0/projects/<project_key>/repos/<repository_slug>/forks > ./rest_output.txt

The following is the output when the repository has no forks:

{"size":0,"limit":25,"isLastPage":true,"values":[],"start":0}
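Checking the saved response can also be scripted. This sketch (has_no_forks is a hypothetical helper name) assumes the compact JSON shown above, with no whitespace around the colons; if jq is installed, parsing the JSON properly is more robust. It treats the repository as fork-free only when the first page is empty ("values":[]) and reports "size":0.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) only if the saved fork listing reports no forks.
# Assumes compact JSON as returned above (no spaces around ':').
has_no_forks() {
    file="$1"
    # An empty first page together with "size":0 means no forks were listed.
    grep -qF '"values":[]' "$file" && grep -qE '"size":0[,}]' "$file"
}
```

For example, run has_no_forks ./rest_output.txt right after the curl command: an exit status of 0 means no forks were reported, and any other status means you should not proceed with garbage collection.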

If the repository has forks, never run garbage collection (git gc or git prune) on it, as this can cause data loss.

Forks cannot be removed, but I still want to run a repack

In production, the following steps can take a long time. For this reason, it is recommended to measure the potential gain on a copy of the repository first.

Run the commands as the user that runs the Bitbucket process, and keep track of the time the full sequence of steps takes, so that you can estimate the downtime needed in production.

cd <repository path in the Bitbucket home directory>
mkdir -p /some/tmp/location
cp -a . /some/tmp/location
cd /some/tmp/location
du -sh
git fsck
git repack -adfln --keep-unreachable --depth=20 --window=200
du -sh
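The trial above can be wrapped in a function that also records the sizes and the elapsed time, which is what you need to estimate the production downtime. This is a minimal sketch: repack_trial is a hypothetical name, it assumes the destination is an empty directory outside the Bitbucket home directory, and it reports sizes in KB from du -sk and time in whole seconds from date +%s (supported by GNU and BSD date, though not required by POSIX). The git output is sent to stderr so that the summary line stays easy to read.

```shell
#!/bin/sh
# Sketch: measure the repack gain on a throwaway copy of a repository.
# Usage:  repack_trial <repository path> <empty temporary directory>
# Output: <size before in KB> <size after in KB> <elapsed seconds>
repack_trial() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # Copy everything, including hidden files, preserving permissions.
    cp -a "$src/." "$dest/" || return 1
    (
        cd "$dest" || exit 1
        before=$(du -sk . | cut -f1)
        start=$(date +%s)
        # Send git output to stderr so stdout carries only the summary.
        git fsck >&2 || exit 1
        git repack -adfln --keep-unreachable --depth=20 --window=200 >&2 || exit 1
        end=$(date +%s)
        after=$(du -sk . | cut -f1)
        echo "$before $after $((end - start))"
    )
}
```

For example, repack_trial <repository path> <empty temporary directory> prints three numbers: the size before, the size after, and the seconds the fsck and repack took. The original repository is only read, never modified.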

If the gain is significant, plan for the required downtime and proceed with the next steps.

Another way to check the gain is to run the following command before and after the repack:

git count-objects -v
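If you prefer a compact before/after summary, the fields of git count-objects -v can be filtered. In this sketch, objects_summary is a hypothetical helper; it keeps only count (loose objects), packs, and size-pack (disk space used by packs, in KiB) from the command's output.

```shell
#!/bin/sh
# Sketch: one-line summary of "git count-objects -v" for a repository.
# Usage: objects_summary <repository path>
objects_summary() {
    git -C "$1" count-objects -v | awk -F': ' '
        $1 == "count"     { loose = $2 }     # loose objects
        $1 == "packs"     { packs = $2 }     # number of pack files
        $1 == "size-pack" { sizepack = $2 }  # disk used by packs, KiB
        END { printf "loose=%s packs=%s size-pack=%sKiB\n", loose, packs, sizepack }'
}
```

Run objects_summary <repository path> before and after the repack and compare the two lines; after a successful repack, most loose objects should have moved into a single pack.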


Perform the repack on the repository itself

  • Generate a backup of Bitbucket Server (see Data recovery and backups)
  • Stop Bitbucket Server
  • Run the following commands as the user that runs the Bitbucket process:

    cd <repository path in the Bitbucket home directory>
    du -sh
    git fsck
    git repack -adfln --keep-unreachable --depth=20 --window=200
    du -sh
  • Start Bitbucket Server
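Because the commands must run as the user that runs the Bitbucket process, it can help to confirm ownership before starting. This sketch assumes that this user owns every file in the repository; check_repo_owner is a hypothetical helper name. It uses the numeric user ID from id -u, which find -user accepts, and reports the first file owned by someone else.

```shell
#!/bin/sh
# Sketch: confirm the current user owns everything under a repository,
# so that git does not create files there with a different owner.
# Assumption: the Bitbucket process user owns the whole repository.
check_repo_owner() {
    repo="$1"
    uid=$(id -u)
    # Print the first path not owned by the current user, if any.
    offender=$(find "$repo" ! -user "$uid" -print | head -n 1)
    if [ -n "$offender" ]; then
        echo "not owned by uid $uid: $offender" >&2
        return 1
    fi
    echo "all files under $repo are owned by uid $uid"
}
```

Switch to the Bitbucket user first (for example with su or sudo -u), then run check_repo_owner <repository path>; a non-zero exit status means some files are owned by another user.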

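The git fsck command verifies the connectivity and validity of the objects in the repository and reports any problems it finds.

The git repack command packs the objects into a single pack and removes the loose objects and packs that become redundant. In the options used above, -a packs all objects, -d removes the redundant packs and loose objects afterwards, -f recomputes deltas instead of reusing existing ones, -l ignores objects borrowed from alternate object stores, -n skips updating the server information, and --keep-unreachable adds unreachable objects to the new pack instead of discarding them. --depth and --window control the maximum delta chain length and the number of objects considered for each delta.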


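If you are not able to stop the instance before running the repack in production, do the following from the repository directory, as the user that runs the Bitbucket process:

  • Create the lock file: touch app-info/gc.log.lock
  • Run the repack
  • Remove the lock file: rm app-info/gc.log.lock

The lock file ensures that Bitbucket Server does not start its own garbage collection while your repack is running.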
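The lock-file procedure above can be sketched as a single function. It is an illustration, not an Atlassian-provided tool: repack_with_gc_lock is a hypothetical name, and it assumes the app-info directory already exists in the repository, as the steps above imply. As a design choice, it refuses to start if the lock file already exists (Bitbucket Server may be running its own garbage collection), and it removes the lock after the repack whether or not the repack succeeded.

```shell
#!/bin/sh
# Sketch: run the repack while holding app-info/gc.log.lock.
# Usage: repack_with_gc_lock <repository path>
repack_with_gc_lock() {
    repo="$1"
    lock="$repo/app-info/gc.log.lock"
    if [ ! -d "$repo/app-info" ]; then
        echo "no app-info directory in $repo" >&2
        return 1
    fi
    # Do not take over a lock that someone else may be holding.
    if [ -e "$lock" ]; then
        echo "lock already present, not running: $lock" >&2
        return 1
    fi
    touch "$lock" || return 1
    if ( cd "$repo" && git repack -adfln --keep-unreachable --depth=20 --window=200 ); then
        status=0
    else
        status=$?
    fi
    # Always release the lock, even if the repack failed.
    rm -f "$lock"
    return "$status"
}
```

If the script is interrupted (for example with Ctrl+C) before it finishes, the lock file is left behind and must be removed manually with rm app-info/gc.log.lock.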

Description

This page covers the preliminary checks to determine whether garbage collection is a viable option for a repository, and the steps to perform it.

Product: Bitbucket
Last modified on Aug 31, 2022
