How stored Git caches speed up builds

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

This article explains how Bamboo caches Git repositories and the benefits of repository caching.

Scenario

Use case example:

We have Bitbucket connected to Bamboo. We are planning to set up another Bamboo instance in our European office (we are in the US), but our Stash (Git) server is also located in the US. Git clones or pulls from Europe to the US will take considerably longer. If the repository were cached on the Bamboo server or agents, builds would be faster even though the Git/Bitbucket repository lives in a different part of the world. Where does Bamboo store the repository cache? If we are running elastic agents, is the code still cached on the server somewhere after the agent expires?

Solution

Git caches in Bamboo are stored under:

Bamboo Server 7.x and earlier:

  • <bamboo-home>/xml-data/build-dir/_git-repositories-cache

Bamboo Server and Data Center 8.x and later:

  • <bamboo-home>/local-working-dir/_git-repositories-cache

Bamboo Remote and Elastic Agents:

  • <bamboo-agent-home>/xml-data/build-dir/_git-repositories-cache
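The locations above can be expressed as a small lookup. The following is a minimal sketch only: the function name, the `role` values, and the example home path are hypothetical and not part of Bamboo, and the home directories must be replaced with those of your own installation.

```python
from pathlib import PurePosixPath

CACHE_DIR_NAME = "_git-repositories-cache"


def git_cache_dir(home: str, role: str, major_version: int = 8) -> PurePosixPath:
    """Return the expected Git cache directory below a Bamboo home directory.

    home          -- <bamboo-home> or <bamboo-agent-home> (illustrative value)
    role          -- "server" (Server / Data Center) or "agent" (remote / elastic)
    major_version -- Bamboo major version; only consulted for the server role
    """
    base = PurePosixPath(home)
    if role == "agent":
        # Remote and elastic agents keep the cache under the build directory.
        return base / "xml-data" / "build-dir" / CACHE_DIR_NAME
    if role == "server":
        if major_version >= 8:
            # Bamboo 8.x and later.
            return base / "local-working-dir" / CACHE_DIR_NAME
        # Bamboo 7.x and earlier.
        return base / "xml-data" / "build-dir" / CACHE_DIR_NAME
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    # "/var/atlassian/bamboo-home" is a made-up example path.
    print(git_cache_dir("/var/atlassian/bamboo-home", "server", 9))
```

The path shown under Repository Settings in the administration console remains the authoritative location; this helper only mirrors the defaults listed in this article.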

You can find the exact name and location of your repository cache by navigating to Bamboo Administration → Overview → Repository Settings. Caches are removed either manually by an administrator or automatically by Bamboo when errors occur, so that it can start over with a fresh cache. Expiry settings at both the Global and Plan levels affect artifacts and data under the build directory (read more about locating important directories and files).

Repository caching means that code is first fetched from the remote repository into a local cache, and then fetched again from that cache into your plan's workspace, which is located under the plan's working directory. Repository caching happens automatically on the Bamboo Server for its internal operations and for its Local Agents. It is enabled by default on Remote and Elastic Agents, where it can be disabled.

With a repository cache, the first run of a plan performs a full clone, stores the data in a local cache directory, and then completes the build. On subsequent builds, Bamboo runs a git fetch against the remote repository to check for new changes and, if there are any, updates the local cache. As in the first run, the data for the plan is then checked out from the local cache, which is why the checkout is faster.

To get a better understanding, you can enable "verbose logs" under the advanced settings of the repository, trigger a build, and follow the sequence of events.
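The cache-then-workspace sequence described above can be sketched with standard Git commands. This is an illustration under assumptions, not Bamboo's actual command sequence: the function name and parameters are hypothetical, a mirror clone is assumed for the cache, and the workspace is assumed to be a fresh, empty directory. The function only builds the command lists and does not execute anything.

```python
from typing import List


def checkout_commands(remote_url: str, cache_dir: str, workspace: str,
                      cache_exists: bool) -> List[List[str]]:
    """Build the Git commands for a cached checkout (illustrative only).

    Stage 1 populates or refreshes the local cache from the remote repository.
    Stage 2 populates the plan workspace from the local cache.
    """
    commands: List[List[str]] = []
    if not cache_exists:
        # First run: full clone of the remote into the cache directory.
        commands.append(["git", "clone", "--mirror", remote_url, cache_dir])
    else:
        # Later runs: only new objects are transferred from the remote.
        commands.append(["git", "-C", cache_dir, "fetch", "origin"])
    # The workspace is filled from the local cache, not from the remote,
    # so this step involves no long-distance network traffic.
    commands.append(["git", "clone", cache_dir, workspace])
    return commands


if __name__ == "__main__":
    # All three values below are made-up examples.
    for cmd in checkout_commands("ssh://git@example.com/project/repo.git",
                                 "/tmp/cache/repo", "/tmp/work/PLAN-JOB1",
                                 cache_exists=True):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` to try the sequence by hand. Comparing it with the verbose build log shows what Bamboo itself runs.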

For some workloads it may be necessary to skip caching and fetch directly from the repository instead. Examples include large repositories, multiple plans sharing the local cache and creating a disk bottleneck, and very active repositories whose build plans must always build the latest changes. You can disable repository caching on agents by clearing the "Enable repository caching on agents" option in the repository configuration.

The Git cache is shared between plans that check out the same repository, which prevents duplicate and unnecessary checkouts of that repository.

Git submodules are also cached. Currently, submodules are fully cloned, so if large repositories are referenced as submodules, your Git cache might grow considerably. There is a feature request to address this limitation and implement shallow cloning of submodules: BAM-14706
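To see how much the cache has grown, the disk usage under the cache directory can be totalled. The sketch below assumes that each cached repository is a separate subdirectory of `_git-repositories-cache`; the function name is hypothetical and the path passed to it must be the cache location of your own installation.

```python
import os
from typing import Dict


def repository_cache_sizes(cache_root: str) -> Dict[str, int]:
    """Return the total size in bytes of each subdirectory of cache_root.

    Assumes every cached repository lives in its own subdirectory.
    Plain files directly under cache_root and symbolic links are ignored.
    """
    sizes: Dict[str, int] = {}
    for entry in os.scandir(cache_root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):
                    total += os.path.getsize(path)
        sizes[entry.name] = total
    return sizes


def print_largest_first(cache_root: str) -> None:
    """Print one line per cache directory, largest first."""
    sizes = repository_cache_sizes(cache_root)
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        print(f"{size / (1024 * 1024):10.1f} MiB  {name}")
```

Calling `print_largest_first` with the cache path for your Bamboo version lists the biggest caches first, which helps identify repositories with large submodules.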

If you use Elastic Agents (EC2), all the cache data is removed when the agents expire. However, in most cases, the EC2 agents live long enough to run several builds so the available cache still improves performance for the duration of time that the agents are alive.






Last modified on Mar 29, 2022
