How stored Git caches speed up builds
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
This article explains how Bamboo caches Git repositories, where those caches are stored, and the benefits of using them.
Scenario
Use case example:
We have Bitbucket connected to Bamboo. We are based in the US and are planning to set up another Bamboo instance in our European office, but our Stash (Git) server is also located in the US. Git clones or pulls from Europe to the US will take a significant amount of time. Because the Git/Bitbucket repository lives in a different part of the world, builds should become faster if the repository is cached on the Bamboo server or agents. Where does Bamboo store the repository cache? If we are running elastic agents, is the code still cached on the server somewhere after the agent expires?
Solution
Git caches in Bamboo are stored under:
Bamboo Data Center:
<bamboo-home>/local-working-dir/_git-repositories-cache
Bamboo Remote and Elastic Agents:
<bamboo-agent-home>/xml-data/build-dir/_git-repositories-cache
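To see how much disk space these caches use, you could run a small script on the server or agent host. The sketch below is a hypothetical helper, not a Bamboo tool: it assumes you pass in the actual <bamboo-home> or <bamboo-agent-home> path for your installation, and it treats each top-level directory inside _git-repositories-cache as one cached repository, which is an assumption about the cache layout.

```python
from pathlib import Path

# Relative cache locations from the paths listed above. The home directories
# (<bamboo-home>, <bamboo-agent-home>) differ per installation and are passed in.
SERVER_CACHE = Path("local-working-dir") / "_git-repositories-cache"
AGENT_CACHE = Path("xml-data") / "build-dir" / "_git-repositories-cache"


def cache_dir(home: Path, is_agent: bool) -> Path:
    """Return the Git repository cache directory under a Bamboo home or agent home."""
    return home / (AGENT_CACHE if is_agent else SERVER_CACHE)


def dir_size(path: Path) -> int:
    """Total size in bytes of the regular files below path (symlinks are skipped)."""
    return sum(
        f.stat().st_size
        for f in path.rglob("*")
        if f.is_file() and not f.is_symlink()
    )


def report_cache_sizes(home: Path, is_agent: bool) -> dict[str, int]:
    """Map each top-level directory in the cache (assumed: one per repository) to its size."""
    root = cache_dir(home, is_agent)
    if not root.is_dir():
        return {}
    return {
        entry.name: dir_size(entry)
        for entry in sorted(root.iterdir())
        if entry.is_dir()
    }
```

For example, report_cache_sizes(Path("/opt/bamboo-agent-home"), is_agent=True) would list the cache directories on an agent whose home is the (hypothetical) /opt/bamboo-agent-home.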
Location
You can find the exact name and location of your repository cache by navigating to Bamboo Administration → Overview → Repository Settings. These caches are removed either manually by an administrator or automatically by Bamboo when errors occur, so that it can start over with a new cache. Expiry settings at both the global and plan levels affect artifacts and data under the build directory (read more about locating important directories and files).
Benefit
Repository caching means that code is first fetched from the remote repository into a local cache, and a second fetch then copies it from this cache into your plan's workspace, located under the plan's working directory. Repository caching happens automatically on the Bamboo server for its internal operations, and it is enabled by default (but can be disabled) on Remote and Elastic Agents.
When a repository cache is used, on the first run of a plan Bamboo performs a full clone, stores the data in a local cache directory, and completes the build. On subsequent builds, Bamboo runs a git fetch against the remote repository to check for new changes and, if there are any, updates the local cache. As on the first run, the plan's data is then checked out from the local cache. Because only the new changes have to travel over the network, checkouts are faster. To get a better understanding, enable "verbose logs" in the repository's advanced settings, trigger a build, and follow the sequence of events.
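The two-stage flow described above can be illustrated with a toy model. This is a simplified sketch, not Bamboo's implementation: a repository is reduced to a list of commit IDs, the cache is a set of commit IDs, and the hypothetical run_build function counts how many commits cross the network versus how many are copied locally from the cache into the workspace.

```python
from dataclasses import dataclass, field


@dataclass
class RepositoryCache:
    """Toy model of a local repository cache: the set of commit IDs it holds."""
    commits: set[str] = field(default_factory=set)


def run_build(remote: list[str], cache: RepositoryCache) -> tuple[int, int]:
    """Simulate one build's checkout through the cache.

    Returns (network_transfers, local_transfers): how many commits came over
    the network from the remote, and how many were copied from the cache into
    the plan's workspace.
    """
    # Stage 1: full clone (first build) or fetch (later builds) into the cache.
    # Only commits the cache does not already hold cross the network.
    missing = [c for c in remote if c not in cache.commits]
    cache.commits.update(missing)

    # Stage 2: populate the plan's workspace from the local cache only.
    workspace = [c for c in remote if c in cache.commits]
    return len(missing), len(workspace)
```

On the first call every commit is transferred over the network; on later calls only commits added since the previous build are, while the workspace is still filled from the local cache.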
The Git cache is also shared when different plans check out the same repository, which prevents duplicate, unnecessary clones of that repository across plans.
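One way to picture this sharing is a cache keyed by repository URL rather than by plan. The naming scheme below (a SHA-1 hash of the URL with any trailing "/" and ".git" removed) is purely hypothetical and only illustrates the idea; it is not how Bamboo actually names its cache directories. The plan keys and URLs in the usage example are also made up.

```python
import hashlib
from pathlib import Path

CACHE_ROOT = Path("_git-repositories-cache")


def cache_key(repository_url: str) -> str:
    """Hypothetical cache key: SHA-1 of the URL without trailing '/' or '.git'."""
    normalized = repository_url.strip().rstrip("/").removesuffix(".git")
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def cache_dirs_for_plans(plans: dict[str, str]) -> dict[str, Path]:
    """Map each plan key to the cache directory it would use for its repository."""
    return {plan: CACHE_ROOT / cache_key(url) for plan, url in plans.items()}
```

With plans PROJ-BUILD and PROJ-DEPLOY both pointing at https://bitbucket.example.com/scm/proj/app.git, both resolve to the same cache directory, so the repository is cloned into the cache only once.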
When not to use it
For some workloads, it may be better not to use any caching and to fetch the latest version directly from the repository instead. Examples include large repositories where multiple plans use the local cache and create a disk bottleneck, and very active repositories whose build plans must always be linked to the latest changes. To disable repository caching on agents, clear the "Repository caching on Agents" option in the repository configuration.
Git submodules are also cached. Currently, submodules are fully cloned, so if large repositories are referenced as submodules, your Git cache might grow considerably. There is a feature request to address this limitation by implementing shallow cloning for submodules: BAM-14706.
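To estimate how much a repository's submodules might add to the cache, you can first list which repositories it references as submodules. The hypothetical helper below reads the contents of a .gitmodules file with Python's configparser after stripping indentation. This is a simplification that handles the common layout but is not a full Git config parser; git config --file .gitmodules --get-regexp url is the authoritative way to read the same data.

```python
import configparser


def list_submodules(gitmodules_text: str) -> dict[str, str]:
    """Return {submodule path: URL} from the contents of a .gitmodules file."""
    # .gitmodules uses Git's config syntax. Stripping leading whitespace lets
    # configparser read the typical tab-indented layout as a plain INI file.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string("\n".join(line.strip() for line in gitmodules_text.splitlines()))
    submodules = {}
    for section in parser.sections():
        # Sections look like: [submodule "libs/core"]
        if section.startswith("submodule "):
            entry = parser[section]
            submodules[entry.get("path", "")] = entry.get("url", "")
    return submodules
```

Each URL returned is a repository that, under the current behavior, would be fully cloned into the Git cache alongside the main repository.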
If you use Elastic Agents (EC2), all the cache data is removed when the agents expire. However, in most cases, the EC2 agents live long enough to run several builds, so the available cache still improves performance for the duration of their lives.
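The trade-off for elastic agents can be expressed as a simple average: the first build on a fresh agent pays for a full clone, and each later build only pays for an incremental fetch. The sketch below is a back-of-the-envelope model under that assumption; the timings are inputs you would measure for your own repository, not Bamboo figures, and the model ignores the local copy from the cache into the workspace.

```python
def average_checkout_seconds(builds: int, full_clone_s: float, incremental_s: float) -> float:
    """Average checkout time per build over one elastic agent's lifetime.

    The first build fills an empty cache with a full clone; every later build
    on the same agent only fetches new changes into the existing cache.
    """
    if builds < 1:
        raise ValueError("an agent must run at least one build")
    return (full_clone_s + (builds - 1) * incremental_s) / builds
```

For example, with a hypothetical 600-second full clone and 30-second incremental fetch, an agent that runs 10 builds before expiring averages 87 seconds per checkout, and the average keeps approaching the incremental cost the longer the agent lives.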