How stored Git caches speed up Bamboo builds
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
This article explains how Bamboo caches Git repositories and the benefits of using the cache.
Use case example
We have Bitbucket connected to Bamboo. We are based in the US and are planning to set up another Bamboo instance in our European office, but our Bitbucket (Git) server is also located in the US. Git clones and pulls from Europe to the US will take considerably longer. If the repository were cached on the Bamboo server or agents, builds would be faster even though the Git/Bitbucket repository lives in a different part of the world. Where does Bamboo store the repository cache? If we are running elastic agents, is the code still cached on the server somewhere after the agent expires?
Solution
Git caches in Bamboo are stored under:
Bamboo Data Center:
<bamboo-home>/local-working-dir/_git-repositories-cache
Bamboo Remote and Elastic Agents:
<bamboo-agent-home>/xml-data/build-dir/_git-repositories-cache
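The two locations above differ only in the path segment between the home directory and the cache folder. As a minimal sketch, the helper below builds the expected cache path for either case. The function name and the example home directories are hypothetical; substitute the actual home directory of your instance or agent.

```python
from pathlib import Path

# Folder name taken from the paths listed above.
CACHE_DIR_NAME = "_git-repositories-cache"


def git_cache_dir(home: str, is_agent: bool) -> Path:
    """Return the expected Git cache directory under a Bamboo or agent home.

    `home` is <bamboo-home> for the Bamboo server, or <bamboo-agent-home>
    for a remote or elastic agent.
    """
    base = Path(home)
    if is_agent:
        return base / "xml-data" / "build-dir" / CACHE_DIR_NAME
    return base / "local-working-dir" / CACHE_DIR_NAME


if __name__ == "__main__":
    # Example home directories are assumptions, not defaults of any install.
    print(git_cache_dir("/opt/bamboo-home", is_agent=False))
    print(git_cache_dir("/opt/bamboo-agent-home", is_agent=True))
```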
Location
You can find the exact name and location of your repository cache by navigating to Bamboo Administration → Overview → Repository Settings. These caches are removed either manually by an administrator or automatically by Bamboo when errors occur, so that it can start over with a new cache. Expiry settings at both the global and plan levels affect artifacts and data under the build directory (read more about locating important directories and files).
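Before removing a cache manually, it can help to see which cached repositories take the most disk space. The sketch below is one way to do that with the Python standard library. It assumes that each cached repository is stored as a direct subdirectory of the cache directory. The function names are hypothetical, and the script only reads the file system.

```python
import os
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files below `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def cache_entries_by_size(cache_dir: Path) -> list[tuple[str, int]]:
    """Return (entry name, size in bytes) pairs, largest first."""
    entries = [
        (child.name, dir_size_bytes(child))
        for child in cache_dir.iterdir()
        if child.is_dir()
    ]
    return sorted(entries, key=lambda entry: entry[1], reverse=True)
```

Calling cache_entries_by_size with the cache directory of a server or agent returns the entries in descending size order, which can then be matched against the names shown under Repository Settings.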
Benefit
Repository caching means that code is first fetched from the repository into a local cache, and then fetched again from this cache into your plan's workspace, which is located under the plan's working directory. Repository caching happens automatically on the Bamboo Server for its internal operations and is enabled by default on Remote and Elastic Agents (it can be disabled). When a repository cache is in use, on the first run of a plan Bamboo performs a full clone, stores the data in a local cache directory, and completes the build. On subsequent builds, Bamboo runs a git fetch from the remote repository to check for new changes and, if there are any, updates the local cache. As in the first run, the data for the plan is then checked out from the local cache, which makes the checkout faster. To get a better understanding, you can enable "verbose logs" under the advanced settings of the repository, trigger a build, and follow the sequence of events.
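The two-stage flow described above can be approximated with standard Git commands. The sketch below only builds the command lists and does not run them. It is an illustration of the idea and not the exact sequence Bamboo executes (the verbose logs show the real commands). The function name, its parameters, and the choice of a mirror clone for the cache are assumptions.

```python
def checkout_commands(remote_url, cache_dir, workspace, revision,
                      cache_exists, workspace_exists):
    """Build an illustrative Git command sequence for a cached checkout.

    Stage 1 refreshes the local cache from the remote repository.
    Stage 2 populates the plan workspace from the local cache only.
    """
    commands = []

    # Stage 1: a full clone on the first run, an incremental fetch afterwards.
    if cache_exists:
        commands.append(["git", "-C", cache_dir, "fetch", "--prune", "origin"])
    else:
        commands.append(["git", "clone", "--mirror", remote_url, cache_dir])

    # Stage 2: the workspace talks to the cache, never to the remote.
    if workspace_exists:
        commands.append(["git", "-C", workspace, "fetch", "origin"])
    else:
        commands.append(["git", "clone", "--no-checkout", cache_dir, workspace])

    # Check out the revision being built.
    commands.append(["git", "-C", workspace, "checkout", "--force", revision])
    return commands
```

In this sketch only the first command of each run crosses the network to the remote repository. Everything after it reads from local disk, which is where the time saving for a distant Git server comes from.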
The Git cache is shared even when different plans check out the same repository, which prevents duplicate and unnecessary checkouts of that repository between plans.
When to not use it
Caching is not the right choice for every workload. With large repositories, multiple plans using the local cache at the same time can create a disk bottleneck. Very active repositories, whose build plans must always be linked to the latest changes, can be affected too. In such cases it may be necessary to skip caching and fetch the latest version directly from the repository. You can disable repository caching on agents by clearing the "Repository caching on Agents" option in the repository configuration.
Git submodules are cached by default. Up until Bamboo 10.2, submodules were unable to use shallow cloning. This limitation meant that if you referenced large repositories as submodules, your Git cache could grow significantly. Starting from Bamboo 11.0, submodules can take advantage of shallow clones, which helps save space in the Git cache location. For more information, please refer to the following feature request:
If you use Elastic Agents (EC2), all the cache data is removed when the agents expire. However, in most cases, the EC2 agents live long enough to run several builds, so the available cache still improves performance for the duration of their lives.