How To: Check your repository's size and identify large files
Platform Notice: Cloud - This article applies to Atlassian products on the cloud platform.
Summary
Bitbucket Cloud enforces a 4GB size limit on repositories. Once a repository exceeds this limit, it is set to read-only, which can block development workflows. To avoid this, use the steps below to find which files take up the most space and to troubleshoot why the repository has grown.
Environment
The steps outlined in this article apply to any installation of Bitbucket (Cloud, Server, and Data Center) that uses Git for version control.
Solution
Discover the large files in a Git repo
The following command lists all files reachable from your repository's HEAD, sorted from largest to smallest. Each output line shows the file mode, the object type, the blob's hash ID, the size in bytes, and the file name.
$ git ls-tree -r --long HEAD | sort -k 4 -n -r | less
100644 blob 25b6ebb7bfc76200ba96bee52cae9cb49113bef4 6122845 hugeFile.png
100644 blob 9b5b369768594badbad98f2566a00e35ef61e14f 592 .gitattributes
100644 blob 879b112cd96d01f605d0e380e0c9c00bfd2eb83a 127 jira.txt
100644 blob 5029834def1b27d2f2107b51aac14fe5f75d9da0 127 test.sql
100644 blob 5029834def1b27d2f2107b51aac14fe5f75d9da0 127 test.backup
100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238 12 test.txt
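The listing above only covers files present in HEAD. A large file that was committed and later deleted still takes up space in the repository's history. As a minimal sketch, assuming a standard Git installation, the helper below lists the largest blobs reachable from any ref. The name largest_blobs is hypothetical (it is not part of Git or Bitbucket). Each output line shows the object type, the blob's hash ID, the size in bytes, and the path.

```shell
# Hypothetical helper: list the N largest blobs reachable from any ref,
# including files that no longer exist in HEAD.
# Usage: largest_blobs [N]   (N defaults to 10)
largest_blobs() {
  lb_count="${1:-10}"
  # rev-list prints "<hash> <path>" for every reachable object;
  # cat-file adds the object type and size; awk keeps only blobs.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k3,3 -n -r |
    head -n "$lb_count"
}
```

For example, largest_blobs 5 would print the five largest blobs found anywhere in the history.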
Discover commits with the large file
Using the hash from the output above, you can find every commit that contains the blob. You will need the blob's hash and the path to search, in this case the current directory (./).
For example, to identify the commits that contain the blob 25b6ebb7bfc76200ba96bee52cae9cb49113bef4, run the following command.
$ git log --all --pretty=format:%H -- ./ | \
xargs -I% sh -c "git ls-tree -r % -- ./ |
grep -q 25b6ebb7bfc76200ba96bee52cae9cb49113bef4 && echo %"
a1a23cca2e2d379c1b8162c536f8753fad0bd1ae <output showing the commit>
$
The output above shows that the blob 25b6ebb7bfc76200ba96bee52cae9cb49113bef4 is part of commit a1a23cca2e2d379c1b8162c536f8753fad0bd1ae.
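The loop above runs git ls-tree once per commit, which can be slow on a long history. As a possible alternative, assuming a Git version that supports the --find-object option of git log (introduced around Git 2.16), the sketch below reports the commits in which the number of occurrences of a given object changes, that is, where the blob was added or removed. It is not a drop-in replacement: it does not list every commit that contains the blob, and merge commits are skipped by default. The helper name commits_touching_blob is hypothetical.

```shell
# Hypothetical helper: print the hashes of commits (on any ref) that
# added or removed the given blob.
# Usage: commits_touching_blob <blob-hash>
commits_touching_blob() {
  git log --all --format=%H --find-object="$1"
}
```

Running commits_touching_blob 25b6ebb7bfc76200ba96bee52cae9cb49113bef4 in the example repository would be one way to locate the commit that introduced hugeFile.png.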
Find branches that contain a commit
This shows which branches are affected by the large file. If the file exists in only one or two branches, deleting those branches can remove the large file. Otherwise, see the steps to reduce your repository size.
$ git branch -a --contains a1a23cca2e2d379c1b8162c536f8753fad0bd1ae
* main
test
$
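Tags can keep a large file reachable just as branches do. As a minimal sketch, assuming a Git version whose for-each-ref command supports --contains, the helper below lists every ref (local branches, remote-tracking branches, and tags) whose history includes a given commit. The name refs_containing is hypothetical.

```shell
# Hypothetical helper: list all refs whose history contains the commit.
# Usage: refs_containing <commit-hash>
refs_containing() {
  git for-each-ref --contains "$1" --format='%(refname)'
}
```

Any refs/tags/ entries in the output would need the same attention as the branches listed by git branch -a --contains.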
List the total size of HEAD
This command uses the output of git ls-tree to sum the sizes of all files reachable from the repository's HEAD. The result is the total in bytes.
$ git ls-tree -r --long HEAD | awk '{sum+=$4} END {print sum}'
7833793
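A raw byte count can be hard to compare against the 4GB limit. As a small sketch, the same sum can be printed in MiB by dividing by 1048576 inside awk. The helper name head_size_mib is hypothetical, and LC_ALL=C is set only so that the decimal separator is always a period.

```shell
# Hypothetical helper: total size of all files in HEAD, in MiB.
# Usage: head_size_mib
head_size_mib() {
  git ls-tree -r --long HEAD |
    LC_ALL=C awk '{ sum += $4 } END { printf "%.2f MiB\n", sum / 1048576 }'
}
```

Note that this figure reflects the uncompressed size of the files in HEAD only, not the size of the full history.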
Check the repo’s size and the number of objects
Using the command git count-objects, we can see how much disk space the repository's objects consume and how many objects there are. In the output below, count and size refer to loose (unpacked) objects, while in-pack and size-pack refer to packed objects. The output confirms that the local repository currently holds 3 loose objects using 15.72 MiB and no packed objects.
$ git count-objects -vH
count: 3
size: 15.72 MiB
in-pack: 0
packs: 0
size-pack: 0 bytes
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
This command calculates the repository size based on the objects contained within the local clone. For a size that more closely matches what the remote reports, run the command on a mirror clone (created with git clone --mirror).
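The mirror-clone check described above can be sketched as a single helper. This is an illustration under stated assumptions rather than an official procedure: the name mirror_size is hypothetical, the clone is placed in a temporary directory that is deleted afterwards, and the repository URL is whatever you would normally pass to git clone.

```shell
# Hypothetical helper: mirror-clone a repository into a temporary
# directory, report its object counts and sizes, then clean up.
# Usage: mirror_size <repository-url>
mirror_size() {
  ms_dir="$(mktemp -d)" || return 1
  git clone --quiet --mirror "$1" "$ms_dir/repo.git" &&
    git -C "$ms_dir/repo.git" count-objects -vH
  ms_rc=$?
  rm -rf "$ms_dir"
  return "$ms_rc"
}
```

In a fresh clone from a remote, objects normally arrive packed, so size-pack is usually the figure to read in this output.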
After you have identified what has caused the repository size to increase, you can follow the appropriate steps to reduce your repository size.