How To: Check your repository's size and identify large files

Platform Notice: Cloud - This article applies to Atlassian products on the cloud platform.


Summary

Bitbucket Cloud enforces a 4GB limit on repository size. Once a repository exceeds 4GB, it is set to read-only, which can block development workflows. To avoid this, you can use the steps below to find out which files take up the most space and to troubleshoot why a repository's size has grown.

Environment

The steps in this article apply to any Bitbucket deployment (Cloud, Server, and Data Center) that uses Git for version control.

Solution

Discover the large files in a Git repo

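The following command lists the files in your repository's HEAD commit, sorted from largest to smallest. Each line shows the file mode, the object type, the blob's hash ID, the size in bytes, and the file path. It only covers the tree of the current HEAD commit, not files that exist only in earlier history.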

$ git ls-tree -r --long HEAD | sort -k 4 -n -r | less
100644 blob 25b6ebb7bfc76200ba96bee52cae9cb49113bef4 6122845	hugeFile.png
100644 blob 9b5b369768594badbad98f2566a00e35ef61e14f     592	.gitattributes
100644 blob 879b112cd96d01f605d0e380e0c9c00bfd2eb83a     127	jira.txt
100644 blob 5029834def1b27d2f2107b51aac14fe5f75d9da0     127	test.sql
100644 blob 5029834def1b27d2f2107b51aac14fe5f75d9da0     127	test.backup
100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238      12	test.txt
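Because git ls-tree only looks at the HEAD commit, a large file that was deleted in a later commit does not appear in the list above, even though it still counts toward the repository size. As a minimal sketch, the shell function below (largest_blobs is a hypothetical name, not a Git command) walks every object reachable from any ref with git rev-list --objects --all, asks git cat-file for each object's type, hash, and uncompressed size, and prints the largest blobs first:

```shell
# largest_blobs [N]: print the N largest blobs reachable from any ref
# (default 10), largest first. Hypothetical helper, not part of Git.
# Each output line is: blob <hash> <size-in-bytes> <path>
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k 3,3 -n -r |
    head -n "${1:-10}"
}
```

For example, run largest_blobs 20 from inside the repository to see the 20 largest blobs. git rev-list reports each object only once, so a blob stored under several paths (such as test.sql and test.backup above) is listed with just one of its paths. The sizes are uncompressed, so the space the blobs use on disk after compression can be smaller.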

Discover commits with the large file

Using a blob hash from the output above, you can find every commit that contains that blob. You need the path to search (in this example, the repository root, ./) and the blob hash.

For example, the following command lists the commits that contain the blob 25b6ebb7bfc76200ba96bee52cae9cb49113bef4 (hugeFile.png):

$ git log --all --pretty=format:%H -- ./ | \
	xargs -I% sh -c "git ls-tree -r % -- ./ | \
	grep -q 25b6ebb7bfc76200ba96bee52cae9cb49113bef4 && echo %"
a1a23cca2e2d379c1b8162c536f8753fad0bd1ae
$

The output shows that blob 25b6ebb7bfc76200ba96bee52cae9cb49113bef4 is part of commit a1a23cca2e2d379c1b8162c536f8753fad0bd1ae.
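On Git 2.16 or later, git log can search for the blob directly with --find-object, which avoids running git ls-tree once per commit. It answers a slightly different question, though: it lists the commits whose changes add or remove the blob, not every commit whose tree contains it. A minimal sketch, wrapped in a hypothetical helper named commits_with_blob:

```shell
# commits_with_blob <blob-hash>: print the hash of every commit, on any
# ref, whose diff changes the number of occurrences of the blob
# (typically the commits that added or deleted it).
# Hypothetical helper; relies on git log --find-object.
commits_with_blob() {
  git log --all --format=%H --find-object="$1"
}
```

For example, commits_with_blob 25b6ebb7bfc76200ba96bee52cae9cb49113bef4. If you know the file name rather than the blob hash, git log --all --oneline -- hugeFile.png lists the commits that touched that path.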

Find branches that contain a commit

This shows which branches are affected by the large file. If the file exists in only one or two branches, deleting those branches can remove it, provided no other branch or tag still references those commits. Otherwise, see the guidance on reducing your repository size.

$ git branch -a --contains a1a23cca2e2d379c1b8162c536f8753fad0bd1ae
* main
  test
$
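git branch --contains only checks branches, but tags can also keep a large file in the repository. As a sketch, git for-each-ref --contains lists every ref (local branches, remote-tracking branches, and tags) whose history includes the commit; refs_containing is a hypothetical helper name:

```shell
# refs_containing <commit>: print the full name of every ref
# (refs/heads/..., refs/remotes/..., refs/tags/...) whose history
# includes <commit>. Hypothetical helper, not part of Git.
refs_containing() {
  git for-each-ref --contains "$1" --format='%(refname)'
}
```

Running refs_containing a1a23cca2e2d379c1b8162c536f8753fad0bd1ae also reports any tags that reference the commit, which would keep the large file in the repository even after the branches are deleted.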

List the total size of HEAD

This command sums the sizes reported by git ls-tree for every file in the HEAD commit. The result is the total uncompressed size of those files, in bytes; it does not include files that exist only in earlier history.

$ git ls-tree -r --long HEAD | awk '{sum+=$4} END {print sum}'
7833793
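A raw byte count is hard to read at a glance. A small variation (head_size_mib is a hypothetical name) converts the total to MiB and only adds up blob entries, so submodule entries, whose size git ls-tree shows as -, are skipped:

```shell
# head_size_mib [tree-ish]: total size of the files in a commit's tree
# (default HEAD), in MiB. Only "blob" entries are counted.
# Hypothetical helper, not part of Git.
head_size_mib() {
  git ls-tree -r --long "${1:-HEAD}" |
    awk '$2 == "blob" { sum += $4 } END { printf "%.2f MiB\n", sum / 1048576 }'
}
```

For the 7833793-byte total above, this prints 7.47 MiB.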

Check the repo’s size and the number of objects

The git count-objects command reports how many objects the repository contains and how much disk space they use. In the output below, the local repository has 3 loose objects (count) taking up 15.72 MiB (size); no objects have been packed yet, so in-pack and size-pack are both 0.

$ git count-objects -vH
count: 3
size: 15.72 MiB
in-pack: 0
packs: 0
size-pack: 0 bytes
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

Note: This command only measures the objects in your local clone. For a figure closer to the size reported for the remote repository, run it on a mirror clone.
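To compare a repository against the 4GB limit, it helps to have a single number. A minimal sketch, assuming you run it inside a mirror clone: the hypothetical repo_disk_kib helper adds the size and size-pack values from git count-objects -v (without -H, both are reported in KiB) and prints the total in KiB:

```shell
# repo_disk_kib: disk space used by loose objects ("size") plus packs
# ("size-pack"), in KiB, as reported by git count-objects -v.
# Hypothetical helper, not part of Git.
repo_disk_kib() {
  git count-objects -v |
    awk -F': ' '$1 == "size" || $1 == "size-pack" { kib += $2 } END { print kib + 0 }'
}
```

For example, clone with git clone --mirror <repository-url>, change into the resulting directory, and run repo_disk_kib. If the 4GB limit is read as 4 GiB, it corresponds to 4194304 KiB; Bitbucket's own calculation may not match this figure exactly.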

After you have identified what has caused the repository size to increase, you can follow the appropriate steps to reduce your repository size.


Last modified on Dec 4, 2024
