Reduce repository size
Bitbucket repository size limits
To provide the best and fastest service for all our users, we enforce the following repository size limits.
- Soft limit 1 GB: At this point we let you know you're getting to the higher end of an effective repository size and you might want to perform maintenance to keep from hitting the hard limit.
- Hard limit 2 GB: This is essentially a repository size stop sign. We'll limit what you can do until you reduce your repository size.
The procedures on this page are designed to help you do the following:
- First you'll want to identify the size of your Git or Mercurial repository both locally and in Bitbucket.
- Then you'll want to find large files and other large objects while getting a better view of your total repository size locally.
- Finally, you'll want to remove files and branches, and rewrite history to remove previous references to those files.
Find your Bitbucket repository size
To check the relative size of your repository in Bitbucket, click Settings (1), which opens the Repository details (2) page, then look for the Size (3) line.
Local Git repository size from the command line
For Git, you can use the following command:
This should return a result similar to this:
The size-pack value is the size of your repository when it is pushed to a remote server like Bitbucket. The size-pack value is in kilobytes. So, in the above example the repository is not even 1 MB.
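As a minimal sketch, the size-pack value can be read with git count-objects -v. The scratch repository, the README file, and the size_pack variable below are assumptions made only so the commands can be tried safely; in a real repository you would run just the count-objects lines.

```shell
# Scratch repository so the commands can be tried without touching real work.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "hello" > README
git add README
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Initial commit"

# Pack loose objects first so that size-pack accounts for them.
git gc --quiet

# -v prints the full report, including size-pack (reported in KiB).
git count-objects -v

# Extract just the size-pack number.
size_pack=$(git count-objects -v | awk '/^size-pack:/ {print $2}')
echo "size-pack: ${size_pack} KiB"
```

Running git gc first matters because objects that are still loose are counted under size rather than size-pack.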
Local Mercurial repository size from the command line
Mercurial does not provide a command specifically for finding a repository's size. You can use the bundle command to generate a compressed copy of your repository and then check the size of the file:
Understand file removal in a Git repository
Rewriting repository history is a tricky business, because every commit depends on its parents, so any small change will change the commit id of every subsequent commit. There are two automated tools for doing this:
- the BFG Repo Cleaner - fast, simple, easy to use. Requires Java 6 or above.
- git filter-branch - powerful, tricky to configure, slow on big repositories. Part of the core Git suite.
Remember, after you rewrite the history, whether you use the BFG or filter-branch, you will need to remove reflog entries that point to old history, and finally run the garbage collector to purge the old data.
How Git history rewrite works
Cloning a repository clones the entire history — including every version of every source code file. If a user commits a huge file, such as a JAR, every clone thereafter includes this file. Even if a user ends up removing the file from the project with a subsequent commit, the file still exists in the repository history. To remove this file from your repository you must:
- remove the file from your project's current file-tree
- remove the file from repository history - rewriting Git history, deleting the file from all commits containing it
- remove all reflog history that refers to the old commit history
- repack the repository, garbage-collecting the now-unused data using git gc
Git 'gc' (garbage collection) will remove all data from the repository that is not actually used, or in some way referenced, by any of your branches or tags. In order for that to be useful, we need to rewrite all Git repository history that contained the unwanted file, so that it no longer references it - git gc will then be able to discard the now-unused data.
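To see why deleting a file in a later commit is not enough, the following sketch commits a file, removes it again, and then checks whether its blob is still reachable. The scratch repository and the big.jar file (1 MiB of random data) are stand-ins chosen for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Commit a stand-in for a large binary.
head -c 1048576 /dev/urandom > big.jar
git add big.jar
git commit -q -m "Add big.jar"

# Remove it again in a later commit.
git rm -q big.jar
git commit -q -m "Remove big.jar"

# The file is gone from the working tree ...
ls

# ... but its blob is still reachable from history, so git gc must keep it.
in_history=$(git rev-list --objects --all | grep -c 'big.jar$')
echo "objects named big.jar still in history: $in_history"
```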
Using the BFG to rewrite history
The BFG is specifically designed for removing unwanted data like big files or passwords from Git repos, so it has a simple flag that will remove any large historical (not-in-your-current-commit) files: '--strip-blobs-bigger-than'
Any files over 100MB in size (that aren't in your latest commit - because your latest content is protected by the BFG) will be removed from your Git repository's history. If you'd like to specify files by name, you can do that too:
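A hedged sketch of a BFG run is shown below. The strip_big_blobs helper, the location of the BFG jar, and the mirror clone path are all assumptions; only the --strip-blobs-bigger-than flag comes from the text above. The BFG also has a --delete-files option for removing files by name, noted in a comment.

```shell
# Hypothetical helper: run the BFG against a bare mirror clone.
#   $1 = path to the BFG jar (assumed to be downloaded already)
#   $2 = path to a mirror clone, for example one made with: git clone --mirror <url>
#   $3 = size threshold such as 100M
strip_big_blobs() {
  bfg_jar=$1
  mirror=$2
  limit=$3
  if [ ! -f "$bfg_jar" ] || [ ! -d "$mirror" ]; then
    echo "usage: strip_big_blobs <bfg.jar> <repo.git> <size>" >&2
    return 1
  fi

  # Remove historical blobs larger than the threshold.
  # To remove by file name instead: java -jar "$bfg_jar" --delete-files <name> "$mirror"
  java -jar "$bfg_jar" --strip-blobs-bigger-than "$limit" "$mirror" || return 1

  # The BFG only rewrites history; expire the reflog and collect garbage afterwards.
  ( cd "$mirror" && git reflog expire --expire=now --all && git gc --prune=now --aggressive )
}

# Example call (paths are placeholders):
# strip_big_blobs ~/tools/bfg.jar my-repo.git 100M
```

Working on a mirror clone keeps your everyday working copy untouched until you are satisfied with the result.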
Alternatively, using git filter-branch to rewrite history
The git filter-branch command rewrites a Git repo's revision history, just like the BFG, but the process is slower and more manual. If you don't know where the big file is, your first step will be to find it using one of the two following options:
Manually reviewing large files in your repository
Antony Stubbs has written a BASH script that does this very well. The script examines the contents of your packfile and lists out the large files. Before you begin removing files, do the following to obtain and install this script:
- Download the script to your local system.
- Put it in a well known location accessible to your Git repository.
- Make the script executable.
- Clone the repository to your local system.
- Change directory to your repository root.
- Run the Git garbage collector manually.
- Find out the size of the .git folder. Note this size down for later reference.
- List the big files in your repo by running the script.
The big files are all JAR files. The pack size column is the most relevant. The aui-dependencies.jar compacts to 169KB but the emojis.jar compacts only to 580. The emojis.jar is a candidate for removal.
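If the script is not at hand, a similar listing can be approximated with Git's own plumbing commands. This is a different technique from the script described above: it reports the uncompressed size of each blob in bytes rather than its size in the packfile. The scratch repository and the large.bin and small.txt files are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

head -c 204800 /dev/urandom > large.bin
echo "small" > small.txt
git add .
git commit -q -m "Add sample files"

# List every object reachable from any ref, keep only blobs,
# and show the ten largest by uncompressed size.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  grep '^blob ' |
  cut -d ' ' -f 2- |
  sort -rn |
  head -n 10)
echo "$largest"
```

Each output line is a size followed by the path under which the blob was found, so deleted files that still live in history show up as well.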
The filter-branch command can contain task-specific filters for rewriting the Git index. For example, a filter can remove a file from every indexed commit. The syntax for this is the following:
- The --index-filter option modifies a repo's staging area (or index).
- The --cached option removes a file from the index, not the disk. This is faster because you don't have to check out each revision before running the filter.
- The --ignore-unmatch option of git rm prevents the command from failing if the pathname it is trying to remove isn't there.
- By specifying a commit HASH, you remove the pathname from every commit starting with the HASH on up. To remove from the start, leave this off or specify HEAD.
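The options described above combine as in the following sketch, which runs against a scratch repository so that no real history is rewritten. The emojis.jar and app.txt files are stand-ins, and FILTER_BRANCH_SQUELCH_WARNING is only there to skip the warning pause that newer Git releases add.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two commits that both contain the unwanted file.
head -c 204800 /dev/urandom > emojis.jar
echo "code" > app.txt
git add .
git commit -q -m "Add app and emojis.jar"
echo "more code" >> app.txt
git commit -q -a -m "Update app"

# Newer Git releases warn and pause before filter-branch runs; skip that here.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Rewrite every commit reachable from HEAD, dropping emojis.jar from each index.
git filter-branch --index-filter 'git rm --cached --ignore-unmatch emojis.jar' HEAD

# Count the commits on the rewritten branch that still touch the file.
remaining=$(git log --oneline -- emojis.jar | wc -l | tr -d ' ')
echo "commits still touching emojis.jar: $remaining"
```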
If all your large files are in different branches, you'll need to delete each file by name. If all the files are within a single branch, you can delete the branch itself.
Delete files by name
Use the following procedure to remove large files:
Run the following command to remove the first large file you identified:
Repeat Step 1 for each remaining large file.
Update the references in your repository.
filter-branch creates backups of your original refs namespaced under refs/original/. Once you're confident that you deleted the correct files, you can run the following command to delete the backed up refs, allowing the large objects to be garbage collected:
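One way to delete the backed up refs is sketched below. To keep the example short, the backup ref is created by hand with git update-ref as a stand-in for what filter-branch would leave behind; the scratch repository and the demo ref name are assumptions.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "code" > app.txt
git add app.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Add app"

# Stand-in for the backup ref that filter-branch would have created.
git update-ref refs/original/refs/heads/demo HEAD

# Delete every ref under refs/original/.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d

# Confirm that nothing is left in that namespace.
left=$(git for-each-ref refs/original/ | wc -l | tr -d ' ')
echo "refs left under refs/original/: $left"
```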
Delete just the branch
If all your large files are on a single branch, you can just delete the branch. Deleting the branch automatically removes all the references.
Delete the branch.
Prune all of the reflog references from the branch on back.
Garbage collecting dead data
Prune all of the reflog references from now on back (unless you're explicitly only operating on one branch).
Repack the repository by running the garbage collector and pruning old objects.
Push all your changes back to the Bitbucket repository.
Make sure all your tags are current too:
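The clean-up steps above can be tried end to end in the following sketch. It uses a scratch repository with a local bare repository standing in for Bitbucket; the big-files branch, the big.bin file, and the v1.0 tag are hypothetical. Against a real remote, a forced push replaces the remote history, so coordinate with anyone else who uses the repository.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git      # stand-in for the Bitbucket repository
git init -q local
cd local
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "code" > app.txt
git add app.txt
git commit -q -m "Add app"
git tag v1.0

# Put a large file on its own branch, then delete that branch.
git checkout -q -b big-files
head -c 204800 /dev/urandom > big.bin
git add big.bin
git commit -q -m "Add big.bin"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q -
git branch -q -D big-files

# Prune all reflog entries, then repack and drop unreferenced objects.
git reflog expire --expire=now --all
git gc --quiet --prune=now

# Check whether the large blob is really gone from the object store.
if git cat-file -e "$blob" 2>/dev/null; then pruned=no; else pruned=yes; fi
echo "large blob pruned: $pruned"

# Push all branches, then all tags, overwriting the remote history.
git remote add origin ../remote.git
git push -q --all --force origin
git push -q --tags --force origin
```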
How to reduce a Mercurial repository
You should do regular maintenance of your Mercurial repository to reduce its size. If you imported code from another version control system, you may need to clean up unnecessary files after the import. This explains how to reduce repository size by removing large files from a Mercurial repository.
Another technique to reduce repository size is to split the repository into multiple smaller ones. You could do this for each directory in the repository, creating one sub-repository per directory. To learn more about splitting repositories, see Split a repository in two.
How to find large files
Large files are typically things like third-party libraries (jars, dlls), compiled versions of your applications, and binary media assets (such as image files). Keep in mind that Mercurial usually saves differences between files. A small change to a binary file's content can cause many or most of the file's bytes to change. Committing a change to a binary file can therefore cause Mercurial to store all or most of a large file multiple times.
In a Linux environment
To find large files in a Linux environment, use the following piped command:
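One pipeline of this shape is sketched here against a scratch directory. It assumes find, ls, sort, and head with the usual ls -l column layout (size in the fifth column); the directory contents and file names are hypothetical.

```shell
# Scratch directory with one hidden directory and two visible files.
work=$(mktemp -d)
cd "$work"
mkdir -p .hg lib
head -c 300000 /dev/zero > .hg/store.bin   # hidden, so it should be skipped
head -c 200000 /dev/zero > lib/big.jar
head -c 100 /dev/zero > small.txt

# Find regular files outside hidden paths, list them with ls -l,
# sort numerically on the size column, and keep the ten largest.
largest=$(find . -type f -not -path '*/.*' -exec ls -l {} + |
  sort -k 5 -n -r |
  head -n 10)
echo "$largest"
```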
This command ignores hidden files and directories. For example, the command ignores everything in the .hg directory. It sorts the output of ls by size and uses the head command to return the ten largest files. For example, right now the Bitbucket tutorial repo has these large files:
In a Windows environment
In a Windows environment, we recommend using PowerShell. To open PowerShell in Windows 7, do the following:
- Click the Start button.
- Begin typing PowerShell in the Search programs and files field.
- Select the Windows PowerShell option.
The PowerShell command window opens.
- Change to the root of your repository.
Enter the following at the command prompt:
This command lists all the repo's files excluding those in the .hg (metadata) directory. The system lists output similar to the following:
Removing large files
To remove large files, you use the convert extension and --filemap option. The convert extension filters a repository and creates a new repository with a parallel history. The --filemap option takes a filemap file that specifies filters for file processing. During the conversion process the convert extension uses the filemap to modify the changesets it processes. You can use the filemap to include, rename, or exclude individual files or whole directories.
For example, the following shows a simple set of filemap entries:
For detailed information about the convert extension and the --filemap option, see the ConvertExtension --filemap documentation.
Example of using convert to reduce repository files
Consider the following directory structure in a repository:
To remove all libraries except commons-lang and retain all Javadoc except the one for commons-io, you create the following filemap.txt file in the repository root:
Then, to convert the structure, you would issue the following command at the command line:
initialHgRepo is the repository to convert and hgRepoAfterConversion is the new repository. After the conversion, the hgRepoAfterConversion repository structure is:
You can then check in the
Maintain repository size by deleting files
Deleting unused files is a good way to reduce repository size. Remember that deleting a file does not remove it from history. It does reduce the size of your repository at its current version. Every clone still contains the old file if you need to retrieve it. When looking at files to delete, consider the following:
- SQL dumps
- Large media assets
- Compiled versions of your applications
- Third party libraries and dependencies (jars, dlls, gem, etc.)
- Completely unneeded large files in history
Binary files are always candidates. DVCS systems are not well suited to storing binary files. Consider hosting these on a file hosting service such as Google Drive, Dropbox, or Carbonite.
Split a repository into project repositories
You can reduce your repository size by splitting a repository by code projects. This requires that you understand how your projects reference each other. For example, if you have four projects in one large repository that don't reference each other, you can split these up into four smaller repositories.
For a procedure on splitting a repository, see Split a repository in two.