Reduce repository size

Have you hit the 2 GB repository limit?

Take a deep breath and we'll help get you back in working order with the first procedure on this page.

Once you've exceeded the 2 GB limit you can only push commit deletions or reversions, but you can't push new commits.

If you've exceeded the 1 GB soft limit or want to reduce your repository's size

If you've arrived here because you've exceeded the 1 GB soft limit, or just want to maintain your repo to keep it compact and effective, you can skip the first section. The remainder of this page should help you do the following:

  • Identify the size of your repository on Bitbucket.
  • Understand what impact removing files from Git or Mercurial has on your repository history.
  • Choose and run a maintenance procedure which helps you achieve the outcome you desire.

 

 

Remove the repository limitation

This procedure will help you remove the push limitation from a repository. However, once you finish these steps, you'll still need to choose one of the maintenance procedures on this page to completely resolve the problem.

Back up your repository before starting! The easiest way to create a backup is to clone your repository using the --mirror flag, and zip the whole clone.
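As a rough sketch of the backup described above, the following shell function makes a mirror clone and archives it. The function name backup_mirror and its arguments are hypothetical, and it uses tar with gzip in place of zip, on the assumption that tar is available on your system.

```shell
# backup_mirror SOURCE DEST_DIR
# Hypothetical helper: mirror-clone SOURCE (a URL or local path) into DEST_DIR
# and pack the clone into a .tar.gz archive next to it.
backup_mirror() {
  bm_src=$1
  bm_dest=$2
  bm_name=$(basename "$bm_src" .git)

  mkdir -p "$bm_dest" || return 1
  # --mirror copies every ref (branches, tags, notes), not just the default branch.
  git clone --quiet --mirror "$bm_src" "$bm_dest/$bm_name.git" || return 1
  # Archive the whole clone so it can be stored somewhere safe.
  tar -czf "$bm_dest/$bm_name.git.tar.gz" -C "$bm_dest" "$bm_name.git"
}
```

For example, backup_mirror followed by your repository's clone URL and a directory such as ~/repo-backups would leave both the mirror clone and the archive in that directory.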

Communicate repository maintenance with your team or repository followers. Make sure everyone knows what you're about to do since you'll be rewriting history. Letting your team know is essential and it's just good manners.

You can complete all these initial steps yourself and push back to Bitbucket.

  1. Pull the latest version of your repository from Bitbucket using the git pull --all command.
  2. Back up your repository locally so that any potential file loss during maintenance is recoverable.
  3. Run the git log command with -n 4 from your terminal. The number after the -n determines the number of commits in the log, starting from the most recent commit in your local history.

    $ git log -n 4 

    This will supply the last 4 commits which will look something like this:

    $ git log -n 4
    commit 86833d553529f99aad975539b03edce333bd4108
    Merge: b8b6122 d446c3b
    Author: Dan Stevens [Atlassian] <dstevens@atlassian.com>
    Date:   Fri Dec 4 14:09:41 2015 -0800
    
        Merged in dstevens/review2 (pull request #9)
        
        Adding add-on context and connect descriptor pages to the nav and concepts section
    
    commit d446c3bb361339c2c6aedb85b38c20959da8e345
    Author: Daniel Stevens <dstevens@atlassian.com>
    Date:   Fri Dec 4 13:52:33 2015 -0800
    
        revising scopes and context section to remove bad definition and provide links to complete scopes section.
    
    commit c78869f36fd93bd7ad82d7bdd86f779ef08d3f11
    Author: Daniel Stevens <dstevens@atlassian.com>
    Date:   Wed Dec 2 17:13:41 2015 -0800
    
        changing JSON file to JSON object in introduction
    
    commit c1b0c56b1a7aa69d80581a61c9ac45129efadc4f
    Author: Daniel Stevens <dstevens@atlassian.com>
    Date:   Wed Dec 2 16:28:32 2015 -0800
    
        Fixing definition of account context to reflect the truth about when it is visible
  4. Reset the head of your repository's history using git reset --hard HEAD~N, where N is the number of commits you want to move the head back. In the following example the head is set back one commit, to the commit just before the most recent one:

    Resetting the head this way, then force pushing the change in the next step, will permanently delete all the changes in the commit(s). This is a destructive operation so back up any files you've added before proceeding.

    git reset --hard HEAD~1

  5. Push the change to Bitbucket using git push --force to force push the change.

    git push --force


    Once you push your changes, Bitbucket automatically runs an aggressive git gc (Git garbage collection) to rewrite the history and reflect your change. Give the system a few moments to process the change. If the limitation has not been removed after thirty minutes to an hour, you can try one of the following:

    1. Choose one of the methods on this page and do a more complete maintenance on your repository.
    2. Contact support and let them know you've already tried the first section of this page.

To prevent your repository from hitting the hard limit again, and to remove the 1 GB soft limit, you will want to do a complete maintenance cycle on your repository using one or more of the procedures on this page.

Bitbucket repository limits

To provide the best and fastest service for all our users we have the following repository size limits.

  • Soft limit 1 GB: At this point we let you know you're getting to the higher end of an effective repository size and you might want to perform maintenance to keep from hitting the hard limit. 
  • Hard limit 2 GB: This is essentially a repository size stop sign. We'll limit what you can do until you reduce your repository size.

If your repository exceeded the 2 GB limit

Once you have taken the steps to reduce your local repository, and pushed those changes up to Bitbucket, we'll automatically run an aggressive Git gc to clean things up and get you working again.

Find your Bitbucket repository size

To check the relative size of your repository in Bitbucket click Settings, which opens the Repository details page, then look for the Size line.

Ideally, you should keep your repository size to between 100MB and 300MB. To give you some examples: Git itself is 222MB, Mercurial itself is 64MB, and Apache is 225MB. You can check out these open source repositories here: https://bitbucket.org/mirror/

Git repository size from the command line

You can use the command line to find the size of your repository on your local system.

For Git, you can use the following command:

git count-objects -v 

This should return a result similar to this:

$ git count-objects -v 
count: 0
size: 0
in-pack: 478
packs: 1
size-pack: 92
prune-packable: 0
garbage: 0

The size-pack value is the size of your repository when it is pushed to a remote server like Bitbucket, expressed in kilobytes. In the example above the repository is 92 KB, well under 1 MB.
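If you would rather read the result in megabytes, a small filter can do the conversion. This is a minimal sketch: the function name size_pack_mib is hypothetical, and it assumes the size-pack figure is in units of 1024 bytes, which is how Git reports it.

```shell
# size_pack_mib: read `git count-objects -v` output on stdin and print the
# size-pack value converted from KiB to MiB, with two decimal places.
size_pack_mib() {
  awk '$1 == "size-pack:" { printf "%.2f\n", $2 / 1024 }'
}

# Usage, from inside a repository:
#   git count-objects -v | size_pack_mib
```

For the sample output above (size-pack: 92) this prints 0.09. Recent versions of Git can also print human-readable sizes directly with git count-objects -vH.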

Local Mercurial repository size from the command line

Mercurial does not provide a command specifically for finding a repository's size. You can use the bundle command to generate a compressed bundle of your repository and then check the size of the resulting file (approximately 21.658 MB in the following example):

$ hg bundle --all my-bundle.hg
2474 changesets found
$ ls -al my-bundle.hg 
-rw-r--r--  1 manthony  staff  21658140 Feb 10 15:03 my-bundle.hg
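The byte count in the ls output can be turned into megabytes with a short helper. This is only a sketch: file_size_mb is a hypothetical name, and it uses decimal megabytes (1 MB = 1,000,000 bytes) to match the 21.658 MB figure quoted above.

```shell
# file_size_mb FILE: print FILE's size in decimal megabytes, three decimals.
file_size_mb() {
  fs_bytes=$(wc -c < "$1") || return 1
  awk -v b="$fs_bytes" 'BEGIN { printf "%.3f\n", b / 1000000 }'
}

# Usage, after creating the bundle:
#   hg bundle --all my-bundle.hg
#   file_size_mb my-bundle.hg
```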

Understand file removal in a Git repository

Rewriting repository history is a tricky business, because every commit depends on its parents, so any small change will change the commit id of every subsequent commit. There are two automated tools for doing this:

  1. the BFG Repo Cleaner - fast, simple, easy to use. Requires Java 6 or above.
  2. git filter-branch - powerful, tricky to configure, slow on big repositories. Part of the core Git suite.

Remember, after you rewrite the history, whether you use the BFG or filter-branch, you will need to remove reflog entries that point to old history, and finally run the garbage collector to purge the old data.

How Git history rewrite works

Cloning a repository clones the entire history — including every version of every source code file.  If a user commits a huge file, such as a JAR, every clone thereafter includes this file. Even if a user ends up removing the file from the project with a subsequent commit, the file still exists in the repository history.  To remove this file from your repository, and its history, you must:

  • remove the file from your project's current file-tree
  • remove the file from repository history - rewriting Git history, deleting the file from all commits containing it
  • remove all reflog history that refers to the old commit history
  • repack the repository, garbage-collecting the now-unused data using git gc

Git 'gc' (garbage collection) will remove all data from the repository that is not actually used, or in some way referenced, by any of your branches or tags. In order for that to be useful, we need to rewrite all Git repository history that contained the unwanted file, so that it no longer references it - git gc will then be able to discard the now-unused data.
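To see which files are still taking up space in history, including files already deleted from the current file-tree, you can list the largest blobs reachable from any branch or tag. The following is a minimal sketch using standard Git plumbing commands; the function name largest_blobs is hypothetical.

```shell
# largest_blobs [N]: print "size-in-bytes path" for the N largest blobs
# reachable from any ref (default 10), largest first.
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { sub(/^blob /, ""); print }' |
    sort -rn |
    head -n "${1:-10}"
}

# Usage, from inside a repository:
#   largest_blobs 5
```

A file that was committed and later removed still shows up in this list, because the old commits continue to reference it until history is rewritten.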

Using the BFG to rewrite history

The BFG is specifically designed for removing unwanted data like big files or passwords from Git repos, so it has a simple flag that will remove any large historical (not-in-your-current-commit) files: '--strip-blobs-bigger-than'

$ java -jar bfg.jar --strip-blobs-bigger-than 100M

Any files over 100MB in size (that aren't in your latest commit - because your latest content is protected by the BFG) will be removed from your Git repository's history. If you'd like to specify files by name, you can do that too:

$ java -jar bfg.jar --delete-files '*.mp4'

The BFG is 10-1000x faster than git filter-branch, and generally much easier to use - check the full usage instructions and examples for more details.

Alternatively, using git filter-branch to rewrite history

The filter-branch command rewrites a Git repo's revision history, just like the BFG, but the process is slower and more manual. If you don't know where the big file is, your first step will be to find it; the script described under Manually reviewing large files in your repository is one way to do that.

Then you'll have to decide to delete files one at a time or delete a specific branch. Either way you'll need to run Git garbage collection to complete the process. 

Manually reviewing large files in your repository

Antony Stubbs has written a BASH script that does this very well. The script examines the contents of your packfile and lists out the large files.  Before you begin removing files, do the following to obtain and install this script:

  1. Download the script to your local system.
  2. Put it in a well known location accessible to your Git repository.
  3. Make the script executable:

    $ chmod +x git_find_big.sh
  4. Clone the repository to your local system.
  5. Change directory to your repository root.
  6. Run the Git garbage collector manually.

    git gc --auto
  7. Find out the size of the .git folder.

    $ du -hs .git/objects
    45M	.git/objects 

    Note this size down for later reference.

  8. List the big files in your repo by running the git_find_big.sh script.

    $ git_find_big.sh 
    All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file.
    size  pack  SHA                                       location
    592   580   e3117f48bc305dd1f5ae0df3419a0ce2d9617336  media/img/emojis.jar
    550   169   b594a7f59ba7ba9daebb20447a87ea4357874f43  media/js/aui/aui-dependencies.jar
    518   514   22f7f9a84905aaec019dae9ea1279a9450277130  media/images/screenshots/issue-tracker-wiki.jar
    337   92    1fd8ac97c9fecf74ba6246eacef8288e89b4bff5  media/js/lib/bundle.js
    240   239   e0c26d9959bd583e5ef32b6206fc8abe5fea8624  media/img/featuretour/heroshot.png

    The largest files are JAR files. The pack size column is the most relevant. The aui-dependencies.jar compacts to 169 KB but the emojis.jar compacts only to 580 KB. The emojis.jar is a candidate for removal.

Running filter-branch

The filter-branch command can contain task specific filters for rewriting the Git index.  For example, a filter can remove a file from every indexed commit.  The syntax for this is the following:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch pathname' commitHASH
  • The --index-filter option modifies a repo's staging area (or index).
  • The --cached option removes a file from the index, not from the disk. This is faster because you don't have to check out each revision before running the filter.
  • The --ignore-unmatch option in git rm prevents the command from failing if the pathname it is trying to remove isn't there.
  • By specifying a commit HASH, you remove the pathname from every commit starting with the HASH on up. To remove it from the start of history, leave this off or specify HEAD.

If all your large files are in different branches, you'll need to delete each file by name. If all the files are within a single branch,  you can delete the branch itself.

Delete files by name

Use the following procedure to remove large files:

  1. Run the following command (entering the actual file name in place of the filename designation) to remove the first large file you identified:

    git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
  2. Repeat Step 1 for each remaining large file.

  3. Update the references in your repository. filter-branch creates backups of your original refs namespaced under refs/original/. Once you're confident that you deleted the correct files, you can run the following command to delete the backed up refs, allowing the large objects to be garbage collected:

    $ git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

Delete just the branch

If all your large files are on a single branch,  you can just delete the branch. Deleting the branch automatically removes all the references.

  1. Delete the branch.

    $ git branch -D branch-name
  2. Prune all of the reflog references from the branch on back.

    $ git reflog expire --expire=now branch-name

Garbage collecting dead data

  1. Prune all of the reflog references from now on back (unless you're explicitly only operating on one branch).

    $ git reflog expire --expire=now --all
  2. Repack the repository by running the garbage collector and pruning old objects.

    $ git gc --prune=now
  3. Push all your changes back to the Bitbucket repository.

    $ git push --all --force
  4. Make sure all your tags are current too:

    $ git push --tags --force
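The local part of this process (filter, drop the backup refs, expire the reflog, collect garbage) can be strung together as one function. This is a sketch under stated assumptions rather than a definitive script: purge_path_from_history is a hypothetical name, it only rewrites the current branch, it assumes the path contains no single quotes, and it deletes the refs/original backups immediately, so run it on a backed-up clone. It does not push anything.

```shell
# purge_path_from_history PATH
# Remove PATH from every commit on the current branch, then discard the
# leftover references so the old objects can be garbage collected.
purge_path_from_history() {
  pp_path=$1

  # Rewrite history; the environment variable only silences the advisory
  # pause that newer Git versions print before running filter-branch.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
    --index-filter "git rm --cached --ignore-unmatch --quiet -- '$pp_path'" \
    HEAD || return 1

  # Delete the backup refs that filter-branch keeps under refs/original/.
  git for-each-ref --format="%(refname)" refs/original/ |
    xargs -n 1 git update-ref -d

  # Expire all reflog entries and prune the now-unreferenced objects.
  git reflog expire --expire=now --all
  git gc --quiet --prune=now
}

# Usage, from the repository root with a clean working tree:
#   purge_path_from_history media/img/emojis.jar
```

After it finishes, compare the output of du -hs .git/objects with the size you noted earlier before force pushing.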

How to reduce a Mercurial repository 

You should do regular maintenance of your Mercurial repository to reduce its size. If you imported code from another version control system, you may need to clean up unnecessary files after the import. This section explains how to reduce repository size by removing large files from a Mercurial repository.

Another technique to reduce repository size is to split the repository into multiple smaller ones. This could be run for each directory in the repository, which will create sub-repositories for each one. To learn more about splitting repositories, see Split a repository in two.

How to find large files

Large files are typically things like third-party libraries (jars, dlls), compiled versions of your applications,  and binary media assets (such as image files).  Keep in mind that Mercurial usually saves differences between files. A small change to a binary file's content can cause many or most of the file's bytes to change.  Committing a change to binary files potentially causes Mercurial to store the entire or most of a large file multiple times.  

In a Linux environment

To find large files in a Linux environment, use the following piped command:

$ find . -type f \( ! -regex ".*/\..*" \) -print | xargs ls -l | sort -k5,5rn | head

This command ignores hidden files and directories. For example, the command ignores everything in the .hg directory. It sorts the output of ls by size and uses the head command to return the ten largest files. For example, right now the Bitbucket tutorial repo has these large files:

-rwxr-xr-x  1 manthony  staff  548107 Feb 12 11:18 ./yearone.html
-rw-r--r--  1 manthony  staff  205672 Feb 12 11:18 ./images/mahmoud-darwish.gif
-rw-r--r--  1 manthony  staff  155848 Feb 12 11:18 ./images/so_many_activities.jpg
-rw-r--r--  1 manthony  staff  149472 Feb 12 11:18 ./images/EleanorRoosevelt.png
-rw-r--r--  1 manthony  staff  122251 Feb 12 11:18 ./images/AmbroseBierce.gif
-rw-r--r--  1 manthony  staff  112894 Feb 12 11:18 ./javascripts/foundation.js
-rw-r--r--  1 manthony  staff  109986 Feb 12 11:18 ./images/Deep-Thought.png
-rw-r--r--  1 manthony  staff   88873 Feb 12 11:18 ./images/AlbertEinstein.png
-rw-r--r--  1 manthony  staff   88387 Feb 12 11:18 ./images/willferrell.png
-rw-r--r--  1 manthony  staff   87721 Feb 12 11:18 ./images/NeilTysonOriginsA-FullSize.jpg
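The piped command above can misbehave when file names contain spaces, because xargs splits its input on whitespace. The following variant is a sketch that avoids xargs by letting find call ls directly; the function name largest_files and its two optional arguments are hypothetical.

```shell
# largest_files [DIR] [N]: list the N largest non-hidden files under DIR
# (defaults: current directory, 10 files), largest first.
largest_files() {
  (
    cd "${1:-.}" || exit 1
    # Skip anything inside a hidden file or directory such as .hg or .git.
    find . -type f ! -path '*/.*' -exec ls -l {} + |
      sort -k5,5rn |
      head -n "${2:-10}"
  )
}

# Usage:
#   largest_files . 10
```

As with the original command, the fifth column of the ls -l output is the file size in bytes.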

In a Windows environment

In a Windows environment, we recommend using PowerShell.  To open PowerShell in Windows 7, do the following:

  1. Click the Start button.
  2. Begin typing PowerShell in the Search programs and files field.
  3. Select the Windows PowerShell option.
    The PowerShell command window opens. 
  4. Change to the root of your repository.
  5. Enter the following at the command prompt:

    gi -Path .\* -Exclude .hg | gci -r -ea 0 | sort Length -desc | select -f 10

    This command lists all the repo's files excluding those in the .hg (metadata) directory. The system lists output similar to the following:

        Directory: C:\Users\manthony\Documents\tutorials
    
    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         3/25/2013  10:00 AM     548107 yearone.html
    
        Directory: C:\Users\manthony\Documents\tutorials\images
    
    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         3/25/2013  10:00 AM     205672 mahmoud-darwish.gif
    -a---         3/25/2013  10:00 AM     155848 so_many_activities.jpg
    -a---         3/25/2013  10:00 AM     149472 EleanorRoosevelt.png
    -a---         3/25/2013  10:00 AM     122251 AmbroseBierce.gif
    
        Directory: C:\Users\manthony\Documents\tutorials\javascripts
    
    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         3/25/2013  10:00 AM     112894 foundation.js
    
        Directory: C:\Users\manthony\Documents\tutorials\images
    
    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         3/25/2013  10:00 AM     109986 Deep-Thought.png
    
        Directory: C:\Users\manthony\Documents\tutorials
    
    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         3/25/2013  10:00 AM      91116 index.html
    
        Directory: C:\Users\manthony\Documents\tutorials\images
    
    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         3/25/2013  10:00 AM      88873 AlbertEinstein.png
    -a---         3/25/2013  10:00 AM      88387 willferrell.png

Removing large files

To remove large files, you use the convert extension and --filemap option.  The convert extension filters a repository and creates a new repository with a parallel history. The  --filemap option takes a filemap file that specifies filters for file processing.   During the conversion process the convert extension uses the filemap to modify the changesets it processes. You can use the filemap to include, rename, or exclude individual files or whole directories.

For example, the following shows a simple set of filemap directives:

# Comment
include path/to/file
exclude path/to/file
rename from/file to/file

For detailed information about the convert extension and the --filemap option, see the ConvertExtension --filemap documentation.
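If you already have a list of large files, a filemap that excludes them can be generated rather than typed by hand. This is a minimal sketch: the function name exclude_filemap is hypothetical, and it assumes one repository-relative path per input line with no double quotes in the path.

```shell
# exclude_filemap: read one path per line on stdin and print an
# "exclude" filemap directive for each non-empty line.
exclude_filemap() {
  while IFS= read -r ef_path; do
    if [ -n "$ef_path" ]; then
      printf 'exclude "%s"\n' "$ef_path"
    fi
  done
}

# Usage:
#   printf '%s\n' lib/commons-io-2.0.1.jar doc/commons-io-2.0.1-javadoc.jar |
#     exclude_filemap > filemap.txt
```

If the convert extension is not already enabled in your Mercurial configuration, it can be switched on for a single run by passing --config extensions.convert= to hg before the convert command.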

Example of using convert to reduce repository files

Consider the following directory structure in a repository:

repo
├─ doc
│  ├─ commons-collections-3.2.1-javadoc.jar
│  ├─ commons-io-2.0.1-javadoc.jar
│  └─ commons-lang-2.6-javadoc.jar
├─ lib
│  ├─ commons-collections-3.2.1.jar
│  ├─ commons-io-2.0.1.jar
│  └─ commons-lang-2.6.jar
└─ src
   ├─ commons-collections-3.2.1-sources.jar
   ├─ commons-io-2.0.1-sources.jar
   └─ commons-lang-2.6-sources.jar

To remove all libraries except commons-lang and retain all Javadoc except the one for commons-io you create the following filemap.txt file in the repository root:

include "repo"
exclude "repo/lib"
include "repo/lib/commons-lang-2.6.jar"

# the following include is optional
include "repo/doc"
exclude "repo/doc/commons-io-2.0.1-javadoc.jar"

Then, to convert the structure, you would issue the following command at the command line:

hg convert --filemap filemap.txt initialHgRepo hgRepoAfterConversion

initialHgRepo is the repository to convert and the hgRepoAfterConversion is the new repository.  After the conversion, hgRepoAfterConversion repository structure is:

repo
├─ doc
│  ├─ commons-collections-3.2.1-javadoc.jar
│  └─ commons-lang-2.6-javadoc.jar
├─ lib
│  └─ commons-lang-2.6.jar
└─ src
   ├─ commons-collections-3.2.1-sources.jar
   ├─ commons-io-2.0.1-sources.jar
   └─ commons-lang-2.6-sources.jar

You can then check in the hgRepoAfterConversion repository.

Maintain repository size by deleting files

Deleting unused files is a good way to reduce repository size. Remember that deleting a file does not remove it from history; it only reduces the size of the current version of the repository. Every clone still contains the old file if you need to retrieve it. When looking at files to delete, consider the following:

  • SQL dumps
  • Large media assets
  • Compiled versions of your applications
  • Third party libraries and dependencies (jars, dlls, gem, etc.)
  • Completely unneeded large files in history

Binary files are always candidates. Distributed version control systems are not well suited to storing binary files. Consider hosting these on a file hosting service such as Google Drive, Dropbox, or Carbonite.

Split a repository into project repositories

You can reduce your repository size by splitting a repository by code projects. This requires that you understand how your projects reference each other. For example, if you have four projects in one large repository that don't reference each other, you can split them into four smaller repositories.

For a procedure on splitting a repository, see Split a repository in two.

