Maintaining a Mercurial Repository

Maintain your Mercurial repository regularly to keep its size under control. If you imported code from another version control system, you may need to clean up unnecessary files after the import. This page explains how to find large files and remove them from a Mercurial repository's history.

Another way to reduce repository size is to split the repository into several smaller ones. You can repeat the split for each directory in the repository, so that each directory becomes its own, smaller repository. To learn more about splitting repositories, see Split a repository in two.

How to Find Large Files

Large files are typically third-party libraries (JARs, DLLs), compiled versions of your applications, and binary media assets such as images. Mercurial usually stores each new revision of a file as a delta, that is, the difference from the previous revision. Binary files defeat this: a small change to a binary file's content can change many or most of its bytes, so each commit of a changed binary file may store most or all of that file again.
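Because every committed revision stays in the history, the largest files in your working copy are not always the ones using the most space. One rough way to see which files have the heaviest history is to look at Mercurial's per-file storage (revlog) files. The sketch below assumes the default store layout, where each tracked file has an index file ending in .i under .hg/store/data and may also have a data file ending in .d. Stored names are encoded (for example, uppercase letters are escaped), so they will not match working-copy names exactly, and files with very long paths are kept under .hg/store/dh, which this sketch does not scan. The function name largest_revlogs is hypothetical.

```shell
# Hypothetical helper: list the largest per-file history files in a
# Mercurial store. Assumes the default layout under .hg/store/data.
#   $1 = repository root (default: current directory)
#   $2 = how many entries to show (default: 10)
largest_revlogs() {
  store="${1:-.}/.hg/store/data"
  if [ ! -d "$store" ]; then
    echo "no store found at $store" >&2
    return 1
  fi
  # wc -c prints "<bytes> <path>" per file; drop its "total" summary
  # lines, then sort numerically by size, largest first.
  find "$store" -type f \( -name '*.i' -o -name '*.d' \) \
    -exec wc -c {} + \
    | grep -v ' total$' \
    | sort -k1,1rn \
    | head -n "${2:-10}"
}
```

Run it from the repository root as largest_revlogs, or pass a path and a count, for example largest_revlogs ~/tutorials 20. A file that has been deleted from the working copy can still appear here, because its history remains in the store.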

In a Linux Environment

To find large files in a Linux environment, use the following piped command:

$ find . -type f \( ! -regex ".*/\..*" \) -print | xargs ls -l | sort -k5,5rn | head

This command ignores hidden files and directories, so everything in the .hg directory is skipped. It sorts the ls output by file size (the fifth column) in descending order and uses the head command to return the ten largest files. For example, running the command in the Bitbucket tutorial repository returned these large files:

-rwxr-xr-x  1 manthony  staff  548107 Feb 12 11:18 ./yearone.html
-rw-r--r--  1 manthony  staff  205672 Feb 12 11:18 ./images/mahmoud-darwish.gif
-rw-r--r--  1 manthony  staff  155848 Feb 12 11:18 ./images/so_many_activities.jpg
-rw-r--r--  1 manthony  staff  149472 Feb 12 11:18 ./images/EleanorRoosevelt.png
-rw-r--r--  1 manthony  staff  122251 Feb 12 11:18 ./images/AmbroseBierce.gif
-rw-r--r--  1 manthony  staff  112894 Feb 12 11:18 ./javascripts/foundation.js
-rw-r--r--  1 manthony  staff  109986 Feb 12 11:18 ./images/Deep-Thought.png
-rw-r--r--  1 manthony  staff   88873 Feb 12 11:18 ./images/AlbertEinstein.png
-rw-r--r--  1 manthony  staff   88387 Feb 12 11:18 ./images/willferrell.png
-rw-r--r--  1 manthony  staff   87721 Feb 12 11:18 ./images/NeilTysonOriginsA-FullSize.jpg
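By default, xargs splits its input on spaces and newlines, so in the pipeline above a filename that contains a space is broken into pieces and ls reports an error for it. The following variant is a sketch that assumes find and xargs support -print0 and -0 (the GNU and BSD versions both do). It prunes only .hg directories rather than every hidden path, so unlike the original command it also lists other hidden files such as .hgignore. The function name largest_files is hypothetical.

```shell
# Hypothetical helper: list the largest files in a working copy.
#   $1 = directory to scan (default: current directory)
#   $2 = how many entries to show (default: 10)
largest_files() {
  # -prune skips each .hg metadata directory; -print0 and xargs -0
  # separate names with NUL bytes, so spaces in filenames are safe.
  find "${1:-.}" -name .hg -prune -o -type f -print0 \
    | xargs -0 ls -l \
    | sort -k5,5rn \
    | head -n "${2:-10}"
}
```

Running largest_files from the repository root produces output in the same ls -l format as the example above.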

In a Windows Environment

In a Windows environment, we recommend using PowerShell.  To open PowerShell in Windows 7, do the following:

  1. Click the Start button.
  2. Begin typing PowerShell in the Search programs and files field.
  3. Select the Windows PowerShell option.
    The PowerShell command window opens. 
  4. Change to the root of your repository.
  5. Enter the following at the command prompt:

    gi -Path .\* -Exclude .hg | gci -r -ea 0 | sort Length -desc | select -f 10

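    This command lists the files in the repository, skipping the .hg (metadata) directory, sorts them by size in descending order, and returns the ten largest. The -ea 0 option (short for -ErrorAction SilentlyContinue) suppresses error messages such as access-denied warnings. The output is similar to the following: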

        Directory: C:\Users\manthony\Documents\tutorials
    
    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         3/25/2013  10:00 AM     548107 yearone.html
    
        Directory: C:\Users\manthony\Documents\tutorials\images
    
    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         3/25/2013  10:00 AM     205672 mahmoud-darwish.gif
    -a---         3/25/2013  10:00 AM     155848 so_many_activities.jpg
    -a---         3/25/2013  10:00 AM     149472 EleanorRoosevelt.png
    -a---         3/25/2013  10:00 AM     122251 AmbroseBierce.gif
    
        Directory: C:\Users\manthony\Documents\tutorials\javascripts
    
    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         3/25/2013  10:00 AM     112894 foundation.js
    
        Directory: C:\Users\manthony\Documents\tutorials\images
    
    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         3/25/2013  10:00 AM     109986 Deep-Thought.png
    
        Directory: C:\Users\manthony\Documents\tutorials
    
    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         3/25/2013  10:00 AM      91116 index.html
    
        Directory: C:\Users\manthony\Documents\tutorials\images
    
    Mode                LastWriteTime     Length Name
    ----                -------------     ------ ----
    -a---         3/25/2013  10:00 AM      88873 AlbertEinstein.png
    -a---         3/25/2013  10:00 AM      88387 willferrell.png

Removing Large Files

To remove large files, use the convert extension with its --filemap option. The convert extension reads a repository and creates a new repository with a parallel history. The --filemap option takes a file of filtering directives: during the conversion, the convert extension applies these directives to each changeset it processes. You can use a filemap to include, rename, or exclude individual files or whole directories. Excluding a file this way removes it from every changeset, which is what shrinks the history; deleting it with hg remove affects only future changesets, so the old revisions still take up space.

For example, the following shows a simple set of filemap directives:

# Comment
include path/to/file
exclude path/to/file
rename from/file to/file

For detailed information about the convert extension and the --filemap option, see the ConvertExtension --filemap documentation.
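The convert extension ships with Mercurial but is not enabled by default, so hg convert fails until you enable it, either permanently with a convert = line in the [extensions] section of your configuration file, or for a single command with the global --config extensions.convert= option. The wrapper below is a sketch of the single-command approach. The name convert_with_filemap is hypothetical, and refusing an existing destination is an assumed safety choice: hg convert can be run repeatedly against an existing destination to copy new commits, while this helper is meant for a one-off conversion into a fresh repository.

```shell
# Hypothetical helper: run a filemap-driven conversion without editing
# your hgrc, by enabling the bundled convert extension for this command.
#   $1 = filemap file, $2 = source repository, $3 = new destination
convert_with_filemap() {
  filemap=$1 src=$2 dest=$3
  if [ ! -f "$filemap" ]; then
    echo "filemap not found: $filemap" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    # Assumed policy: always convert into a brand-new repository.
    echo "destination already exists: $dest" >&2
    return 1
  fi
  hg --config extensions.convert= convert --filemap "$filemap" "$src" "$dest"
}
```

For the example that follows, the call would be convert_with_filemap filemap.txt initialHgRepo hgRepoAfterConversion.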

Example of using convert to reduce repository size

Consider the following directory structure in a repository:

repo
├─ doc
│  ├─ commons-collections-3.2.1-javadoc.jar
│  ├─ commons-io-2.0.1-javadoc.jar
│  └─ commons-lang-2.6-javadoc.jar
├─ lib
│  ├─ commons-collections-3.2.1.jar
│  ├─ commons-io-2.0.1.jar
│  └─ commons-lang-2.6.jar
└─ src
   ├─ commons-collections-3.2.1-sources.jar
   ├─ commons-io-2.0.1-sources.jar
   └─ commons-lang-2.6-sources.jar

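To remove all libraries except commons-lang, remove the Javadoc for commons-io, and keep everything else (including all of src), create the following filemap.txt file in the directory from which you will run hg convert. The paths inside a filemap are relative to the root of the repository being converted: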

include "repo"
exclude "repo/lib"
include "repo/lib/commons-lang-2.6.jar"

# the following include is optional
include "repo/doc"
exclude "repo/doc/commons-io-2.0.1-javadoc.jar"
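For each file, the include or exclude directive with the longest matching path applies, so line order in a filemap does not matter, and exclude "repo/lib" together with include "repo/lib/commons-lang-2.6.jar" keeps exactly one JAR from lib. To check a filemap before starting a long conversion, you can approximate that rule outside Mercurial. The awk-based sketch below is hypothetical (filemap_preview is not a Mercurial command): it handles only include and exclude lines, ignores rename lines, and assumes paths contain no spaces. Feed it repository-relative paths, for example from hg -R initialHgRepo manifest --all, which lists the files from all revisions.

```shell
# Hypothetical helper: approximate which paths a filemap keeps, using the
# rule that the longest matching include/exclude path wins.
# Handles include/exclude only; rename lines are ignored and paths
# containing spaces are not supported.
#   $1 = filemap file; stdin = repository-relative paths, one per line
filemap_preview() {
  awk '
    FILENAME == ARGV[1] {
      # Filemap lines: record directives, dropping optional double quotes.
      p = $2
      gsub(/"/, "", p)
      if ($1 == "include") { inc[p] = 1; ninc++ }
      else if ($1 == "exclude") { exc[p] = 1 }
      next
    }
    {
      path = $0; ilen = 0; elen = 0; prefix = path
      # Check the path itself, then each parent directory, longest first.
      while (prefix != "") {
        if ((prefix in inc) && length(prefix) > ilen) ilen = length(prefix)
        if ((prefix in exc) && length(prefix) > elen) elen = length(prefix)
        if (!sub(/\/[^\/]*$/, "", prefix)) prefix = ""
      }
      # No include lines: keep unless an exclude matches. Otherwise the
      # longest include match must be longer than the longest exclude match.
      keep = ninc ? (ilen > elen) : (elen == 0)
      if (keep) print path
    }
  ' "$1" -
}
```

For this example, hg -R initialHgRepo manifest --all | filemap_preview filemap.txt should list the same six files that the converted repository contains.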

Then, to convert the repository, run the following command from the directory that contains filemap.txt:

hg convert --filemap filemap.txt initialHgRepo hgRepoAfterConversion

Here, initialHgRepo is the repository to convert and hgRepoAfterConversion is the new repository. After the conversion, the hgRepoAfterConversion repository has the following structure:

repo
├─ doc
│  ├─ commons-collections-3.2.1-javadoc.jar
│  └─ commons-lang-2.6-javadoc.jar
├─ lib
│  └─ commons-lang-2.6.jar
└─ src
   ├─ commons-collections-3.2.1-sources.jar
   ├─ commons-io-2.0.1-sources.jar
   └─ commons-lang-2.6-sources.jar

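You can then run hg update in hgRepoAfterConversion to populate its working directory, and push the new repository to a fresh remote repository. Because its history differs from the original, collaborators should clone the new repository rather than pull it into their existing clones.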
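To confirm that the conversion actually reduced the repository size, you can compare the history stores of the two repositories. The sketch below assumes both repositories use the standard .hg/store directory and relies on du -sk, which reports disk usage in 1024-byte units on both Linux and macOS. The function name report_store_savings is hypothetical.

```shell
# Hypothetical helper: compare the size of two repositories' history stores.
#   $1 = original repository, $2 = converted repository
report_store_savings() {
  for repo in "$1" "$2"; do
    if [ ! -d "$repo/.hg/store" ]; then
      echo "no .hg/store in $repo" >&2
      return 1
    fi
  done
  # du -sk prints "<KiB> <path>"; keep only the number.
  before=$(du -sk "$1/.hg/store" | awk '{print $1}')
  after=$(du -sk "$2/.hg/store" | awk '{print $1}')
  echo "before=${before}KiB after=${after}KiB saved=$((before - after))KiB"
}
```

For this example, run report_store_savings initialHgRepo hgRepoAfterConversion from the directory that contains both repositories.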
