Git Repository Indexing is Too Slow when Creating a New Branch or Tag

The time to index a Git repository has decreased significantly since Fisheye/Crucible 3.4.0, when improvements were made to how the Git manifest is indexed. More information can be found here.

Problem

Indexing a new branch or tag in a Git repository is too slow.

Diagnosis

Environment

  • Fisheye/Crucible version prior to 3.4.0

Cause

When Fisheye finds a new branch, it needs to build a manifest of the branch: for each file in the branch, the latest commit that affects that file. This information is used for various Fisheye operations such as the commit graph, EyeQL queries, and the branch activity display pages.

Fisheye generates the branch manifest by asking Git for the current manifest using the git ls-tree command. Unfortunately, Git reports the tree in terms of each file's content hash, not the commit hash. Where the content hash of a given path is unique to a particular commit, Fisheye can quickly map the content hash to the commit hash and build the manifest. If, however, multiple commits have the same content for the same file path, Fisheye must determine which commit is the appropriate one to record in the manifest. This requires a relatively slow search of the file path's history (the commit ancestry).

Normally, the content at a file path is mostly unique to each commit that affects the file, so the search is not needed.
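The distinction between the fast path and the slow path can be illustrated with a minimal Python sketch. It is not Fisheye's implementation: the function names, the `history` index, and the shortened hashes are all hypothetical and chosen only for illustration. The one real detail is the line format of `git ls-tree -r` output (mode, object type, object hash, then a tab and the path).

```python
def parse_ls_tree(output):
    """Parse `git ls-tree -r <ref>` output into {path: content_hash}.

    Each line has the form "<mode> <type> <hash>\t<path>".
    Paths with unusual characters are quoted by Git unless -z is used;
    this sketch does not handle that case.
    """
    tree = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        meta, path = line.split("\t", 1)
        _mode, obj_type, obj_hash = meta.split()
        if obj_type == "blob":  # skip submodule (commit) entries
            tree[path] = obj_hash
    return tree


def classify(tree, history):
    """Split paths into those resolved directly and those needing a search.

    `history` is a hypothetical index built from already-indexed commits:
    {(path, content_hash): [commit ids that produced that content there]}.
    """
    resolved, needs_search = {}, {}
    for path, content_hash in tree.items():
        commits = history.get((path, content_hash), [])
        if len(commits) == 1:
            resolved[path] = commits[0]      # fast path: unique mapping
        else:
            needs_search[path] = commits     # slow path: ancestry search
    return resolved, needs_search


# Illustrative data; real hashes are 40 hexadecimal characters.
SAMPLE_OUTPUT = (
    "100644 blob aaa111\tREADME.md\n"
    "100644 blob bbb222\tsrc/main.c\n"
)
HISTORY = {
    ("README.md", "aaa111"): ["c1"],
    # The same content appears at this path in two commits,
    # for example after a change was reverted.
    ("src/main.c", "bbb222"): ["c2", "c4"],
}

resolved, needs_search = classify(parse_ls_tree(SAMPLE_OUTPUT), HISTORY)
```

In this example `README.md` resolves immediately to commit `c1`, while `src/main.c` has two candidate commits and would require the slower history search.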

Resolution

Fisheye provides the flag --Xenable-git-content-hash-resolving-heuristic which changes the behavior when there are multiple commits mapped to the same content hash on the same path. In this case Fisheye picks the most recent match. This should be the correct choice for the workflow in the majority of cases.
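As a rough sketch of what "picks the most recent match" could mean, the following Python snippet selects one commit from several candidates that share a content hash on the same path. The function name and the use of a commit timestamp as the measure of recency are assumptions made for illustration; the article does not state how Fisheye orders the candidates internally.

```python
def pick_most_recent(candidates):
    """Return the id of the most recent candidate commit.

    `candidates` is a list of (commit_id, timestamp) pairs for commits that
    all have the same content hash on the same path. Using the timestamp as
    the ordering is an assumption of this sketch.
    """
    if not candidates:
        raise ValueError("no candidate commits to choose from")
    return max(candidates, key=lambda c: c[1])[0]


# Hypothetical candidates: two commits with identical content at one path.
chosen = pick_most_recent([("c2", 1_500_000_000), ("c4", 1_500_086_400)])
```

Here `chosen` is `c4`, the later of the two commits. The choice is made without walking the commit ancestry, which is why it is fast and also why it can occasionally select a commit that is not the one a full history search would have found.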

Restart Fisheye with this flag specified and monitor performance when new branches are pushed. This should significantly reduce processing times. A possible side effect is incorrect parent information: the list of parents could be wrong for any file revision, the wrong revision could be tagged (in the case of a tag), and the last modified date and revision shown in the directory tree could be incorrect.

Last modified on Nov 2, 2018
