Pipelined indexing

Prior to FishEye 3.0, a repository was indexed as a serial, monolithic process from the first commit to the latest. As a result, it could take a long time before the newest changesets became visible in FishEye. Since the newest changesets are often the most interesting, FishEye could seem less than useful until indexing had fully completed.

In FishEye 3.0 we've introduced a new pipelined indexing approach that splits the indexing process into separate tasks that can be performed in a phased and concurrent way. This approach allows FishEye to provide core functionality, such as review creation, file browsing, the activity stream and JIRA integration, for all the changesets in the repository, far sooner than in previous versions. You can get on with your work, while FishEye quietly completes the fine details of indexing in the background.

For now, pipelined indexing is only available for Subversion repositories. Other SCMs will be supported in subsequent releases.

Indexing phases

Pipelined indexing splits the indexing process for a repository into two phases. Tasks within those phases are performed in parallel. When a phase completes, the FishEye functionality that depends on that information becomes available. The phases, and the functionality associated with them, are shown in the table below:

Phase	What you can do and see
1. Scanning	Create reviews; browse files, revisions and diffs; see changeset ancestry; use JIRA integration features; view the Activity stream
2. Indexing	Search filenames, content, and diffs; see line count data; see file revision ancestry and predecessors; use eyeQL queries
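
The relationship between phases and features can be modelled as a simple lookup. The following Python sketch is illustrative only and is not FishEye code: `PHASE_FEATURES` and `available_features` are hypothetical names, and the sketch assumes that a feature becomes available as soon as the phase it depends on has completed.

```python
# Features unlocked by each phase, copied from the table above.
# The structure and names here are hypothetical, for illustration only.
PHASE_FEATURES = {
    "scanning": [
        "Create reviews",
        "Browse files, revisions and diffs",
        "See changeset ancestry",
        "Use JIRA integration features",
        "View the Activity stream",
    ],
    "indexing": [
        "Search filenames, content, and diffs",
        "See line count data",
        "See file revision ancestry and predecessors",
        "Use eyeQL queries",
    ],
}


def available_features(completed_phases):
    """Return the features unlocked by the given set of completed phases."""
    features = []
    for phase in ("scanning", "indexing"):
        if phase in completed_phases:
            features.extend(PHASE_FEATURES[phase])
    return features
```

For example, `available_features({"scanning"})` lists review creation and file browsing but not eyeQL queries, which matches the point that core functionality arrives before the detailed index is finished.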

Monitoring progress

When you add a repository to FishEye and enable it, pipelined indexing begins immediately. An administrator can monitor indexing progress in the 'Repository status' section of the 'Repository' page.

Click Show more to see more details for the indexing process.

You can Stop and Restart indexing if necessary.

When browsing a repository that is currently being indexed, you'll see a progress indicator at the top of the page.

Advanced: configuring pipelined indexing

At startup, FishEye 3.0 auto-configures the pipeline based on the detected environment. This default configuration should suit most users, but you may choose to tune the pipeline to suit your environment. The following system properties control aspects of the pipeline's operation. To set any of these properties, see Setting JVM System Properties. When changing these properties, be aware that there is a single instance of the pipeline, shared between all repositories.
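
As a rough illustration of how such settings might be checked before they are applied, the Python sketch below validates values against the legal ranges listed in the system properties table in this section and renders them in the standard `-Dname=value` form that a JVM accepts. The helper names `to_jvm_args` and `default_threads` are hypothetical and are not part of FishEye; how the resulting arguments are passed to FishEye is covered by Setting JVM System Properties.

```python
# Legal ranges copied from the system properties table in this section.
# Function names and error handling are hypothetical, for illustration only.
INT_RANGES = {
    "fisheye.pipeline.threads": (4, 1000),
    "fisheye.pipeline.batch.cslimit": (1, 50000),
    "fisheye.pipeline.batch.pathlimit": (1000, 120000),
}
BOOL_PROPERTIES = {"fisheye.pipeline.fairness"}


def default_threads(detected_cores):
    """Documented default: the detected core count, or 4, whichever is greater."""
    return max(detected_cores, 4)


def to_jvm_args(settings):
    """Validate pipeline settings and render them as -Dname=value arguments."""
    args = []
    for name, value in settings.items():
        if name in BOOL_PROPERTIES:
            if not isinstance(value, bool):
                raise ValueError(f"{name} must be true or false")
            rendered = "true" if value else "false"
        elif name in INT_RANGES:
            low, high = INT_RANGES[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not low <= value <= high:
                raise ValueError(f"{name} must be an integer from {low} to {high}")
            rendered = str(value)
        else:
            raise ValueError(f"unknown pipeline property: {name}")
        args.append(f"-D{name}={rendered}")
    return " ".join(args)
```

Rejecting out-of-range values up front is a design choice of this sketch; the source does not say how FishEye itself treats values outside the legal range.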

Caution

Changing these values can adversely impact the performance of your FishEye instance. Please proceed with caution.

System Property	Legal values	Default Value	Description	FishEye version
`fisheye.pipeline.threads`	`4 - 1000`	The number of processor cores detected, or 4, whichever is greater	The number of threads available to work on indexing tasks. A higher value allows for potentially greater concurrency in indexing activities, but will increase the load on the server and the load on your SCM.	3.0.0
`fisheye.pipeline.fairness`	`true, false`	`true`	If `true`, repositories are processed in a round-robin fashion by the pipeline. This means that all repositories currently indexing will incrementally progress. If `false`, the pipeline always favours newer changesets, regardless of which repository they belong to. This means that repositories with newer changesets will take priority over repositories with older changesets. This results in faster processing for newer repositories, but with the possibility of stalling indexing of repositories with older commits.	3.0.0
`fisheye.pipeline.batch.cslimit`	`1 - 50000`	`10000`	The pipeline processes changesets in batches. This property sets the maximum number of changesets in a batch. A larger batch size reduces the number of calls to the underlying SCM, but increases heap consumption and reduces the concurrency of the pipeline. A smaller batch size increases concurrency and reduces heap consumption, but results in more calls to the underlying SCM.	3.0.0
`fisheye.pipeline.batch.pathlimit`	`1000 - 120000`	`60000`	The pipeline processes changesets in batches. This property sets the maximum allowed number of unique paths in a batch (summed across changesets). A higher value will reduce the number of calls to the underlying SCM, at the expense of more heap consumption.	3.0.0
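
To make the effect of `fisheye.pipeline.fairness` more concrete, the following Python sketch compares the two orderings described in the table. It is a deliberately simplified model, not FishEye's scheduler: it ignores batching and concurrency, the function name `processing_order` is hypothetical, and it assumes that within a single repository newer changesets are taken first.

```python
from collections import deque


def processing_order(pending, fairness):
    """Return the order in which (repository, timestamp) pairs would be taken.

    pending maps a repository name to the commit timestamps still to index.
    This is a simplified, hypothetical model of the documented behaviour.
    """
    if not fairness:
        # Always favour the newest changeset, whichever repository it is in.
        everything = [(repo, ts) for repo, stamps in pending.items() for ts in stamps]
        return sorted(everything, key=lambda item: item[1], reverse=True)

    # Round-robin: take one changeset from each repository in turn.
    queues = {repo: deque(sorted(stamps, reverse=True)) for repo, stamps in pending.items()}
    order = []
    while any(queues.values()):
        for repo, queue in queues.items():
            if queue:
                order.append((repo, queue.popleft()))
    return order
```

With `pending = {"old": [1, 2], "new": [10, 11]}`, the fair ordering alternates between the two repositories, while the unfair ordering finishes everything in `new` before touching `old`. This is the stalling risk the table describes for repositories with older commits.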
