Monitoring your mirror farm
There are a number of helpful tools and techniques you can use to monitor the health of your mirror farm.
Performance monitoring using JMX metrics
Java Management Extensions (JMX) is a technology for monitoring and managing Java applications. You can use JMX to determine the overall health of each mirror node and of the mirror farm as a whole. The following statistics are the most important to monitor:
Hosting tickets on mirror nodes
Mirror hosting tickets on the primary
Incremental sync time on mirror nodes
Snapshot sync time on mirror nodes
Disk space, CPU, and memory
Mirror farm JMX metrics
Learn more about mirror farm JMX counters and what they monitor below. For a complete list of JMX metrics and how to enable them, see Enabling JMX counters for performance monitoring.
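If you'd rather script these checks than browse them in a JMX console, the standard `javax.management` API can query the same counters. The sketch below is a minimal example that connects to a mirror node over remote JMX and lists the mirror metric MBeans; the hostname and port are placeholders, and it assumes remote JMX has been enabled on the node.

```java
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListMirrorMetrics {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port; assumes remote JMX is enabled on the mirror node
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://mirror-node.example.com:3333/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Wildcard pattern matching the mirror metrics described in this section
            ObjectName pattern = new ObjectName(
                    "com.atlassian.bitbucket:type=metrics,category00=mirror,*");
            Set<ObjectName> names = connection.queryNames(pattern, null);
            names.forEach(name -> System.out.println(name.getCanonicalName()));
        }
    }
}
```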
Timers
The values retained in timer metrics exhibit decaying behavior, with more recent values favored over older values. Unless stated otherwise, the attributes provided by these metrics represent a snapshot in time, reflecting the duration of an operation.
Name | Description |
---|---|
Mean | The mean operation duration |
StdDev | The standard deviation of the operation duration |
50thPercentile | The duration at the 50th percentile of the distribution |
75thPercentile | The duration at the 75th percentile of the distribution |
95thPercentile | The duration at the 95th percentile of the distribution |
98thPercentile | The duration at the 98th percentile of the distribution |
99thPercentile | The duration at the 99th percentile of the distribution |
999thPercentile | The duration at the 99.9th percentile of the distribution |
Max | The maximum duration of the operation |
Min | The minimum duration of the operation |
DurationUnit | The unit in which durations are measured (milliseconds) |
RateUnit | The unit in which rates are measured (events per second) |
OneMinuteRate | The one-minute exponentially-weighted moving average rate at which this operation has been called |
MeanRate | The mean rate at which events have occurred |
FifteenMinuteRate | The 15-minute exponentially-weighted moving average rate at which this operation has been called |
FiveMinuteRate | The five-minute exponentially-weighted moving average rate at which this operation has been called |
Count | The number of times this operation has been called since application startup |
The timer metrics are available for the following operations:
Time taken to synchronize a repository across the mirror farm with the upstream
Incremental sync
com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=synchronize,category03=content,name=incremental
Snapshot sync
com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=synchronize,category03=content,name=snapshot
Time taken to distribute ref changes to the mirror farm nodes and fetch objects from the upstream while syncing a repository
Incremental sync
com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=distribute-fetch,name=incremental
Snapshot sync
com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=distribute-fetch,name=snapshot
Total time taken to detect and fix all inconsistent repositories on the mirror farm during a single farm vet run
com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=vet,name=timer
Time taken to synchronize project and/or repository metadata from the upstream on the mirror farm
For syncing all projects:
com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=synchronize,category03=metadata,name=all-projects
For syncing a single project:
com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=synchronize,category03=metadata,name=single-project
For syncing a single repository:
com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=synchronize,category03=metadata,name=repository
Time taken by a repository sync request from creation till it is successfully processed on the mirror farm:
com.atlassian.bitbucket:type=metrics,category00=mirror,category01=request,category02=synchronize,name=cycle-time
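As an illustration, here is a minimal sketch that reads a few of the timer attributes described above from the incremental content synchronization timer, given an `MBeanServerConnection` obtained as in the earlier example:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class ReadSyncTimer {
    // Prints key timing attributes for incremental sync; "connection" is an
    // MBeanServerConnection obtained as in the previous example
    static void printIncrementalSyncStats(MBeanServerConnection connection) throws Exception {
        ObjectName timer = new ObjectName(
                "com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,"
                        + "category02=synchronize,category03=content,name=incremental");
        // Attribute names come from the timers table above
        for (String attribute : new String[] {"Count", "Mean", "95thPercentile", "Max"}) {
            System.out.printf("%s = %s%n", attribute, connection.getAttribute(timer, attribute));
        }
    }
}
```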
Time taken by mirror operations in different circumstances. A mirror operation is defined as the unit of work done on each mirror node as part of a process initiated by one of the mirror nodes in a farm. There are different types of mirror operations which are explained in the section below. These are the metrics collected for each operation type:
For a successful operation:
com.atlassian.bitbucket:mirror.operation.local.<operation_name>.success
For a failed operation:
com.atlassian.bitbucket:mirror.operation.local.<operation_name>.error
Total time taken by all the nodes to perform the mirror operations and respond to the node that initiated the process. These are the metrics collected for each operation type in different circumstances:
When all the nodes perform the mirror operation successfully and respond with the same result:
com.atlassian.bitbucket:mirror.operation.distributed.<operation_name>.success
When one or more nodes fail to perform the mirror operation:
com.atlassian.bitbucket:mirror.operation.distributed.<operation_name>.error
When one or more nodes fail to respond within the configured timeout for that operation:
com.atlassian.bitbucket:mirror.operation.distributed.<operation_name>.timeout
When all the nodes respond without error but return conflicting results for a given operation:
com.atlassian.bitbucket:mirror.operation.distributed.<operation_name>.mismatch
Synchronization and consistency
A repo-hash endpoint is provided on both the mirror farm and the primary server. It's used to check the consistency of a mirror farm and its nodes with respect to the primary. This is the same endpoint that the mirror farm vet uses to repair any inconsistencies that come up, such as the result of a missed webhook. There are some important considerations to keep in mind when using this endpoint:
The endpoint, `rest/mirroring/latest/repo-hashes`, is available on both the primary and the mirror nodes. It returns a stream of JSON containing a `content` and a `metadata` hash for each repository. The `content` hash is a digest of the Git repository itself, while the `metadata` hash is a digest of the metadata that Bitbucket holds concerning the repository, such as the repository name. Content hashes or metadata hashes alone can be requested by calling `rest/mirroring/latest/repo-hashes/content` or `rest/mirroring/latest/repo-hashes/metadata` respectively.
This is what the payload looks like:
{ "projects": [ { "id": 1, "public": false, "repositories": [ { "id": 1, "hashes": { "content": "082a2ffa1520447bb6c0072f9f9d850c76f111c0ff9a08cca8838b12b0ccc31a", "metadata": "b8fae6cb4704174f8dafae601355279950f921ba55b7620f4bdaa1280e735d14" } }, { "id": 2, "hashes": { "content": "0000000000000000000000000000000000000000", "metadata": "e80aeaf459a69e7000b9e785eb39640a5d929f7ec4f09512a9ab6fabf4a0c80a" } } ] } ] }
The process that generates content hashes, while reasonably fast, needs to run against every repository on the instance. For larger instances this could take quite some time, so an optimisation is made: when an upstream is first upgraded to a mirror farm capable version, an "empty" content hash is generated for each repository. This appears as `0000000000000000000000000000000000000000`, as can be seen in the `content` attribute of the second repository above. When the farm vet encounters a repository with a content hash of `0000000000000000000000000000000000000000`, it considers that repository up to date.
A mirror will only return entries for the projects or repositories it's mirroring. While the content returned from a mirror and the primary will be the same, the order of entries could differ. One way to sort the order consistently for diffing is to use the jq query:
jq '.projects | sort | .[].repositories |= sort_by(.id)'
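To put the endpoint and the jq query together, the sketch below fetches the repo-hashes payload from two nodes so the results can be compared. The hostnames and credentials are placeholders, and it assumes an account permitted to read the mirroring REST resources over basic auth; in practice you would normalize both payloads with the jq query above before diffing them.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CompareRepoHashes {
    // Fetches the repo-hashes payload from a node; URL and credentials are
    // placeholders for your own environment
    static String fetchRepoHashes(String baseUrl, String user, String password) throws Exception {
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/rest/mirroring/latest/repo-hashes"))
                .header("Authorization", "Basic " + auth)
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String primary = fetchRepoHashes("https://bitbucket.example.com", "admin", "secret");
        String mirror = fetchRepoHashes("https://mirror.example.com", "admin", "secret");
        // Entries may be ordered differently on each node, so normalize both
        // payloads with the jq query above (or equivalent) before diffing
        System.out.println(primary.equals(mirror)
                ? "Payloads are byte-identical"
                : "Payloads differ; normalize ordering before diffing");
    }
}
```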
Webhook
The mirror synchronized webhook can be used to trigger builds as soon as a mirror has finished synchronizing. It's also useful for monitoring the repositories in your mirror farm. Details of this repository event can be found on the Event payload page.
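For experimenting with this event, here is a minimal sketch of a receiver that accepts the webhook's POST and logs the payload. The port and path are arbitrary placeholders; a real handler would parse the payload described on the Event payload page and trigger a build.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class MirrorSyncWebhookListener {
    public static void main(String[] args) throws Exception {
        // Arbitrary port and path; configure the mirror webhook to point here
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/webhooks/mirror-synchronized", exchange -> {
            try (InputStream body = exchange.getRequestBody()) {
                String payload = new String(body.readAllBytes(), StandardCharsets.UTF_8);
                // Payload structure is described on the Event payload page;
                // a real handler would parse it and trigger a build here
                System.out.println("Mirror synchronized event received: " + payload);
            }
            exchange.sendResponseHeaders(204, -1); // acknowledge with no content
            exchange.close();
        });
        server.start();
    }
}
```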
Monitoring the status of your mirrors
You can configure your load balancer to check a node's status using the `/status` endpoint. A response code of `200` is returned if the mirror node is in a `SYNCHRONIZED` state. If there are no nodes in the `SYNCHRONIZED` state, a `200` response code will be returned for any mirror that is in one of the following states:
BOOTSTRAPPED
BOOTSTRAPPING
METADATA_SYNCHRONIZED
For customers who want a "strict" status endpoint, the `plugin.mirroring.strict.hosting.status` configuration property is provided. When set to `true`, the `/status` endpoint returns a `200` response code only if the mirror is in the `SYNCHRONIZED` state. The setup for this configuration is outside the scope of this document. It is important to note that at least one mirror node should be accessible from the upstream server.
The table below displays each state and its description:
State | Description |
---|---|
STARTING | A Bitbucket application is starting. |
STOPPING | A Bitbucket application is stopping. |
BOOTSTRAPPING | The mirror component is started. |
BOOTSTRAPPED | The mirror has joined the cluster. If this is the first time the mirror farm has been connected to a primary, this is the state the application will wait in until it has been authorized. |
METADATA_SYNCHRONIZED | Project or repository metadata has been synchronized from the primary and Git repositories have started synchronization. |
SYNCHRONIZED | The mirror farm has synchronized all Git repositories from the primary. If new projects or repositories are added to the mirror farm, this state will not change; it indicates that the initial set of projects and repositories that were configured at startup time has been synchronized. |
ERROR | There was an error starting the application node. |
When performing a `GET` operation against the `/status` endpoint, the returned data is JSON with two properties, `state` and `nodeCount`.
For example: `{"state":"SYNCHRONIZED","nodeCount":"4"}`
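If you want to perform the same check the load balancer does, a sketch like the following polls the `/status` endpoint and prints the result; the mirror URL is a placeholder:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MirrorStatusCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at one of your mirror nodes
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://mirror.example.com/status"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // 200 means the node is serving traffic; the body reports the exact state,
        // for example {"state":"SYNCHRONIZED","nodeCount":"4"}
        System.out.println("HTTP " + response.statusCode() + ": " + response.body());
    }
}
```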