Informational: About Bitbucket Smart Mirrors


For Atlassian eyes only

This article is Work In Progress and cannot be shared with customers.

Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

This page lists some important facts and FAQs about how Bitbucket Smart Mirrors work.

Environment

Bitbucket Smart Mirrors

Solution

FAQs

How often does the sync happen between the Mirror and the Primary server? Is it configurable?

  • When data is pushed to the primary server, Bitbucket notifies the mirrors that there are modifications, so that all mirrors in the mirror farm hold the most current data. The mirrors then start a synchronization to update their data based on the changes made on the primary server.

  • If the primary server fails to listen to events and the Mirror is not updating itself, a fallback mechanism applies: the Mirrors fetch from the primary at a fixed interval. For Bitbucket version 8.19, the default value of plugin.mirroring.synchronization.interval (which controls how frequently a full synchronization with the upstream server should run) is 3 minutes.
  • All the properties related to mirroring are listed on the Bitbucket Properties - Mirroring page.
  • The number of syncs per minute is not explicitly limited, but it's important to note that the performance can be affected by the size of your repositories, the number of changes, and the network conditions between your main Bitbucket instance and the mirrors.

How can we use the Primary Server logs to debug the request it sends to update the Mirror?

This public-facing page How we built Bitbucket Smart Mirror Farms - Atlassian Engineering explains everything you want to know about how the Mirror ref change works in great detail. While it doesn't cover Git LFS, it is close to perfect in explaining exactly how the Mirror/Mirror Farm sync operates internally. The following example provides a detailed explanation of how the information in the document translates into log entries.

How many sync processes can be running simultaneously?

Repositories are synchronized in parallel, utilizing 5 threads by default. Consequently, only 5 repos can sync simultaneously. Any additional repos are queued on the Mirror and picked up by the threads once they finish processing the repositories that are already in progress.

When a ref-change occurs on the primary, a ref-change Webhook event is triggered from the Primary to the Mirror nodes. The webhook received on the Mirror end is queued, picked up, and processed by the Mirror. This ref-change sync process is limited to 5 threads by default, allowing a maximum of 5 changes to be synced from the primary to the Mirror at any given time. The reason for this limitation is to prevent the Mirror from actively polling the Primary, which could exhaust the primary resources.
You can change the default value, but doing so increases the number of concurrent operations on the Mirror, which raises CPU usage on the Mirrors. The Mirrors would also fetch from the Primary more often, which increases the SCM load on the upstream nodes.

Therefore, we strongly recommend keeping the default configuration. If you encounter issues, we can help determine, after thorough analysis, where adjustments might be needed to handle the load. Otherwise, we do not recommend that customers tweak any of these values.
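The queuing behavior described above can be pictured with a small simulation. This is only an illustrative sketch of a fixed-size worker pool, not Bitbucket's implementation: the function name simulate_sync, the repository names, and the sleep that stands in for a real fetch are all made up for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

SYNC_THREADS = 5  # default thread count described above; hard-coded here, not read from Bitbucket


def simulate_sync(repo_names, workers=SYNC_THREADS, work_seconds=0.01):
    """Run a fake sync per repository and return (processed names, peak concurrency)."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def fake_sync(name):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(work_seconds)  # stand-in for the real fetch from the primary
        with lock:
            state["running"] -= 1
        return name

    # Tasks beyond `workers` wait in the executor's queue until a thread is free.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = list(pool.map(fake_sync, repo_names))
    return processed, state["peak"]


processed, peak = simulate_sync([f"repo-{i}" for i in range(12)])
print(len(processed), "repositories processed, peak concurrency:", peak)
```

With 12 queued repositories and 5 workers, the peak concurrency never exceeds 5; the remaining repositories are picked up as threads become free.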

How can we monitor for delays in mirror sync from the primary?

The Mirror or the Mirror farm nodes can be monitored as described in the official documentation: Monitoring your mirror farm. Essentially, this involves comparing the content hash of the repository on the primary and on the Mirror to ensure they are in sync. However, this method does not provide the last sync details, which are available through the latest REST APIs on version 8.6 using repoSyncStatus.

The time taken by the Incremental and Snapshot sync, as well as the number of Incremental and Snapshot syncs received, can all be monitored using JMX monitoring available for the Mirrors. Currently, this information is not publicly documented. We're collaborating with content designers and developers to make this information publicly available.

UI monitoring

One way to monitor the mirror sync status is via the Bitbucket UI.

  • On the Bitbucket Upstream UI → Administration → Mirrors
  • This shows the sync status of the Mirror nodes, not individual repositories.

The second way to check via the Bitbucket UI is in a particular repository.

  • On the Bitbucket Upstream UI → Repositories → click the Clone button in the left menu.
  • This shows the sync status of the individual repositories.

REST API

Another way is to check via the available REST APIs.

curl -u $username:$password <mirror url>/status

Output would look like this:

{"state":"BOOTSTRAPPED", "memberCount": "1", "discovering": false, "syncedRepos": 0, "totalRepos": 0}

This shows the status of the nodes, not individual repositories.

To check if anything is syncing at all:

curl -u  <admin-username>:<admin-password> --request GET \
  --url 'http://<MIRROR-URL>/rest/mirroring/latest/upstreamServers/{upstreamId}/progress' \
  --header 'Accept: application/json'

Replace <admin-username>, <admin-password>, <MIRROR-URL>, and {upstreamId} with your values.

This will return some output like:

{"discovering":false,"syncedRepos":242,"totalRepos":1071}

This will let you know if anything is syncing at this moment. There are 3 values:

  • discovering - This is the status of the upstream repository count. If it is false, totalRepos is still counting. If it is true, totalRepos has finished counting
  • syncedRepos - The number of repositories that have been synced.
  • totalRepos - The number of repositories that need to be synced.

If there is no sync happening, the output will be:

{"discovering":false,"syncedRepos":0,"totalRepos":0}
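If you poll this endpoint from a script, the response can be reduced to a few numbers. The following is a minimal sketch that only parses the JSON shown above; the function name summarize_progress and the keys of the returned dictionary are made up for this example, and treating totalRepos of 0 as "idle" simply follows the description above.

```python
import json


def summarize_progress(payload: str) -> dict:
    """Summarize the JSON body returned by the .../progress endpoint."""
    data = json.loads(payload)
    synced = int(data["syncedRepos"])
    total = int(data["totalRepos"])
    if total == 0:
        # Matches the "no sync happening" output described above.
        return {"idle": True, "remaining": 0, "percent": None}
    return {
        "idle": False,
        "remaining": total - synced,
        "percent": round(100 * synced / total, 1),
    }


sample = '{"discovering":false,"syncedRepos":242,"totalRepos":1071}'
print(summarize_progress(sample))
```

For the sample response above, 829 repositories are still waiting to be synced.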

To get the status and last sync time of a repository:

curl -u <admin-username>:<admin-password> --request GET \
  --url '<MIRROR-URL>/rest/mirroring/latest/upstreamServers/{upstreamId}/repos/{upstreamRepoId}' \
  --header 'Accept: application/json'

Replace <admin-username>, <admin-password>, <MIRROR-URL>, {upstreamId}, and {upstreamRepoId} with your values. The last two can be retrieved from the application.xml file in the Mirror support zip. The {upstreamRepoId} is the Repository ID on the Primary Server. The output of the above will look like this:

{
  "available": false,
  "links": {
    "clone": [
      {
        "href": "ssh://git@<Mirror-URL>:7999/fir/new-repo.git",
        "name": "ssh"
      },
      {
        "href": "<Mirror-URL>/scm/fir/new-repo.git",
        "name": "http"
      }
    ],
    "push": [
      {
        "href": "ssh://git@<Mirror-URL>:7999/fir/new-repo.git",
        "name": "ssh"
      },
      {
        "href": "<Mirror-URL>/scm/fir/new-repo.git",
        "name": "http"
      }
    ]
  },
  "mirrorName": "Bitbucket Mirror Farm",
  "repositoryId": "1",
  "status": "INITIALIZING"
}

The same repo has different repo IDs on the Mirror and on the Upstream. The Repository ID on the Upstream is referred to as "externalRepositoryId" in the Mirror logs. To find the externalRepositoryId that corresponds to a Mirror repo, check the application.xml in the Mirror support zip for details like those shown below. If you need to pull the same information from the Mirror database, look in the AO_8E6075_REPO_MAPPING table.

    <repository>
      <id>1</id>
      <slug>mono</slug>
      <name>mono</name>
      <type>git</type>
      <approximate-size>Unknown</approximate-size>
      <state>AVAILABLE</state>
      <status-message>Available</status-message>
      <marked-public>true</marked-public>
      <is-public>true</is-public>
      <is-fork>false</is-fork>
      <is-remote>false</is-remote>
      <partition>-1</partition>
      <available>true</available>
      <mirrored-repository>
        <mirroring-status>AVAILABLE</mirroring-status>
        <remote-id>1619</remote-id>

Note the "<remote-id>1619</remote-id>" element: 1619 is the repository ID on the Primary node.
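When a support zip contains many repositories, the mapping can be extracted with a short script instead of reading the file by hand. The sketch below assumes only the element names visible in the fragment above (repository, id, mirrored-repository, remote-id); the wrapping <repositories> element in the sample and the function name remote_id_map are assumptions made for this example, since the full layout of application.xml is not shown here.

```python
import xml.etree.ElementTree as ET


def remote_id_map(xml_text: str) -> dict:
    """Map each Mirror repository ID to its remote (Primary) repository ID."""
    root = ET.fromstring(xml_text)
    mapping = {}
    # iter() walks the whole tree, so the depth of <repository> does not matter.
    for repo in root.iter("repository"):
        local_id = repo.findtext("id")
        remote_id = repo.findtext("mirrored-repository/remote-id")
        if local_id and remote_id:
            mapping[local_id.strip()] = remote_id.strip()
    return mapping


# Sample built from the fragment above; the wrapper element is hypothetical.
sample = """
<repositories>
  <repository>
    <id>1</id>
    <slug>mono</slug>
    <mirrored-repository>
      <mirroring-status>AVAILABLE</mirroring-status>
      <remote-id>1619</remote-id>
    </mirrored-repository>
  </repository>
</repositories>
"""
print(remote_id_map(sample))
```

To run it against a real file, read the file contents first and pass the text to remote_id_map.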

The response of the repository status REST call includes the following fields:

  • status - can be:
    • INITIALIZING - The repository is still copying for the first time
    • ERROR_INITIALIZING - There was an error with the initial clone
    • NOT_MIRRORED - The repository is not mirrored
    • AVAILABLE - The repository has synced successfully without errors.
    • ERROR_AVAILABLE - There is an error while syncing.
  • updatedDate: The time the repository last synced successfully.
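For scripted monitoring, the status values listed above can be grouped into a few outcomes. This is a minimal sketch: the URL pattern and the status names come from this page, while the function names and the outcome labels ("ok", "in-progress", "failed", "not-mirrored", "unknown") are made up for the example.

```python
import json

HEALTHY = {"AVAILABLE"}
IN_PROGRESS = {"INITIALIZING"}
FAILED = {"ERROR_INITIALIZING", "ERROR_AVAILABLE"}


def repo_status_url(mirror_url: str, upstream_id: str, upstream_repo_id: str) -> str:
    """Build the per-repository status URL shown earlier on this page."""
    base = mirror_url.rstrip("/")
    return f"{base}/rest/mirroring/latest/upstreamServers/{upstream_id}/repos/{upstream_repo_id}"


def classify(payload: str) -> str:
    """Reduce the JSON response to a coarse outcome label."""
    status = json.loads(payload).get("status")
    if status in HEALTHY:
        return "ok"
    if status in IN_PROGRESS:
        return "in-progress"
    if status in FAILED:
        return "failed"
    if status == "NOT_MIRRORED":
        return "not-mirrored"
    return "unknown"  # value not covered by the list above


print(classify('{"repositoryId": "1", "status": "INITIALIZING"}'))
```

A monitoring job could alert on "failed" and on repositories that stay "in-progress" for longer than expected.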

Test network connectivity between the Primary and Mirror nodes

To check for network connectivity issues between the Primary and the Mirror nodes, you can run the following commands:

  • From the Mirror node:
curl -v <Primary-Server-URL>/status

java SSLPoke <Primary-Server-hostname> 443
  • From the Primary node:
curl -v <Mirror-Server-URL>/status

java SSLPoke <Mirror-Server-hostname> 443

You can download the SSLPoke class here.

Both of these commands will test the network connectivity as well as SSL handshakes.

Monitor the sync between the Primary and Mirror nodes

  • The following API query endpoints can be used to get information about the repository sync:
    • /rest/mirroring/<repoID>/mirrors - The output of this API contains an "href". Open that URL in a browser to get the sync status of the repository whose repo ID you entered.
    • /rest/mirroring/latest/repo-hashes - Call this API with the Primary URL. It lists the content and metadata hashes of all the repositories on the Primary.
    • /rest/mirroring/latest/supportInfo/repoSyncStatus - Call this API with the Mirror URL. Its main purpose is to determine the synchronization status of the mirrored repositories, mainly their initial and last synchronization timestamps, so it indicates when a repository was last synchronized with the Mirror. It also returns the repository details along with the latest content and metadata hashes on the Mirror side, which makes it extremely useful.
    • /rest/mirroring/latest/supportInfo/refChangesQueue and /rest/mirroring/latest/supportInfo/refChangesQueue/count - If ref changes are being queued, the output of these endpoints helps you understand what is in the queue.
    • To verify synchronization between the Primary and Mirror repos, you can also run 'git ls-remote' against both, which shows the latest refs on the Primary and on the Mirror.
  • The size of a repository on the Primary and on the Mirror node won't be the same. The Primary repository holds many git objects related to pull requests and similar features, which are not on the Mirror because the Mirror doesn't have a UI.
  • A Mirror knows that it is in sync with the Primary via the content hash. Every time a repo is updated on the Primary, it gets a new content hash, and when the Mirror has synced with the Primary, it has the same content hash. That is how we know that the Primary and the Mirror are in the same state.
  • If you call <Primary-URL>/rest/mirroring/latest/repo-hashes, you'll see a content hash and a metadata hash. The content hash covers the objects that the Mirror is supposed to get. When everything is synced, the content hash on the Mirror matches the one on the Primary. You can call /rest/mirroring/latest/repo-hashes with the Mirror URL, search for that content hash, and confirm that the latest commit is in sync between the Primary and the Mirror.
  • If a repository on the Primary is busy and constantly updated, the Mirror will try to stay in sync, but if the Mirror is busy too, it will take some time for the Mirror to catch up with the Primary.
  • If your CI processes depend on the content of the Mirror, they will fail if that content has not been updated. To avoid this, you can use an option in Bitbucket. Whenever you push a change, Jenkins is notified via Webhooks. In the Bitbucket UI, go to the Repository Settings >> Webhooks page, edit one of the webhooks, and check the Mirror synchronized option. The CI tool is then notified only once the Mirror has synchronized the latest changes.

Warning: Test this in non-critical repositories first. If it works as expected, you can implement it for the rest of the repositories.
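The content-hash comparison described in this section can be automated once the hashes from both sides are available. The sketch below deliberately leaves out the parsing of the repo-hashes response, because its exact shape is not shown on this page. It assumes you have already built two dictionaries keyed by the upstream repository ID (for the Mirror, mapped via the remote-id described earlier); the function name out_of_sync and the sample hash values are hypothetical.

```python
def out_of_sync(primary_hashes: dict, mirror_hashes: dict) -> list:
    """Return upstream repo IDs whose content hash on the Mirror is missing or different."""
    return sorted(
        repo_id
        for repo_id, content_hash in primary_hashes.items()
        if mirror_hashes.get(repo_id) != content_hash
    )


# Hypothetical values, for illustration only.
primary = {"1619": "hash-a", "1620": "hash-b", "1621": "hash-c"}
mirror = {"1619": "hash-a", "1620": "hash-old"}
print(out_of_sync(primary, mirror))
```

A repository that stays in this list across several checks is a candidate for closer investigation with repoSyncStatus.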

Using the Mirror URL gives "Couldn't find remote ref"

  • When you try to pull the content of a reviewed PR from a Bitbucket Mirror, you may see the message "fatal: couldn't find remote ref refs/pull-requests/854/from", although the same command works against the Primary Bitbucket.
  • For example:
git fetch blessed refs/pull-requests/854/from; git merge --no-ff FETCH_HEAD
fatal: couldn't find remote ref refs/pull-requests/854/from
  • Pull request refs (under refs/pull-requests/) are an implementation detail of Bitbucket Server. They are not intended to be used for CI, or generally relied upon for development. This comment by Bryan Turner lists some of the reasons why it might not be a good idea to build these refs in CI.
  • As per the design, Mirrors never sync pull request refs actively and you'll need to pull the pull request refs from the primary server only, even for your automation purposes.
  • This is detailed in the Troubleshooting Smart Mirroring page (under the "Pull request refs are not synchronized" paragraph)
  • In case the only refs that are missing are the ones under refs/pull-requests, that is the expected behavior and we have more details on this FAQ page where we state the following:

Do Smart Mirrors sync everything?

No. Smart Mirrors only synchronize the repositories in the Projects you specify. An option to sync all projects is available. Note that not all refs are synced to Smart Mirrors in real time. Certain refs, like those found under refs/pull-requests/, are only synced periodically. All public refs, like branches, found under refs/heads/, and tags, found under refs/tags/, are synced in real time.

  • The refs under refs/pull-requests will be synchronized but, depending on how often the pull requests are updated, they may never be exactly in sync with the Primary Bitbucket node. In other words, some refs (the most recent ones) may always be waiting to be synchronized.
  • Bitbucket considers the refs under refs/pull-requests an internal implementation detail. Because of this, updates to these refs in the upstream (Primary Bitbucket node) don't trigger an immediate notification to the mirror. These refs also do not contribute to the logic that marks the repository as updated. When the mirror reaches out to the upstream to run a full synchronization, it first checks whether one is needed. If the only change in the upstream is under refs/pull-requests, no full synchronization is performed.
  • These refs are only synchronized when the upstream (the Primary Bitbucket node) has a change that is marked as needing synchronization. If you push any change to the repository, the remaining refs under refs/pull-requests will then show up on the mirror as well.
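When comparing 'git ls-remote' output from the Primary and the Mirror, it therefore helps to leave the pull request refs out of the comparison. The following is a minimal sketch that works on the text output of 'git ls-remote' (one "<sha><TAB><ref>" pair per line); the function names and the short sample hashes are made up for this example.

```python
def parse_ls_remote(output: str) -> dict:
    """Parse `git ls-remote` text output into a {ref: sha} dictionary."""
    refs = {}
    for line in output.splitlines():
        parts = line.split(None, 1)  # sha and ref are separated by whitespace (a tab)
        if len(parts) == 2:
            sha, ref = parts
            refs[ref.strip()] = sha
    return refs


def diff_refs(primary_out: str, mirror_out: str,
              ignore_prefixes=("refs/pull-requests/",)) -> dict:
    """Return {ref: (primary_sha, mirror_sha)} for refs that differ, skipping ignored prefixes."""
    primary = parse_ls_remote(primary_out)
    mirror = parse_ls_remote(mirror_out)
    differences = {}
    for ref in sorted(set(primary) | set(mirror)):
        if ref.startswith(ignore_prefixes):
            continue  # pull request refs are expected to lag on the mirror
        if primary.get(ref) != mirror.get(ref):
            differences[ref] = (primary.get(ref), mirror.get(ref))
    return differences


# Shortened, made-up hashes for illustration.
primary_out = "aaa\tHEAD\naaa\trefs/heads/main\nbbb\trefs/pull-requests/854/from\nccc\trefs/tags/v1\n"
mirror_out = "aaa\tHEAD\naaa\trefs/heads/main\nddd\trefs/tags/v1\n"
print(diff_refs(primary_out, mirror_out))
```

In the sample, the missing refs/pull-requests/854/from is ignored and only the tag that points to a different commit is reported.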


Last modified on Jan 7, 2025
