How to check and restore replica consistency after a Mesh node restart in Bitbucket Mesh
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Sometimes one or more Mesh nodes need to be shut down for maintenance. While a node is offline, changes made on the other nodes are not synchronized to it, so the replicas on that node can become out of date.
During normal startup, Bitbucket Mesh automatically repairs inconsistent replicas, so all your Mesh nodes should be consistent within a few minutes of the restart.
However, if the automatic repair runs into a problem, you can check for inconsistent replicas and start the repair manually.
Environment
Bitbucket 8.x and newer
Mesh 1.0.0 and newer
At least one of the repositories has been migrated to Mesh nodes.
Solution
To check whether any node holds inconsistent replicas, run the following REST API call:
curl -X GET --location "{{base_url}}/rest/ui/latest/admin/git/mesh/inconsistent-replicas" \
-H "Accept: application/json" \
--basic --user {{user}}:{{password}}
Example output from this command is shown below. It lists three nodes: Node121, Node240, and Node228. Node121 was offline for maintenance when a commit was made and did not recover after the restart, so its replica is inconsistent with the other Mesh nodes.
[{"node":{"id":1,"lastSeenDate":1709136801332,"name":"Node121","rpcId":"1","rpcUrl":"http://xx.xxx.xx.121:7777","state":"OFFLINE","offline":true},
"inconsistentReplicas":[{"repository":{"slug":"atlas-platform","id":1,"name":"atlas-platform","hierarchyId":"eeea6a9600ed7f3f55af","scmId":"git","state":"AVAILABLE","statusMessage":"Available","forkable":true,"project":{"key":"ATL","id":1,"name":"ATLAS","public":false,"type":"NORMAL"},"public":false,"partition":45,"archived":false},"remoteId":"p/002d/h/eeea6a9600ed7f3f55af/r/1","version":6}]},
{"node":{"id":2,"lastSeenDate":1709136838358,"name":"Node240","rpcId":"2","rpcUrl":"http://xx.xxx.xx.240:7777","state":"AVAILABLE","offline":false},"inconsistentReplicas":[]},
{"node":{"id":3,"lastSeenDate":1709136838358,"name":"Node228","rpcId":"3","rpcUrl":"http://xx.xxx.xx.228:7777","state":"AVAILABLE","offline":false},"inconsistentReplicas":[]}]
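On an instance with many nodes and repositories, the raw JSON is hard to read. The following is a minimal sketch of how the response could be summarized. The function name `find_inconsistent_replicas` is hypothetical, and the sample is a trimmed copy of the example output that keeps only the fields the sketch reads.

```python
import json


def find_inconsistent_replicas(response_text):
    """Return (node name, node id, project key, repo slug) per inconsistent replica."""
    findings = []
    for entry in json.loads(response_text):
        node = entry["node"]
        for replica in entry.get("inconsistentReplicas", []):
            repo = replica["repository"]
            findings.append(
                (node["name"], node["id"], repo["project"]["key"], repo["slug"])
            )
    return findings


# Trimmed copy of the example response (only the fields used above).
sample = """
[
  {"node": {"id": 1, "name": "Node121", "state": "OFFLINE", "offline": true},
   "inconsistentReplicas": [
     {"repository": {"slug": "atlas-platform", "project": {"key": "ATL"}},
      "version": 6}]},
  {"node": {"id": 2, "name": "Node240", "state": "AVAILABLE", "offline": false},
   "inconsistentReplicas": []},
  {"node": {"id": 3, "name": "Node228", "state": "AVAILABLE", "offline": false},
   "inconsistentReplicas": []}
]
"""

for name, node_id, project_key, slug in find_inconsistent_replicas(sample):
    print(f"{name} (id {node_id}): {project_key}/{slug}")
# prints: Node121 (id 1): ATL/atlas-platform
```

An empty result means that no node reported an inconsistent replica.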
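In the example output, only one repository is affected. In that case, you can restart the repair manually for that repository with the following REST API call, where {{meshNodeIdToRepair}} is the ID of the inconsistent node and {{meshNodeIdSource}} is the ID of a node that holds a consistent replica: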
curl -X POST --location "{{base_url}}/rest/ui/latest/admin/git/mesh/troubleshooting/projects/{{project-key}}/repos/{{repo-slug}}/replicas/{{meshNodeIdToRepair}}/repair?sourceNodeId={{meshNodeIdSource}}" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
--basic --user {{user}}:{{password}}
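Filling in the placeholders by hand is error-prone when several replicas need repair. The sketch below assembles the repair URL from its parts, using the path from the curl command above. The function name `build_repair_url` and the base URL `https://bitbucket.example.com` are hypothetical. The node IDs 1 and 2 are taken from the example output, on the assumption that a node reporting no inconsistent replicas is a suitable source.

```python
from urllib.parse import quote, urlencode


def build_repair_url(base_url, project_key, repo_slug, node_id_to_repair, source_node_id):
    """Assemble the repair endpoint URL for one replica on one Mesh node."""
    path = (
        "/rest/ui/latest/admin/git/mesh/troubleshooting"
        f"/projects/{quote(project_key, safe='')}"
        f"/repos/{quote(repo_slug, safe='')}"
        f"/replicas/{node_id_to_repair}/repair"
    )
    query = urlencode({"sourceNodeId": source_node_id})
    # Drop a trailing slash so the joined URL has no double slash.
    return f"{base_url.rstrip('/')}{path}?{query}"


# Repair the replica on Node121 (id 1) from Node240 (id 2).
url = build_repair_url("https://bitbucket.example.com", "ATL", "atlas-platform", 1, 2)
print(url)
```

The resulting URL is the target of the POST request shown in the curl command.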
If multiple repositories or projects are affected, you can restart the repair for all repositories on the Mesh nodes with the following REST API call:
curl -X POST --location "{{base_url}}/rest/ui/latest/admin/git/mesh/troubleshooting/verify-consistency" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
--basic --user {{user}}:{{password}}
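After triggering a repair, you may want to confirm the result by repeating the inconsistent-replicas check from the first step. The sketch below polls that endpoint until no node reports an inconsistent replica. It assumes the repair may take some time to complete, which this article does not state. The helper names, the retry count, and the delay are all hypothetical choices.

```python
import base64
import json
import time
import urllib.request


def fetch_inconsistent(base_url, user, password):
    """Call the inconsistent-replicas endpoint with basic authentication."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/rest/ui/latest/admin/git/mesh/inconsistent-replicas",
        headers={"Accept": "application/json"},
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_inconsistent(nodes):
    """Total number of inconsistent replicas across all nodes in the response."""
    return sum(len(entry.get("inconsistentReplicas", [])) for entry in nodes)


def wait_until_consistent(fetch, attempts=10, delay_seconds=30, sleep=time.sleep):
    """Poll fetch() until no inconsistent replicas remain; True on success."""
    for attempt in range(attempts):
        if count_inconsistent(fetch()) == 0:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Example usage (hypothetical base URL and credentials):
# ok = wait_until_consistent(
#     lambda: fetch_inconsistent("https://bitbucket.example.com", "admin", "secret")
# )
```

Passing the fetch function as a parameter keeps the polling logic separate from the HTTP call, so it can be exercised with canned responses before pointing it at a live instance.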