How to replace a Bitbucket Mesh node in Bitbucket Data Center
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
With its inherent high-availability architecture, Bitbucket Mesh makes IT infrastructure maintenance easier.
In some maintenance scenarios, you may need to replace one of the nodes that run Bitbucket Mesh. For example:
- Part of the hardware that the Bitbucket Mesh node runs on is failing and needs replacement. For example, hardware monitoring reports that a hard disk drive needs urgent replacement.
- The Bitbucket Mesh node's hardware must be replaced entirely, but copying data directly from the old hardware to the new hardware is not feasible.
- Bitbucket Mesh runs on a virtual machine that has to be replaced, but copying the data is not possible or would be too complicated.
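This article describes how to replace an existing Bitbucket Mesh node with a newly added one. First, add the new node. Next, wait for its partition migrations to finish and check its replicas. Finally, remove the old node.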
Note: Bitbucket Mesh is not the same as Bitbucket Mesh sidecar
This article discusses Bitbucket Mesh, an architecture distinct from Bitbucket Mesh sidecar.
To learn more about the two related but still distinct architectures, please check the Bitbucket Mesh whitepaper.
Environment
Bitbucket Data Center 8.x and newer
Remote Mesh 1.0.0 and newer
At least one of the repositories has been migrated to Mesh nodes.
Plan thoroughly when to replace mesh nodes
The replication factor determines how many replicas each Git repository has across the mesh nodes. The minimum and default value is 3.
- You cannot remove a mesh node if doing so would leave the mesh cluster with fewer nodes than the replication factor. The system blocks the action with the error "Unable to unregister node with RPC URL ... The replication factor of 3 cannot be respected if this node is unregistered." For example, in a three-node cluster with the default replication factor, you can remove the old node only after the new node has been added.
- If you have more than 3 nodes, add or remove them outside business hours, during a predefined maintenance window. This reduces the load on the existing mesh nodes. It also lowers the chance that a replica cannot catch up because of commits pushed concurrently during the partition migration.
On large, busy repositories, the repair job may be unable to catch up with the latest changes. Currently, the only workaround is to wait until activity on the repository stays low long enough for the repair process to complete.
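The removal rule in the first point above is simple arithmetic: a node can be unregistered only if the number of remaining nodes is at least the replication factor. The following minimal Bash sketch expresses that check. The function name can_unregister_node is hypothetical and is not part of Bitbucket. The replication factor defaults to 3, the default value stated above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check mirroring the rule above: a mesh node can be
# unregistered only if the remaining nodes are at least the replication factor.
# Usage: can_unregister_node <current_node_count> [replication_factor]
can_unregister_node() {
  local node_count=$1
  local replication_factor=${2:-3}   # 3 is the default (and minimum) value
  [ $((node_count - 1)) -ge "$replication_factor" ]
}

# Worked example: a three-node cluster with the default replication factor.
if can_unregister_node 3; then
  echo "3 nodes: old node can be removed right away"
else
  echo "3 nodes: add the new node first"   # 3 - 1 = 2, which is less than 3
fi
if can_unregister_node 4; then
  echo "4 nodes: old node can be removed"  # 4 - 1 = 3, which meets the factor of 3
fi
```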
Mesh node replacement procedure
The steps below describe the procedure. You can do some steps in the Bitbucket web user interface, but others must be done through the REST API.
To make the output of the curl REST API calls easier to read, you can pipe it through the jq command, if it is installed on your workstation.
For example: curl -X GET ... | jq
Add the new Bitbucket Mesh node to the system on the Bitbucket UI mesh settings page, Administration / Bitbucket Mesh. Alternatively, use the Register New Mesh Node REST API call.
Use the Get all registered Mesh nodes REST API call to list all mesh nodes. In the output, use the name and rpcUrl attributes to locate the newly added node. These two attributes correspond to the data you entered when adding the node in the first step. The id attribute contains the node's ID. The following steps use it as the value of {{meshNodeId}}.
curl -X GET \
  -H "Accept: application/json" \
  --basic --user {{user}}:{{password}} \
  --location "{{host}}/rest/api/latest/admin/git/mesh/nodes"
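You can also script the first two steps. The following Bash sketch makes two assumptions, so check both against the REST API reference for your Bitbucket version:
- The Register New Mesh Node call is a POST to the same /rest/api/latest/admin/git/mesh/nodes URL and accepts a JSON body with name and rpcUrl fields.
- The node list is returned as a Bitbucket paged object, with the nodes in a values array.

The environment variable names BB_HOST, BB_USER, and BB_PASSWORD (standing in for {{host}}, {{user}}, and {{password}}) and both function names are hypothetical. The sketch requires curl and jq.

```bash
#!/usr/bin/env bash
# Assumed environment variables (hypothetical names) standing in for the
# {{host}}, {{user}} and {{password}} placeholders: BB_HOST, BB_USER, BB_PASSWORD.
MESH_NODES_PATH="/rest/api/latest/admin/git/mesh/nodes"

# Register a new mesh node. Assumed request body: {"name": ..., "rpcUrl": ...}
register_mesh_node() {
  local name=$1 rpc_url=$2
  jq -n -c --arg name "$name" --arg rpcUrl "$rpc_url" '{name: $name, rpcUrl: $rpcUrl}' |
    curl -sS --fail -X POST \
      -H "Accept: application/json" \
      -H "Content-Type: application/json" \
      --basic --user "${BB_USER}:${BB_PASSWORD}" \
      --data @- \
      --location "${BB_HOST}${MESH_NODES_PATH}"
}

# Read the node list (JSON) on stdin and print the id of the node whose
# rpcUrl equals the first argument; prints nothing if no node matches.
# Assumes the nodes are in the "values" array of a paged response.
find_mesh_node_id() {
  local rpc_url=$1
  jq -r --arg url "$rpc_url" '.values[] | select(.rpcUrl == $url) | .id'
}

# Example (node name and RPC URL are hypothetical):
# register_mesh_node "mesh-new" "https://mesh-new.example.com:7777"
# curl -sS --fail -H "Accept: application/json" \
#   --basic --user "${BB_USER}:${BB_PASSWORD}" \
#   --location "${BB_HOST}${MESH_NODES_PATH}" |
#   find_mesh_node_id "https://mesh-new.example.com:7777"
```

Matching on rpcUrl rather than name is a design choice here: the RPC URL identifies the node's network endpoint, so it is less likely to be ambiguous.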
Check the partition migrations by calling the REST API endpoint below. Set the {{meshNodeId}} parameter to the ID of the newly added mesh node, obtained in the previous step.
Wait until all partition migrations have completed. A completed migration is shown as "state": "FINISHED" in the output.
curl -X GET \
  -H "Accept: application/json" \
  --basic --user {{user}}:{{password}} \
  --location "{{host}}/rest/ui/latest/admin/git/mesh/partition-migration/?targetNodeId={{meshNodeId}}"
If you call this REST API before adding the new node, it won't return a meaningful result. Instead, it returns an HTTP 404 error.
If there are many partition migrations, piping the output through jq gives a clearer printout. For example, append this to the end of the previous curl command: | jq '.values[].state'
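Instead of re-running the call by hand, you can poll it until every migration reports FINISHED. The following Bash sketch assumes that jq is installed and that the same credentials are available as environment variables. The names BB_HOST, BB_USER, BB_PASSWORD, and MESH_NODE_ID (for {{meshNodeId}}) and the function names are hypothetical.

The sketch treats an empty or failed response, such as the 404 described above, as "not finished yet" and keeps waiting. It only inspects the entries in the values array of the response. If that response turns out to be paged, later pages are not checked.

```bash
#!/usr/bin/env bash
# Print one partition-migration state per line for the node MESH_NODE_ID.
# With --fail, curl writes nothing to stdout on HTTP errors such as 404.
migration_states() {
  curl -sS --fail -X GET \
    -H "Accept: application/json" \
    --basic --user "${BB_USER}:${BB_PASSWORD}" \
    --location "${BB_HOST}/rest/ui/latest/admin/git/mesh/partition-migration/?targetNodeId=${MESH_NODE_ID}" |
    jq -r '.values[].state'
}

# Succeed only if stdin contains at least one state and every state is FINISHED.
all_finished() {
  local state seen=0
  while IFS= read -r state; do
    [ -z "$state" ] && continue
    seen=1
    [ "$state" = "FINISHED" ] || return 1
  done
  [ "$seen" -eq 1 ]
}

# Poll every N seconds (default 30) until all migrations are FINISHED.
wait_for_migrations() {
  local interval=${1:-30}
  until migration_states | all_finished; do
    echo "Partition migrations not finished yet; checking again in ${interval}s"
    sleep "$interval"
  done
  echo "All partition migrations to node ${MESH_NODE_ID} are FINISHED"
}

# Example (node id 4 is hypothetical):
# MESH_NODE_ID=4 wait_for_migrations 60
```

The HTTP call and the decision logic are kept in separate functions, so the "all FINISHED" check can be tried on sample input without a running Bitbucket instance.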
Check for inconsistent Git repository replicas on the new node. The REST API call below returns the list for all Mesh nodes. Look at the node whose id equals the {{meshNodeId}} used earlier.
Inconsistent replicas are listed in the inconsistentReplicas JSON array. If there are no inconsistencies, the array is empty: "inconsistentReplicas": []
If there are inconsistent replicas, follow the page How to check and restore replica consistency after the mesh node restart.
curl -X GET \
  -H "Accept: application/json" \
  --basic --user {{user}}:{{password}} \
  --location "{{host}}/rest/ui/latest/admin/git/mesh/inconsistent-replicas"
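To check only the new node, you can filter the response with jq. The exact shape of this response is not described here, so the following Bash sketch makes a single assumption: each node appears as a JSON object that carries its id next to its inconsistentReplicas array. The filter searches the whole document for that object and prints the length of the array. If it prints nothing, inspect the raw output, because the response may be structured differently. The function name and the MESH_NODE_ID variable are hypothetical.

```bash
#!/usr/bin/env bash
# Read the inconsistent-replicas response (JSON) on stdin and print how many
# inconsistent replicas the node with the given id has. Prints nothing if no
# object with that id and an inconsistentReplicas field is found.
inconsistent_replica_count() {
  local node_id=$1
  # Compare ids as strings so numeric and string ids both match.
  jq -r --arg id "$node_id" '
    first(.. | objects | select((.id | tostring) == $id and has("inconsistentReplicas")))
    | .inconsistentReplicas
    | length'
}

# Example (BB_HOST, BB_USER, BB_PASSWORD and MESH_NODE_ID are hypothetical):
# curl -sS --fail -H "Accept: application/json" \
#   --basic --user "${BB_USER}:${BB_PASSWORD}" \
#   --location "${BB_HOST}/rest/ui/latest/admin/git/mesh/inconsistent-replicas" |
#   inconsistent_replica_count "${MESH_NODE_ID}"
# A result of 0 means the node has no inconsistent replicas.
```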
Once all partition migrations have finished and the new node has no inconsistent replicas, delete the old mesh node on the Bitbucket UI mesh settings page, Administration / Bitbucket Mesh.
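If you prefer to use the REST API for this step as well, the following Bash sketch assumes a Delete Mesh node endpoint, DELETE /rest/api/latest/admin/git/mesh/nodes/{id}, alongside the GET call used earlier. Confirm that this endpoint exists in the REST API reference for your Bitbucket version before relying on it.

You can find the old node's ID with the Get all registered Mesh nodes call, using its name and rpcUrl. As with the UI, expect the call to be rejected with the replication-factor error quoted earlier if removing the node would leave fewer nodes than the replication factor. The function name and the BB_HOST, BB_USER, and BB_PASSWORD variables are hypothetical.

```bash
#!/usr/bin/env bash
# Unregister (delete) a mesh node by its id.
# Assumed endpoint: DELETE {host}/rest/api/latest/admin/git/mesh/nodes/{id}
# Assumed environment variables: BB_HOST, BB_USER, BB_PASSWORD.
delete_mesh_node() {
  local node_id=$1
  if [ -z "$node_id" ]; then
    echo "usage: delete_mesh_node <old-node-id>" >&2
    return 2
  fi
  # -w prints the HTTP status after any response body, so a rejection
  # (for example the replication-factor error) stays visible.
  curl -sS -X DELETE \
    -H "Accept: application/json" \
    --basic --user "${BB_USER}:${BB_PASSWORD}" \
    -w '\nHTTP status: %{http_code}\n' \
    --location "${BB_HOST}/rest/api/latest/admin/git/mesh/nodes/${node_id}"
}

# Example (old node id 2 is hypothetical):
# delete_mesh_node 2
```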
After the old mesh node is removed, you are free to decommission the mesh node's server.