Upgrade a Confluence cluster through the API without downtime
This document provides guidance on how to initiate and finalize a rolling upgrade through API calls. This upgrade method is suitable for admins with the skills and automation tools to orchestrate maintenance tasks (like upgrades).
For an overview of rolling upgrades (including planning and preparation information), see Upgrade Confluence without downtime.
API reference
The entire rolling upgrade process is governed by the following API:
http://<host>:<port>/rest/zdu/cluster/zdu/
This API has the following calls:
Get an overview of the cluster's status.
Enable upgrade mode.
Get the status of the cluster.
Get an overview of a node's status, including the number of running tasks.
Disable upgrade mode. You can only use this call if the upgrade progress is not MIXED.
Once all nodes are upgraded, finalize the rolling upgrade. This will automatically disable upgrade mode.
For detailed information about each API call, see Confluence REST API Documentation.
Initiating a rolling upgrade
To initiate a rolling upgrade, enable upgrade mode first. To do this, use:
http://<host>:<port>/rest/zdu/cluster/zdu/start
Upgrade mode allows your cluster to temporarily accept nodes running different Confluence versions. This lets you upgrade a node and let it rejoin the cluster (along with the other non-upgraded nodes). Both upgraded and non-upgraded active nodes work together to keep Confluence available to all users. You can disable upgrade mode as long as you haven’t upgraded any nodes yet.
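As a minimal sketch of the call above, the following Python snippet builds the endpoint URL and sends the request using only the standard library. The HTTP method (POST), the use of basic authentication, and the helper names zdu_url and start_upgrade_mode are assumptions made for illustration; check the Confluence REST API Documentation for the exact request.

```python
import base64
import urllib.request

# Base path of the rolling upgrade API, as given in the API reference above.
ZDU_BASE_PATH = "/rest/zdu/cluster/zdu"


def zdu_url(host: str, port: int, call: str = "") -> str:
    """Build the URL for a rolling upgrade API call, for example call="start"."""
    return f"http://{host}:{port}{ZDU_BASE_PATH}/{call}"


def start_upgrade_mode(host: str, port: int, user: str, password: str) -> int:
    """Send the request that enables upgrade mode and return the HTTP status.

    Assumptions: the call uses POST and accepts basic authentication.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        zdu_url(host, port, "start"),
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

The same zdu_url helper can build the other URLs in this document, such as the state and nodes/<nodeID> calls.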
Upgrading each node individually
Before you upgrade a node, you'll need to gracefully shut down Confluence on it. To do this, run the stop script corresponding to your operating system and configuration. Learn more about graceful Confluence shutdowns.
For example, if you installed Confluence as a service on Linux, run the following command:
$ sudo /etc/init.d/confluence stop
After upgrading Confluence on the node, wait for it to transition to an Active status before upgrading another node.
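If you automate this wait, a polling loop is one option. The sketch below is illustrative only: wait_until_active is a hypothetical helper, and get_status stands for any function of yours that returns the node's current status string (for example, by calling the node status URL described under Node statuses). The timeout and polling interval are arbitrary defaults.

```python
import time
from typing import Callable


def wait_until_active(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a node until it reports ACTIVE.

    Returns True once the node is ACTIVE, and False if it reports ERROR
    or the timeout elapses first.
    """
    waited = 0.0
    while waited <= timeout_s:
        status = get_status()
        if status == "ACTIVE":
            return True
        if status == "ERROR":
            # Stop early so the operator can investigate the node.
            return False
        sleep(poll_interval_s)
        waited += poll_interval_s
    return False
```

Only move on to the next node when this returns True; a False result means the node needs attention before the upgrade continues.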
Node statuses
To get the status of a node, use:
http://<host>:<port>/rest/zdu/cluster/zdu/nodes/<nodeID>
Status | Description |
---|---|
ACTIVE | Confluence is connected to the cluster and running with no errors. |
STARTING | Confluence is still loading, and should transition to Active once finished. |
TERMINATING | Confluence was gracefully shut down, and should transition to Offline once finished. |
OFFLINE | Confluence is not responding on the node. This node will be removed from the cluster completely if it is still offline after Upgrade mode is disabled. |
ERROR | Something went wrong with Confluence on the node. |
Cluster statuses
To get the status of the cluster, use:
http://<host>:<port>/rest/zdu/cluster/zdu/state
Status | Description |
---|---|
STABLE | You can turn on Upgrade mode now. |
READY_TO_UPGRADE | Upgrade mode is enabled, but no nodes have been upgraded yet. You can start upgrading your first node now. |
MIXED | At least one node is upgraded, but you haven't finished upgrading all nodes yet. Your cluster has nodes running different Confluence versions. You need to upgrade all nodes to the same bug fix version to transition to the next status (READY_TO_RUN_UPGRADE_TASKS). |
READY_TO_RUN_UPGRADE_TASKS | All nodes have been upgraded. You can now finalize the rolling upgrade. |
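An orchestration script can use the cluster status to decide what to do next. The following sketch maps each status in the table above to its next step. The names ClusterState, NEXT_STEP, and next_step are hypothetical, and the sketch assumes you have already extracted the status string from the API response.

```python
from enum import Enum


class ClusterState(Enum):
    STABLE = "STABLE"
    READY_TO_UPGRADE = "READY_TO_UPGRADE"
    MIXED = "MIXED"
    READY_TO_RUN_UPGRADE_TASKS = "READY_TO_RUN_UPGRADE_TASKS"


# Next step for each cluster status, following the table above.
NEXT_STEP = {
    ClusterState.STABLE: "enable upgrade mode",
    ClusterState.READY_TO_UPGRADE: "upgrade the first node",
    ClusterState.MIXED: "upgrade the remaining nodes",
    ClusterState.READY_TO_RUN_UPGRADE_TASKS: "finalize the rolling upgrade",
}


def next_step(raw_state: str) -> str:
    """Map a cluster status string to the next step in the rolling upgrade."""
    try:
        return NEXT_STEP[ClusterState(raw_state)]
    except ValueError:
        # Fail loudly on a status this sketch does not know about.
        raise ValueError(f"Unknown cluster state: {raw_state!r}") from None
```

Raising on an unknown status is deliberate: an automated upgrade should stop rather than guess when the cluster reports something unexpected.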
Enable and disable Upgrade mode
Mixed status with Upgrade mode disabled
If a node is in an Error state with Upgrade mode disabled, you can't enable Upgrade mode. Fix the problem or remove the node from the cluster to enable Upgrade mode.
Troubleshooting
Node errors during rolling upgrade
If a node transitions to an Error status during the rolling upgrade, there are several ways to address it:
Shut down Confluence gracefully on the node. This should disconnect the node from the cluster, allowing the node to transition to an Offline status.
If you can’t shut down Confluence gracefully, shut down the node altogether.
Once all active nodes are upgraded with no nodes in Error, you can finalize the rolling upgrade. You can investigate any problems with the problematic node afterwards and re-connect it to the cluster once you address the error.
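The finalize condition above can be expressed as a pre-flight check. This is a sketch under stated assumptions: finalize_blockers is a hypothetical helper, and its input is a mapping you build yourself from node ID to a (status, version) pair. How you obtain each node's version is outside the scope of this document.

```python
def finalize_blockers(
    nodes: dict[str, tuple[str, str]], target_version: str
) -> list[str]:
    """List the reasons why the rolling upgrade cannot be finalized yet.

    nodes maps a node ID to (status, version). An empty result means every
    active node runs target_version and no node is in ERROR.
    """
    blockers = []
    for node_id, (status, version) in sorted(nodes.items()):
        if status == "ERROR":
            blockers.append(f"{node_id} is in ERROR")
        elif status == "ACTIVE" and version != target_version:
            blockers.append(f"{node_id} is still on {version}")
    return blockers
```

Nodes that are OFFLINE are not reported as blockers here, which mirrors the guidance above: shut the problematic node down, finalize, and investigate it afterwards.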
Disconnecting a node from the cluster through the load balancer
If a node error prevents you from gracefully shutting down Confluence, try disconnecting it from the cluster through the load balancer. The following table provides guidance on how to do so for popular load balancers.
Load balancer | Instructions |
---|---|
NGINX | NGINX defines groups of cluster nodes through the upstream directive. To prevent the load balancer from connecting to a node, delete the node's entry from its corresponding upstream group. Learn more about the upstream directive in the ngx_http_upstream_module module. |
HAProxy | With HAProxy, you can disable all traffic to the node by changing the server's administrative state. Learn more about forcing a server's administrative state. |
Apache | You can disable a node (or "worker") by setting its activation member attribute to disabled. Learn more about advanced load balancer worker properties in Apache. |
Azure Application Gateway | We provide a deployment template for Confluence Data Center on Azure; this template uses the Azure Application Gateway as its load balancer. The Azure Application Gateway defines each node as a target within a backend pool. Use the Edit backend pool interface to remove your node's corresponding entry. Learn more about adding (and removing) targets from a backend pool. |
Traffic is disproportionately distributed during or after upgrade
Some load balancers might use strategies that send a disproportionate number of active users to a newly upgraded node. When this happens, the node might become overloaded, slowing down Confluence for all users logged in to the node.
To address this, you can temporarily disconnect the node from the cluster. This will force the load balancer to redistribute active users among all other available nodes. Afterwards, you can add the node to the cluster again.