Failover for Bitbucket Data Center
An important advantage of the Active/Active clustering configuration of Bitbucket Data Center is high availability. If one cluster node goes down, the remaining nodes can continue servicing requests, so users see little or no loss of availability.
As with any system, each component of a Bitbucket Data Center instance is a potential point of failure and loss of availability for users. It is therefore important to have a failover plan for each of them to ensure maximum availability. This page provides recommendations on how to maximize availability for users with Bitbucket Data Center.
Bitbucket Data Center components
A Bitbucket Data Center instance consists of several dedicated machines, connected as shown in the following diagram:
Load balancer failover
Your load balancer distributes requests from users to your cluster nodes. Bitbucket Data Center does not bundle a load balancer.
We highly recommend that you choose a load balancer that provides clustered or failover features, so as to maximize availability in the event of load balancer faults.
Database failover
The database is the single "source of truth" in a Bitbucket Data Center instance.
We highly recommend that you make use of the database's high availability features to maximize availability in the event of failure.
File server failover
The NFS file server contains all the repository, attachment, and avatar data in a Bitbucket Data Center instance.
We highly recommend that you make use of the replication or high availability features provided by your file server.
Cluster node failover
The cluster of Bitbucket nodes shares the workload of incoming requests to provide Active/Active failover. Each node is a complete Bitbucket instance that runs on a dedicated machine. Cluster nodes are the most complex components of Bitbucket Data Center, and the most important to focus on for high availability.
Bitbucket Data Center is designed so that failure of a cluster node causes little or no loss of availability to users, provided the following constraints are met:
- Your load balancer is able to quickly detect the failure and fail over to the remaining cluster nodes. Bitbucket provides a /status REST resource that is designed for load balancers to check the health of cluster nodes periodically. If a cluster node goes down, most load balancers are able to detect the failure and direct traffic to the other nodes within seconds.
- You have provisioned sufficient spare capacity (cluster nodes) in your cluster to be able to handle the requests from users even with one or more nodes down. Provisioning at least one extra cluster node above the number of nodes you require at peak load is highly recommended.
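The spare-capacity constraint above comes down to simple arithmetic. The following is a minimal sketch, assuming load can be expressed as a single number (for example, requests per second) and that every node has the same capacity. The function names and the example figures are hypothetical and are not part of Bitbucket.

```python
import math


def nodes_to_provision(peak_load: float, per_node_capacity: float, spare_nodes: int = 1) -> int:
    """Return how many cluster nodes to provision, including spares.

    peak_load and per_node_capacity must use the same (assumed) unit,
    for example requests per second.
    """
    if peak_load < 0 or per_node_capacity <= 0 or spare_nodes < 0:
        raise ValueError("peak_load and spare_nodes must be >= 0, per_node_capacity > 0")
    # Nodes needed to carry peak load with nothing in reserve (at least one).
    required = max(1, math.ceil(peak_load / per_node_capacity))
    return required + spare_nodes


def survives_failures(total_nodes: int, peak_load: float,
                      per_node_capacity: float, failed_nodes: int = 1) -> bool:
    """Check whether the remaining nodes can still carry peak load."""
    remaining = total_nodes - failed_nodes
    return remaining > 0 and remaining * per_node_capacity >= peak_load


# Hypothetical figures: 900 requests/s at peak, 400 requests/s per node.
total = nodes_to_provision(900, 400)  # 3 nodes for peak load + 1 spare
print(total, survives_failures(total, 900, 400))
```

With these hypothetical figures, four nodes tolerate the loss of one node at peak load, while three nodes would not.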
If a cluster node does go down, users who are actively using it may notice the failure in a number of ways:
- Any open Git clone, fetch, or push connections being handled by the failing node may be broken, but if retried, will succeed.
- Any background work running on the failing node, such as pull request rescopes, comment drift calculations, and user directory synchronization, may take a bit longer to complete, as this work is taken over by another cluster node.
- Unless measures are taken to preserve users' authentication credentials across the cluster, any users whose session was directed to the failing node will be logged out and will have to log in to Bitbucket again. Measures to preserve users' authentication credentials and avoid this extra login are described below.
- Any other data stored in the session (such as fields entered in one page of a multi-page form) on the failing node may be lost and have to be re-entered. Bitbucket and most Atlassian-supplied add-ons do not store any such state in the session, but third party add-ons that were not designed for Bitbucket Data Center may.
Configuration options relating to session data and user authentication are discussed in more depth in 'Session Management' below.
The faster your load balancer detects faulty cluster nodes and fails over, the less disruption your users will experience. Most load balancers allow the frequency of health checks to be configured. In general, the shorter the interval between health checks the better, but there is a limit to how rapidly you can poll the /status endpoint without generating excessive load on CPU and network resources. A health check interval of 1-5 seconds is highly recommended.
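To illustrate what such a health check does, here is a minimal sketch of a /status probe. It assumes that a healthy node answers with HTTP 200 and a JSON body whose "state" field is "RUNNING"; verify that response format against your Bitbucket version. The `NodeMonitor` class and its `failures_to_mark_down` threshold are hypothetical and only mimic the consecutive-failure setting that many load balancers offer.

```python
import json
import urllib.error
import urllib.request


def is_node_healthy(http_status: int, body: str) -> bool:
    """Assumed rule: HTTP 200 plus a JSON object with "state": "RUNNING"."""
    if http_status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # body is not valid JSON
        return False
    return isinstance(payload, dict) and payload.get("state") == "RUNNING"


def probe(base_url: str, timeout: float = 2.0) -> bool:
    """Request <base_url>/status once; any network or HTTP error counts as unhealthy."""
    url = base_url.rstrip("/") + "/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_node_healthy(resp.status, resp.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError):
        return False


class NodeMonitor:
    """Marks a node down after N consecutive failed checks (hypothetical helper)."""

    def __init__(self, failures_to_mark_down: int = 2):
        self.failures_to_mark_down = failures_to_mark_down
        self.consecutive_failures = 0
        self.up = True

    def record(self, healthy: bool) -> bool:
        """Record one check result and return whether the node is considered up."""
        if healthy:
            self.consecutive_failures = 0
            self.up = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failures_to_mark_down:
                self.up = False
        return self.up
```

A caller would invoke `monitor.record(probe(node_url))` once per interval. Requiring more than one consecutive failure avoids removing a node because of a single slow response, but it lengthens detection time roughly in proportion to the interval multiplied by the threshold.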
Session management
Bitbucket Data Center assumes that your load balancer applies session affinity ("sticky sessions"), and therefore always directs each user's requests to the same cluster node. Bitbucket Data Center maintains each user's session information locally, on one specific cluster node. This is the best configuration for performance.
However, this also means that if a cluster node goes down, any users whose session state was stored on the failing node will lose this information, be logged out, and have to log in again. There are a number of ways you can eliminate this extra login, but as there is a performance tradeoff involved, no single method is enabled by default.
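The effect of session affinity on failover can be shown with a toy model. This is only an illustrative simulation, not how Bitbucket or any particular load balancer is implemented. The `StickyBalancer` class, the round-robin assignment, and the node and user names are all hypothetical.

```python
class StickyBalancer:
    """Toy load balancer with session affinity.

    Session state is kept only on the node a session was assigned to,
    mirroring the node-local sessions described above.
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.affinity = {}                              # session id -> node name
        self.sessions = {n: set() for n in self.nodes}  # node -> logged-in session ids
        self._next = 0                                  # round-robin counter

    def route(self, session_id: str) -> str:
        """Return the node for this session, reassigning it if its node is gone."""
        if not self.nodes:
            raise RuntimeError("no cluster nodes available")
        node = self.affinity.get(session_id)
        if node is None or node not in self.nodes:
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            self.affinity[session_id] = node
        return node

    def login(self, session_id: str) -> str:
        node = self.route(session_id)
        self.sessions[node].add(session_id)
        return node

    def is_logged_in(self, session_id: str) -> bool:
        return session_id in self.sessions[self.route(session_id)]

    def fail(self, node: str) -> None:
        """Simulate a node going down: its local session state is lost."""
        self.nodes.remove(node)
        self.sessions.pop(node, None)


lb = StickyBalancer(["node1", "node2"])
lb.login("alice")   # assigned to node1
lb.login("bob")     # assigned to node2
lb.fail("node1")
print(lb.is_logged_in("alice"), lb.is_logged_in("bob"))
```

In this model the user whose session lived on the failed node is rerouted to a surviving node that has no record of the session, so that user must log in again, while the other user is unaffected.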
Use remember-me authentication
With remember-me authentication, users who successfully authenticate receive a _atl_bitbucket_remember_me cookie that persists in the database for 30 days and allows them to re-authenticate without logging in again, regardless of which cluster node their requests are directed to.
By default, remember-me authentication is optional, so users must check "Keep me logged in" when logging in to enable remember-me authentication. You can force remember-me authentication to be enabled for all users by setting the auth.remember-me.enabled property in
Remember-me authentication only eliminates the extra login within the 30-day lifetime of the cookie. It does not store other session information.
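As a rough illustration of changing the auth.remember-me.enabled property, the sketch below rewrites a single key in the text of a Java-style properties file. The `set_property` helper is hypothetical. The value `always` is an assumption about which setting forces remember-me for all users, so check it against the configuration properties reference for your Bitbucket version. The sketch works on a string and deliberately does not name a file location.

```python
def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties_text with key set to value.

    An existing `key=...` line is replaced in place; otherwise a new line is
    appended. Comment lines (starting with '#') are left untouched.
    """
    lines = properties_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Assumed value "always"; verify before use.
current = "# authentication settings\nauth.remember-me.enabled=optional\n"
print(set_property(current, "auth.remember-me.enabled", "always"))
```

Bitbucket typically needs to be restarted before a changed property takes effect, so plan the change around a maintenance window or a rolling restart of the cluster nodes.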
Use Crowd SSO for authentication
Another way to ensure users do not have to log in again if a node goes down is to use the single sign-on (SSO) feature of Atlassian Crowd. If Crowd SSO is enabled, then users whose Crowd directory has been configured for SSO should not have to log in again.
Crowd SSO only provides automatic authentication of users' credentials. It does not store other session information.
See Connecting Bitbucket to Crowd for more information.
Use your own custom authentication plug-in
If you have your own custom (in-house or third-party) authentication plug-in, you can avoid the extra login provided your plug-in is written to be cluster-aware.