High availability for Bitbucket Server

This page describes how to set up a single Bitbucket Server instance in a highly available configuration. 

For production installs, we highly recommend that you first read Using Bitbucket Server in the enterprise.

For Active/Active high availability with Bitbucket Server, see Bitbucket Data Center instead.

For guidance on using Bitbucket Data Center as part of your disaster recovery strategy, see the Disaster recovery guide for Bitbucket Data Center.

If Bitbucket Server is a critical part of your development workflow, maximizing application availability becomes an important consideration. There are many possible configurations for setting up an HA environment for Bitbucket Server, depending on the infrastructure components and software (SAN, clustered databases, etc.) you have at your disposal. This guide provides a high-level overview and the background information you need to set up a single Bitbucket Server instance in a highly available configuration.

Note that Atlassian's Bitbucket Data Center product uses a cluster of Bitbucket Server nodes to provide Active/Active failover. It is the deployment option of choice for larger enterprises that require high availability and performance at scale, and is fully supported by Atlassian. Read about Failover for Bitbucket Data Center.

High availability

High availability describes a set of practices aimed at delivering a specific level of "availability" by eliminating or mitigating failure through redundancy. Failure can result from unscheduled downtime due to network errors, hardware failures or application failures, but can also result from failed application upgrades. Setting up a highly available system involves:

Proactive Concerns

  • Change management (including staging and production instances for change implementation)
  • Redundancy of network, application, storage and databases
  • Monitoring system(s) for both the network and applications

Reactive Concerns

  • Technical failover mechanisms, either automatic or scripted semi-automatic with manual switchover
  • Standard Operating Procedure for guided actions during crisis situations

This guide assumes that processes such as change management are already covered and focuses on redundancy, replication and failover procedures. When it comes to setting up your infrastructure to recover quickly from system or application failure, you have several options, which vary in the level of uptime they can provide. In general, as the required uptime increases, so do the complexity of the infrastructure, the knowledge required to administer the environment and, by extension, the cost.
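To make the uptime trade-off concrete, the following sketch converts an availability target into a yearly downtime budget. The targets used (99%, 99.9%, 99.99%) are illustrative assumptions rather than figures from this guide, and the function name is hypothetical.

```python
# Illustrative arithmetic only: the availability targets below are examples,
# not requirements or figures taken from this guide.
MINUTES_PER_YEAR = 365 * 24 * 60  # minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {budget:.1f} minutes of downtime per year")
```

Comparing the budget with the recovery time of each failover option gives a rough idea of how many incidents per year a given option can absorb.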

Understanding the availability requirements for Bitbucket Server

Centralized version control systems such as Subversion, CVS, ClearCase and many others require the central server to be available for any operation that involves the version control system. Committing code, fetching the latest changes from the repository, switching branches or retrieving a diff all require access to the central server. If that server goes down, developers are severely limited in what they can do. They can continue coding until they're ready to commit, but then they're blocked.

Git is a distributed version control system, and developers have a full clone of the repository on their machines. As a result, most operations that involve the version control system don't require access to the central repository. When Bitbucket Server is unavailable, developers are not blocked to the same extent as with a centralized version control system.

Consequently, the availability requirements for Bitbucket Server may be less strict than the requirements for, say, Subversion.

Consequences of Bitbucket Server unavailability

Unaffected

Developer:

  • Commit code
  • Create branch
  • Switch branches
  • Diff commits and files
  • ...
  • Fetch changes from fellow developers

Affected

Developer:

  • Clone repository
  • Fetch changes from central repository
  • Push changes to central repository
  • Access Bitbucket Server UI - create/do pull requests, browse code

Build server:

  • Clone repository
  • Poll for changes

Continuous Deployment:

  • Clone repository

Failover options

High availability and recovery solutions can be categorized as follows:

Automatic correction / restart

Recovery time: 2-10 min (application failure), hours-days (system failure)
Possible with Bitbucket Server: Yes

  • Single node, no secondary server available
  • Application and server are monitored
  • Upon failure of the production system, it is restarted automatically via scripting
  • Disk or hardware failure may require reprovisioning of the server and restoring application data from a backup

Cold standby

Recovery time: 2-10 min
Possible with Bitbucket Server: Yes

  • Secondary server is available
  • Bitbucket Server is NOT running on the secondary server
  • Filesystem and (optionally) database data is replicated between the 'active' server and the 'standby' server
  • All requests are routed to the 'active' server
  • On failure, Bitbucket Server is shut down on the 'active' server and started on the 'standby' server. All requests are now routed to the 'standby' server, which becomes 'active'.

Warm standby

Recovery time: 0-30 sec
Possible with Bitbucket Server: No

  • Secondary server is available
  • Bitbucket Server is running on both the 'active' server and the 'standby' server, but all requests are routed to the 'active' server
  • Filesystem and database data is replicated between the 'active' server and the 'standby' server
  • On failure, all requests are routed to the 'standby' server, which becomes 'active'
  • This configuration is currently not supported by Bitbucket Server, because Bitbucket Server uses in-memory caches and locking mechanisms. At this time, Bitbucket Server only supports a single application instance writing to the Bitbucket Server home directory at a time.

Active/Active

Recovery time: < 5 sec
Possible with Bitbucket Server: Yes, through Bitbucket Data Center

  • Provided by Bitbucket Data Center, using a cluster of Bitbucket Server nodes and a load balancer.
  • Bitbucket Server is running, and serving requests, on all cluster nodes.
  • Filesystem and database data is shared by all cluster nodes. Clustered databases are not yet supported.
  • All requests are routed to the load balancer, which distributes requests to the available cluster nodes. If a cluster node goes down, the load balancer immediately detects the failure and automatically directs requests to the other nodes within seconds.
  • Bitbucket Data Center is the deployment option of choice for larger enterprises that require high availability and performance at scale.

 

Automatic correction

Before implementing failover solutions for your Bitbucket Server instance, consider evaluating and leveraging automatic correction measures. These can be implemented through a monitoring service that watches your application and runs scripts to start, stop, kill or restart services.

  1. A monitoring service detects that the system has failed.
  2. A correction script attempts to gracefully shut down the failed system.
    1. If the system does not shut down properly within a defined period of time, the correction script kills the process.
  3. After it is confirmed that the process is no longer running, it is started again.
  4. If this restart resolves the failure, the mechanism ends.
    1. If the correction attempts are unsuccessful or only partially successful, a failover mechanism should be triggered, if one was implemented.
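The steps above can be sketched as a single correction cycle. This is a minimal illustration, not a supported script: the function and every callable passed to it (is_running, is_healthy, stop, kill, start) are hypothetical hooks that you would wire to your own service manager and monitoring checks.

```python
import time
from typing import Callable


def correct_service(
    is_running: Callable[[], bool],   # hypothetical: is the process still alive?
    is_healthy: Callable[[], bool],   # hypothetical: does the application respond correctly?
    stop: Callable[[], None],         # hypothetical: request a graceful shutdown
    kill: Callable[[], None],         # hypothetical: forcibly terminate the process
    start: Callable[[], None],        # hypothetical: start the service
    shutdown_timeout: float = 60.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run one correction cycle; return True if the service is healthy afterwards."""
    # Step 2: attempt a graceful shutdown and wait for the process to exit.
    stop()
    waited = 0.0
    while is_running() and waited < shutdown_timeout:
        sleep(poll_interval)
        waited += poll_interval

    # Step 2.1: kill the process if it did not stop within the defined period.
    if is_running():
        kill()

    # Step 3: restart only after confirming the process is no longer running.
    if is_running():
        return False
    start()

    # Step 4: report whether the restart resolved the failure. The is_healthy
    # hook is expected to allow for application start-up time itself.
    return is_healthy()
```

A caller would treat a False result as the signal for step 4.1, triggering the failover mechanism if one is implemented.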

Cold standby

The cold standby (also called Active/Passive) configuration consists of two identical Bitbucket Server instances, of which only one is ever running at a time. The Bitbucket home directory on each of the servers is either a shared (and preferably highly available) network file system or is replicated from the active to the standby Bitbucket Server instance. When a system failure is detected, Bitbucket Server is first restarted on the active server. If the system failure persists, a failover mechanism is started that shuts down Bitbucket Server on the active server and starts Bitbucket Server on the standby server, which is promoted to 'active'. From that point on, all requests should be routed to the newly active server.
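The failover sequence described above can be sketched as follows. This is an illustration under stated assumptions: Node, handle_failure, restart_and_check and route_to are hypothetical names, and the running flag stands in for real service control, which would need to confirm that Bitbucket Server has actually stopped on the old active server before it is started on the standby.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    """Hypothetical stand-in for one application server."""
    name: str
    running: bool = False


def handle_failure(
    active: Node,
    standby: Node,
    restart_and_check: Callable[[Node], bool],  # hypothetical: restart, then report health
    route_to: Callable[[Node], None],           # hypothetical: repoint the request router
) -> Node:
    """Return the node that is active after handling a detected failure."""
    # First response: restart Bitbucket Server on the active server.
    if restart_and_check(active):
        return active

    # The failure persists: shut down the active server before starting the
    # standby, so that two instances never run against the home directory.
    active.running = False
    standby.running = True

    # Promote the standby by routing all requests to it.
    route_to(standby)
    return standby
```

Stopping the active node before starting the standby reflects the constraint that only one application instance may write to the Bitbucket Server home directory at a time.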

For each component in the chain of high availability measures, there are various implementation alternatives. Although Atlassian does not recommend any particular technology or product, this guide gives options for each step.

System setup

This section describes one possible configuration for setting up a single instance of Bitbucket Server for high availability.


Components

Request Router
Forwards traffic from users to the active Bitbucket Server instance.

High Availability Manager

  • Tracks the health of the application servers and decides when to fail over to a standby server and designate it as active.
  • Manages failover mechanisms and sends notifications on system failure.
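One way a high availability manager might decide when to fail over is to count consecutive failed health checks, so that a single transient error does not trigger a switchover. The sketch below assumes this policy; the class name, the threshold and the notify callback are hypothetical, and how each health check is performed is left to your monitoring tooling.

```python
from typing import Callable


class HealthTracker:
    """Track health check results and signal when failover should start."""

    def __init__(self, failure_threshold: int, notify: Callable[[str], None]):
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")
        self.failure_threshold = failure_threshold
        self.notify = notify  # hypothetical hook, e.g. email or chat alert
        self.consecutive_failures = 0

    def record(self, healthy: bool) -> bool:
        """Record one check result; return True when failover is warranted."""
        if healthy:
            # Any successful check resets the failure streak.
            self.consecutive_failures = 0
            return False

        self.consecutive_failures += 1
        # Notify once, at the moment the threshold is first reached.
        if self.consecutive_failures == self.failure_threshold:
            self.notify(
                f"{self.consecutive_failures} consecutive health checks failed; "
                "starting failover"
            )
        return self.consecutive_failures >= self.failure_threshold
```

A higher threshold reduces false failovers at the cost of a longer detection time, which adds to the overall recovery time.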

Bitbucket Server instance

  • Each server hosts an identical Bitbucket Server installation (identical versions).
  • Only one server is ever running a Bitbucket Server instance at any one time (known as the active server). All others are considered standbys.
  • The Bitbucket Server home directory resides on a replicated or shared file system visible to all application servers.
  • The home directory must never be modified while the server is in standby mode.

Bitbucket Server DB
The production database, which should be highly available. How this is achieved is not explored in this document. See your database vendor's documentation for the HA options available to you.

Licensing

Developer licenses can be used for non-production installations of Bitbucket Server deployed on a cold standby server. For more information, see developer licenses.

Last modified on Sep 21, 2017
