Infrastructure recommendations for enterprise Bitbucket instances on AWS
The AWS Quick Start template as a method of deployment is no longer supported by Atlassian. You can still use the template, but we won't maintain or update it.
We recommend deploying your Data Center products on a Kubernetes cluster using our Helm charts for a more efficient and robust infrastructure and operational setup. Learn more about deploying on Kubernetes.
AWS now recommends switching from launch configurations, which our AWS Quick Start template uses, to launch templates. We won't make this switch, however, as we've ended our support for the AWS Quick Start template. This means you're no longer able to create launch configurations using this template.
Knowing your load profile is useful for planning your instance's growth, looking for inflated metrics, or simply keeping it at a reasonable size. In Bitbucket Data Center load profiles, we showed you some simple guidelines for finding out if your instance was Small, Medium, Large, or XLarge. We based these size profiles on Server and Data Center case studies, covering varying infrastructure sizes and configurations.
As your load grows closer to Large or XLarge, you should routinely evaluate your infrastructure. Once your environment starts to experience performance or stability issues, consider migrating to a clustered (or cluster-ready) infrastructure. When you do, keep in mind that it may not always be clear how to do that effectively. For example, adding more application nodes to a growing Medium-sized instance doesn't always improve performance (in fact, the opposite might happen).
To help you plan your infrastructure set-up or growth, we ran a series of performance tests on typical Medium, Large, and XLarge instances. We designed these tests to get useful, data-driven recommendations for your clustered deployment's application and database nodes. These recommendations can help you plan a suitable clustered environment, one that is adequate for the size of your projected content and traffic.
Note that large repositories might influence performance.
We advise that you monitor performance on a regular basis.
Approach
We ran all tests in AWS. This allowed us to easily define and automate multiple tests, giving us a large (and fairly reliable) sample.
Each part of our test infrastructure was provisioned from a standard AWS component available to all AWS users. This allows for easy deployment of recommended configurations. It also means you can look up specifications in AWS documentation. This helps you find equivalent components and configurations if your organization prefers a different cloud platform or bespoke clustered solution.
You can also use AWS Quick Starts for deploying Bitbucket Data Center, though Atlassian no longer supports or maintains Quick Start templates. Instead, we recommend deploying your Data Center products on a Kubernetes cluster using our Helm charts. Learn more about deploying on Kubernetes.
Some things to consider
To effectively benchmark Bitbucket on a wide range of configurations, we designed tests that could be easily set up and replicated. Accordingly, when referencing our benchmarks for your production environment, consider:
We didn't install apps on our test instances, as we focused on finding the right configurations for the core product. When designing your infrastructure, you need to account for the impact of apps you want to install.
We used RDS with default settings across all tests. This allowed us to get consistent results with minimal setup and tuning.
Our test environment used dedicated AWS infrastructure hosted on the same subnet. This helped minimize network latency.
We used an internal testing tool called Trikit to simulate the influx of git packets. This gave us the ability to measure git request speeds without having to measure client-side git performance. It also meant our tests didn’t unpack git refs, as the tool only receives and decrypts git data.
The performance (response times) of git operations will be affected largely by repository size. Our test repositories averaged 14.2MB in size. We presume that bigger repositories might require stronger hardware.
Due to limitations in AWS, we initialized EBS volumes (storage blocks) on the NFS servers before starting the test. Without disk initializations, there is a significant increase in disk latency, and test infrastructure slows for several hours.
We enabled analytics on each test instance to collect usage data. For more information, see Change data collection settings.
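On the EBS initialization point above: the source doesn't say how the volumes were initialized. AWS's general guidance for volumes restored from snapshots is to read every block once before use, commonly with a tool such as dd or fio. The following is a minimal Python sketch of that idea, assuming the volume is exposed as a readable block device. The function name and the device path in the comment are hypothetical.

```python
def initialize_volume(path: str, block_size: int = 1024 * 1024) -> int:
    """Read every block of `path` once and return the number of bytes read.

    Reading each block forces it to be fetched to the volume, so later
    reads don't pay the first-access latency penalty.
    """
    total = 0
    # buffering=0 opens the file unbuffered, so each read goes to the device.
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(block_size)
            if not chunk:  # empty bytes object means end of device
                break
            total += len(chunk)
    return total


# Hypothetical usage (device name is an assumption, requires root):
# initialize_volume("/dev/xvdf")
```

A full read of a large volume can take a long time, so it's usually run before the instance takes traffic.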
Methodology
Each test involved applying the same amount of traffic to a Bitbucket data set, but on a different AWS environment. We ran three series of tests, each designed to find optimal configurations for the following components:
Bitbucket application node
Database node
NFS node
To help ensure benchmark reliability, we initialized the EBS volumes and tested each configuration for three hours. We observed stable response times throughout each test. Large instance tests used Bitbucket Data Center 5.16 while XLarge used Bitbucket Data Center 6.4. We used a custom library (Trikit) running v1 protocol to simulate Git traffic.
Data sets
Benchmark
We used the following benchmark metrics for our tests.
| Benchmark metric | Threshold | Reason |
| --- | --- | --- |
| Git throughput, or the number of git hosting operations (fetch/clone/push) per hour | 32,700 (Minimum) for Large and 65,400 (Minimum) for XLarge, the higher the better | These thresholds are the upper limits of traffic defined in Bitbucket Data Center load profiles. We chose them due to the spiky nature of git traffic. |
| Average CPU utilization (for application nodes) | 75% (Maximum), the lower the better | When the application nodes reach an average CPU usage of 75% or above, Bitbucket's adaptive throttling starts queuing Git hosting operations to ensure the responsiveness of the application for interactive users. This slows down Git operations. |
| Stability | No nodes go offline | When the infrastructure can't handle the load, nodes may crash. |
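The three benchmark criteria can be combined into a single pass/fail check. The sketch below is only an illustration of how the thresholds fit together, not the tooling used for the tests. The class and function names are hypothetical, and it assumes a result passes only when CPU stays strictly below 75%.

```python
from dataclasses import dataclass

# Thresholds taken from the benchmark metrics above.
MIN_GIT_THROUGHPUT = {"Large": 32_700, "XLarge": 65_400}  # git hosting operations per hour
MAX_AVG_CPU_PERCENT = 75.0  # adaptive throttling starts at this level


@dataclass
class BenchmarkResult:
    """Hypothetical container for the outcome of one test run."""
    git_throughput_per_hour: int
    avg_cpu_percent: float
    nodes_went_offline: int


def passes_benchmark(result: BenchmarkResult, size: str) -> bool:
    """Return True if the run meets all three criteria for the given size."""
    return (
        result.git_throughput_per_hour >= MIN_GIT_THROUGHPUT[size]
        and result.avg_cpu_percent < MAX_AVG_CPU_PERCENT
        and result.nodes_went_offline == 0
    )
```

For example, a run with 45,844 git hosting operations per hour at 45% average CPU and no node failures (the figures reported later for the recommended Large configuration) passes the Large check but not the XLarge one.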
The test traffic had fixed sleep times to modulate the volume of git hosting operations. This means the benchmarked git throughput doesn’t represent the maximum each configuration can handle.
Architecture
We tested each configuration on a freshly-deployed Bitbucket Data Center instance on AWS. Every configuration followed the same structure:
| Function | Number of nodes | Virtual machine type | Notes |
| --- | --- | --- | --- |
| Application node | Variable | m5.xlarge, m5.2xlarge, m5.4xlarge, m5.12xlarge, m5.24xlarge | When testing m5.xlarge (16GB of RAM), we used 8GB for JVM heap. For all others, we used 12GB for JVM heap. Minimum heap (Xms) was set to 1GB for all the tests. We’ve observed that using a smaller JVM heap (2-3GB) is enough for most instances. Also note that Git operations are expensive in terms of memory consumption and are executed outside of the Java virtual machine. See more on Scaling Bitbucket Data Center. Each Bitbucket application node used a 30GB General Purpose SSD (gp2) for local storage. This disk had an attached EBS volume with a baseline of 100 IOPS, burstable to 3,000 IOPS. |
| Database | 1 | m5.xlarge, m5.2xlarge, m5.4xlarge | We used Amazon RDS PostgreSQL version 9.4.15, with default settings. Each test only featured one node. |
| NFS storage | 1 | m5.4xlarge, m5.2xlarge, m5.xlarge | Our NFS server used a 900GB General Purpose SSD (gp2) for storage. This disk had an attached EBS volume with a baseline of 2,700 IOPS, burstable to 3,000 IOPS. As mentioned, we initialized this volume at the start of each test. For more information on setting up Bitbucket Data Center's shared file server, see Step 2. Provision your shared file system (in Install Bitbucket Data Center). This section contains the requirements and recommendations for setting up NFS for Bitbucket Data Center. |
| Load balancer | 1 | | We used AWS Elastic Load Balancer. At the time of performance testing, Application Load Balancer didn't handle SSH traffic. |
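The gp2 IOPS figures quoted in the notes above follow from how AWS sizes General Purpose SSD (gp2) volumes: the baseline is 3 IOPS per GiB, with a floor of 100 IOPS and a ceiling of 16,000 IOPS, and volumes whose baseline is below 3,000 IOPS can burst up to 3,000. The sketch below reproduces that arithmetic. The function names are hypothetical, and it treats the sizes quoted in this document (30GB, 900GB) as GiB.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 per GiB, clamped to [100, 16000]."""
    return min(max(3 * size_gib, 100), 16_000)


def gp2_peak_iops(size_gib: int) -> int:
    """Peak IOPS: small volumes can burst to 3,000; larger ones run at baseline."""
    return max(gp2_baseline_iops(size_gib), 3_000)


# Volumes used in the tests described above.
for label, size in [("application node local storage", 30), ("NFS storage", 900)]:
    print(f"{label}: {gp2_baseline_iops(size)} baseline, {gp2_peak_iops(size)} peak IOPS")
```

This is useful when sizing a shared home volume: a larger gp2 volume has a higher sustained baseline, so it relies less on burst credits.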
We ran several case studies of real-life Large and XLarge Bitbucket Data Center instances to find optimal configurations for each component. In particular, we found many used m5 series virtual machine types (General Purpose Instances). As such, for the application node, we focused on benchmarking different configurations within the m5 series.
Refer to the AWS documentation on Instance Types (specifically, General Purpose Instances) for details on each virtual machine type used in our tests.
Recommendations for Large-sized instances
We analyzed our benchmarks and came up with the following optimal configuration:
Best-performing and most cost-effective configuration
| Component | Recommendation |
| --- | --- |
| Application nodes | m5.4xlarge x 4 |
| Database node | m5.2xlarge |
| NFS node | m5.2xlarge |
Performance of this configuration
Git throughput: 45,844 per hour
Cost per hour¹: $4.168
Average CPU utilization: 45%
¹ In our recommendations for Large-sized profiles, we quote a cost per hour for each configuration to help you compare their relative prices. This cost covers only the nodes used for the Bitbucket application, database, and NFS. It does not include the cost of other components, such as the shared home and the application load balancer.
These figures are in USD, and were correct as of July 2019.
We measured performance stability in terms of how far the instance’s average CPU utilization is from the 75% threshold. As mentioned, once we hit this threshold, git operations start to slow down. The further below 75% the instance runs, the less prone it is to slowing down during sudden traffic spikes.
However, there are no disadvantages in using larger-size hardware (m5.12xlarge, for example), which will provide better performance.
Low-cost configuration
We also found a low-cost configuration with acceptable performance at $2.84 per hour:
| Component | Recommendation |
| --- | --- |
| Application nodes | m5.4xlarge x 3 |
| Database node | m5.xlarge |
| NFS node | m5.xlarge |
This low-cost configuration delivered a Git throughput of 43,099 git hosting calls per hour, lower than the optimal configuration's 45,844. However, this is still above our minimum threshold of 32,700 git hosting calls per hour. The trade-off for the lower price is fault tolerance. If the instance loses one application node, CPU usage spikes to 85%, which is above our maximum threshold of 75%. The instance will survive, but performance will suffer.
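The 85% figure above is a measured result. For a configuration where you don't have such a measurement, a rough first estimate is to assume the same total CPU work is spread evenly over the remaining nodes. The sketch below does only that. It is a deliberately naive model, not something the benchmarks validate: it assumes perfectly even load balancing and linear CPU scaling, and the function name is hypothetical.

```python
def estimated_cpu_after_node_loss(avg_cpu_percent: float, nodes: int, lost: int = 1) -> float:
    """Estimate average CPU (%) on the surviving nodes after `lost` nodes go offline.

    Assumes total CPU work stays constant and is redistributed evenly,
    which real clusters only approximate.
    """
    remaining = nodes - lost
    if remaining <= 0:
        raise ValueError("at least one node must remain")
    # Cap at 100%: beyond that the nodes are saturated and requests queue.
    return min(avg_cpu_percent * nodes / remaining, 100.0)


# Example: 45% average CPU across 4 application nodes, one node lost.
print(estimated_cpu_after_node_loss(45, 4))
```

Treat the output as a planning heuristic and compare it against the 75% throttling threshold. Only a real failover test shows how your instance behaves.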
Recommendations for XLarge instances
We analyzed our benchmarks and came up with the following optimal configuration:
Best-performing configuration
| Component | Recommendation |
| --- | --- |
| Application nodes | m5.12xlarge x 4 |
| Database node | m5.2xlarge |
| NFS node | m5.2xlarge |
Performance of this configuration
Git throughput: 75,860 per hour
Cost per hour¹: $10.312
Average CPU utilization: 65%
We measured performance stability in terms of how far the instance’s average CPU utilization is from the 75% threshold. As mentioned, once we hit this threshold, git operations start to slow down. The further below 75% the instance runs, the less prone it is to slowing down during sudden traffic spikes.
¹ In our recommendations for XLarge profiles, we quote a cost per hour for each configuration to help you compare their relative prices. This cost covers only the nodes used for the Bitbucket application, database, and NFS. It does not include the cost of other components, such as the shared home and the application load balancer.
These figures are in USD, and were correct as of July 2019.
Low-cost configuration
We also found a low-cost configuration with good performance at $7.02 per hour:
| Component | Recommendation |
| --- | --- |
| Application nodes | m5.8xlarge x 4 |
| Database node | m5.2xlarge |
| NFS node | m5.xlarge |
This low-cost configuration delivered a Git throughput of 74,275 git hosting calls per hour, lower than the optimal configuration's 75,860. However, this is still well above the defined threshold of 65,400 git hosting calls per hour. The trade-off for the lower price is fault tolerance. We observed timeouts and errors when running on m5.8xlarge x 3 nodes, so performance may degrade if an application node goes down.
The following table shows all test configurations that passed our threshold, that is, above 65,400 git hosting operations per hour (the XLarge minimum) and below 75% CPU utilization, with no node crashes. We sorted each configuration by descending throughput.
| Application nodes | Database node | NFS node | Git throughput | Cost per hour |
| --- | --- | --- | --- | --- |
| m5.12xlarge x 4 | m5.2xlarge | m5.2xlarge | 75,860 | $10.31 |
| m5.8xlarge x 4 | m5.2xlarge | m5.xlarge | 74,275 | $7.02 |
| m5.4xlarge x 8 | m5.2xlarge | m5.2xlarge | 73,374 | $7.24 |
| m5.4xlarge x 6 | m5.2xlarge | m5.2xlarge | 71,872 | $5.70 |
| m5.12xlarge x 3 | m5.2xlarge | m5.2xlarge | 66,660 | $8.01 |
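One way to compare the passing configurations above is cost per 1,000 git hosting operations. The sketch below computes that from the throughput and cost-per-hour figures in the table. The function names are hypothetical, and the metric is only one lens: it ignores CPU headroom, fault tolerance, and error rates, which the recommendations also weigh.

```python
# (application nodes, git throughput per hour, cost per hour in USD),
# copied from the table of passing configurations above.
PASSING_CONFIGS = [
    ("m5.12xlarge x 4", 75_860, 10.31),
    ("m5.4xlarge x 8", 73_374, 7.24),
    ("m5.8xlarge x 4", 74_275, 7.02),
    ("m5.4xlarge x 6", 71_872, 5.70),
    ("m5.12xlarge x 3", 66_660, 8.01),
]


def cost_per_1000_ops(throughput_per_hour: int, cost_per_hour: float) -> float:
    """USD spent per 1,000 git hosting operations at the benchmarked throughput."""
    return cost_per_hour / throughput_per_hour * 1000


def rank_by_cost_efficiency(configs):
    """Return the configurations sorted from cheapest to most expensive per operation."""
    return sorted(configs, key=lambda c: cost_per_1000_ops(c[1], c[2]))


for name, throughput, cost in rank_by_cost_efficiency(PASSING_CONFIGS):
    print(f"{name:<16} ${cost_per_1000_ops(throughput, cost):.3f} per 1,000 operations")
```

Because the test traffic used fixed sleep times, the benchmarked throughput isn't each configuration's maximum, so these ratios describe the test runs rather than the best achievable cost efficiency.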
Application node test results
Database node test results
NFS node test results
Disk I/O