Recommendations for running Bitbucket in AWS

This page provides general sizing and configuration recommendations for running self-managed Bitbucket instances on Amazon Web Services. 

To get the best performance out of your Bitbucket deployment in AWS, it's important not to under-provision your instance's CPU, memory, or I/O resources. Note that the very smallest instance types provided by AWS do not meet Bitbucket's minimum hardware requirements and aren't recommended in production environments. If you don't provision sufficient resources for your workload, Bitbucket is likely to exhibit slow response times, display a "Bitbucket Server is reaching resource limits" banner, or fail to start altogether. 



Bitbucket Data Center recommendations

Knowing your load profile is useful for planning your instance's growth, looking for inflated metrics, or simply keeping it at a reasonable size. In Bitbucket Data Center load profiles, we showed you some simple guidelines for finding out if your instance was Small, Medium, Large, or XLarge. We based these size profiles on Server and Data Center case studies, covering varying infrastructure sizes and configurations.

If your instance is close to outgrowing its size profile, it may be time to consider upgrading your infrastructure. In most cases, upgrading involves deciding how to deploy your Bitbucket Server Data Center application, NFS, and database nodes. However, it's not always clear how to do that effectively.

To help you, we ran a series of performance tests against a typical Large and XLarge Bitbucket Server Data Center instance. We designed these tests to deliver useful, data-driven recommendations for your deployment's application and database nodes. These recommendations can help you plan a suitable environment, or check whether your current instance is adequate for the size of your content and traffic.

Note that large repositories might influence performance.

We advise that you monitor performance on a regular basis.

Approach

We ran all tests in AWS. This allowed us to easily define and automate multiple tests, giving us a large (and fairly reliable) sample.

Each part of our test infrastructure was provisioned from a standard AWS component available to all AWS users. This allows for easy deployment of recommended configurations. You can also use AWS Quick Starts for deploying Bitbucket Server Data Center.

It also means you can look up specifications in AWS documentation. This helps you find equivalent components and configurations if your organization prefers a different cloud platform or bespoke clustered solution.

Some things to consider

To effectively benchmark Bitbucket on a wide range of configurations, we designed tests that could be easily set up and replicated. Accordingly, when referencing our benchmarks for your production environment, consider:

  • We didn't install apps on our test instances, as we focused on finding the right configurations for the core product. When designing your infrastructure, you need to account for the impact of apps you want to install.

  • We used RDS with default settings across all tests. This allowed us to get consistent results with minimal setup and tuning.

  • Our test environment used dedicated AWS infrastructure hosted on the same subnet. This helped minimize network latency.

  • We used an internal testing tool called Trikit to simulate the influx of git packets. This gave us the ability to measure git request speeds without having to measure client-side git performance. It also meant our tests didn’t unpack git refs, as the tool only receives and decrypts git data.

  • The performance (response times) of git operations will be affected largely by repository size. Our test repositories averaged 14.2MB in size. We presume that bigger repositories might require stronger hardware.

  • Due to limitations in AWS, we initialized EBS volumes (storage blocks) on the NFS servers before starting the tests. Without disk initialization, disk latency increases significantly and the test infrastructure slows down for several hours (see the example initialization command below).

    We enabled analytics on each test instance to collect usage data. For more information, see Collecting analytics for Bitbucket Server.
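
    To give a sense of what that initialization looked like, the command below is a minimal sketch that reads every block of an EBS volume once. The device name is an assumption; check the actual device on your NFS server (for example with lsblk) before running anything like this.

    # Read every block of the EBS volume once so the first-access penalty is paid before
    # the test starts, not during it. /dev/xvdf is an example device name -- adjust it.
    # Initializing a large volume can take several hours.
    sudo dd if=/dev/xvdf of=/dev/null bs=1M status=progress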

    Methodology

    Each test involved applying the same amount of traffic to a Bitbucket data set, but on a different AWS environment. We ran three series of tests, each designed to find optimal configurations for the following components:

    • Bitbucket application node

    • Database node

    • NFS node

    To help ensure benchmark reliability, we initialized the EBS volumes and tested each configuration for three hours. We observed stable response times throughout each test. Large instance tests used Bitbucket Data Center 5.16 while XLarge used Bitbucket Data Center 6.4. We used a custom library (Trikit) running v1 protocol to simulate Git traffic.

    Data sets

    Large instance

    We created a Large-sized Bitbucket Data Center instance with the following dimensions (all values are approximate):

    • Repositories: 52,000
    • Active users: 25,000
    • Pull requests: 850,000
    • Traffic (git operations per hour): 40,000

    Content and traffic profiles are based on Bitbucket Data Center load profiles, which put the instance's overall load at the high end of the Large profile. We believe these metrics represent a majority of real-life, Large-sized Bitbucket Data Center instances.

    More details about data set dimensions (all values are approximate)

    • Users: 25,000
    • Groups: 50,000
    • Projects (including personal): 16,700
    • Comments on pull requests: 3,500,000

    • Total repositories: 52,000
      • Regular repositories: 26,000
      • Public forks: 9,000
      • Private repositories: 17,000
    • Total pull requests: 859,000
      • Open: 8,500
      • Merged: 850,000
    • Traffic (git operations per hour): 40,000
      • Clones: 16,000
      • Fetches: 14,000
      • Pushes: 10,000

    XLarge instance

    We created an XLarge-sized Bitbucket Data Center instance with the following dimensions (all values are approximate):

    • Repositories: 110,000
    • Active users: 50,000
    • Pull requests: 1,790,000
    • Traffic (git operations per hour): 65,000

    Content and traffic profiles are based on Bitbucket Data Center load profiles, which put the instance’s overall load profile at the XLarge profile. We believe these metrics represent a majority of real-life, XLarge-sized Bitbucket Data Center instances.

    More details about data set dimensions (all values are approximate)

    • Users: 25,000
    • Groups: 3,000
    • Projects (including personal): 52,000
    • Comments on pull requests: 8,700,000

    • Total repositories: 105,000
      • Regular repositories: 52,000
      • Public forks: 17,000
      • Private repositories: 35,000
    • Total pull requests: 1,790,000
      • Open: 130,000
      • Merged: 1,660,000
    • Traffic (git operations per hour): 70,000
      • Clones: 18,700
      • Fetches: 25,300
      • Pushes: 26,000

    Benchmark

    We used the following benchmark metrics for our tests.

    Git throughput: the number of git hosting operations (fetch/clone/push) per hour
      Threshold: 32,700 (minimum) for Large and 65,400 (minimum) for XLarge; the higher the better
      Reason: These thresholds are the upper limits of traffic defined in Bitbucket Data Center load profiles. We chose them due to the spiky nature of git traffic.

    Average CPU utilization (for application nodes)
      Threshold: 75% (maximum); the lower the better
      Reason: When the application nodes reach an average CPU usage of 75% or above, Bitbucket's adaptive throttling starts queuing Git hosting operations to keep the application responsive for interactive users. This slows down Git operations.

    Stability
      Threshold: no nodes go offline
      Reason: Infrastructure that is inadequate for the load can lead to node crashes.

    The test traffic had fixed sleep times to modulate the volume of git hosting operations. This means the benchmarked git throughput doesn’t represent the maximum each configuration can handle.

    Architecture

    We tested each configuration on a freshly-deployed Bitbucket Server Data Center instance on AWS. Every configuration followed the same structure:

    Application node
      Number of nodes: variable
      Virtual machine types: m5.xlarge, m5.2xlarge, m5.4xlarge, m5.12xlarge, m5.24xlarge
      Notes: When testing m5.xlarge (16GB of RAM), we used 8GB for the JVM heap. For all other types, we used 12GB. Minimum heap (Xms) was set to 1GB for all tests. If you don't have a great number of third-party apps, a smaller JVM heap (2-3GB) is enough. Also note that Git operations are expensive in terms of memory consumption and are executed outside of the Java virtual machine; see Scaling Bitbucket Server for more on this. Each Bitbucket application node used a 30GB General Purpose SSD (gp2) EBS volume for local storage, with a baseline of 100 IOPS, burstable to 3,000 IOPS.

    Database
      Number of nodes: 1
      Virtual machine types: m5.xlarge, m5.2xlarge, m5.4xlarge
      Notes: We used Amazon RDS PostgreSQL version 9.4.15 with default settings. Each test featured only one node.

    NFS storage
      Number of nodes: 1
      Virtual machine types: m5.xlarge, m5.2xlarge, m5.4xlarge
      Notes: Our NFS server used a 900GB General Purpose SSD (gp2) EBS volume for storage, with a baseline of 2,700 IOPS, burstable to 3,000 IOPS. As mentioned, we initialized this volume at the start of each test.

    Load balancer
      Number of nodes: 1
      Type: AWS Application Load Balancer (ELB)
      Notes: We used AWS Elastic Load Balancing. At the time of performance testing, the Application Load Balancer didn't handle SSH traffic.
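
    On the JVM heap sizes noted above: in the Bitbucket Server versions we tested (5.x and 6.x), the heap is typically configured in <Bitbucket installation directory>/bin/setenv.sh. The snippet below is a minimal sketch of the settings used in most of these tests; the file location and variable names can differ between versions, so check your own installation before changing anything.

    # In <Bitbucket installation directory>/bin/setenv.sh (verify for your version):
    JVM_MINIMUM_MEMORY="1g"     # minimum heap (Xms) used in our tests
    JVM_MAXIMUM_MEMORY="12g"    # maximum heap (Xmx); 8g was used on m5.xlarge nodes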

    We ran several case studies of real-life Large and XLarge Bitbucket Data Center instances to find optimal configurations for each component. In particular, we found many used m5-series virtual machine types (General Purpose Instances). As such, for the application node, we focused on benchmarking different configurations of that series.

    Refer to the AWS documentation on Instance Types (specifically, General Purpose Instances) for details on each virtual machine type used in our tests.

    Recommendations for Large-sized instances

    We analyzed our benchmarks and came up with the following optimal configuration:

    Best-performing and most cost-effective configuration 

    • Application nodes: m5.4xlarge x 4
    • Database node: m5.2xlarge
    • NFS node: m5.2xlarge

    Performance of this configuration

    • Git throughput: 45,844 per hour

    • Cost per hour 1: $4.168

    • Average CPU utilization: 45%

    1 In our recommendations for Large-sized profiles, we quoted a cost per hour for each configuration. We provide this information to help inform you about the comparative price of each configuration. This cost only calculates the price of the nodes used for the Bitbucket application, database, and NFS nodes. It does not include the cost of using other components of the application like shared home and application load balancer.

    These figures are in USD, and were correct as of July 2019.

    We measured performance stability in terms of how far the instance's average CPU utilization stays below the 75% threshold. As mentioned, once we hit this threshold, git operations start to slow down. The further the instance stays below 75%, the less prone it is to slowing down during sudden traffic spikes.

    That said, there is no disadvantage to using larger hardware (m5.12xlarge, for example), which will provide better performance.

    Low-cost configuration

    We also found a low-cost configuration with acceptable performance at $2.84 per hour:

    • Application nodes: m5.4xlarge x 3
    • Database node: m5.xlarge
    • NFS node: m5.xlarge

    This low-cost configuration offered a lower git throughput (43,099 git hosting calls per hour) than the optimal configuration, but this is still above our minimum threshold of 32,700 git hosting calls per hour. The trade-off for the price is fault tolerance: if the instance loses one application node, CPU usage spikes to 85%, which is above our maximum threshold. The instance will survive, but performance will suffer.

    More details about our recommendations

    The following table shows all test configurations that passed our thresholds, that is, above 32,700 git hosting operations per hour and below 75% CPU utilization, with no node crashes. We sorted the configurations by descending throughput.

    Application nodes | Database node | NFS node | Git throughput (per hour) | Cost per hour (USD)
    m5.4xlarge x 6 | m5.4xlarge | m5.4xlarge | 46,833 | $6.800
    m5.12xlarge x 2 | m5.4xlarge | m5.4xlarge | 45,848 | $6.792
    m5.4xlarge x 4 | m5.4xlarge | m5.4xlarge | 45,844 | $5.264
    m5.2xlarge x 8 | m5.4xlarge | m5.4xlarge | 45,626 | $5.264
    m5.4xlarge x 3 | m5.4xlarge | m5.4xlarge | 44,378 | $4.496
    m5.4xlarge x 3 | m5.2xlarge | m5.4xlarge | 43,936 | $3.784
    m5.2xlarge x 6 | m5.4xlarge | m5.4xlarge | 43,401 | $4.496
    m5.4xlarge x 3 | m5.xlarge | m5.xlarge | 43,099 | $2.840
    m5.4xlarge x 3 | m5.xlarge | m5.4xlarge | 43,085 | $3.428

    As you can see, the m5.4xlarge x 4 application node configuration doesn't provide the highest git throughput. However, the configurations with higher throughput cost more and provide only marginal performance gains.

    Recommendations for XLarge instances

    We analyzed our benchmarks and came up with the following optimal configuration:


    Best-performing configuration

    • Application nodes: m5.12xlarge x 4
    • Database node: m5.2xlarge
    • NFS node: m5.2xlarge

    Performance of this configuration

    • Git throughput: 75,860 per hour

    • Cost per hour 1: $10.312

    • Average CPU utilization: 65%

    We measured performance stability in terms of how far the instance's average CPU utilization stays below the 75% threshold. As mentioned, once we hit this threshold, git operations start to slow down. The further the instance stays below 75%, the less prone it is to slowing down during sudden traffic spikes.

    1 In our recommendations for Extra Large-sized profiles, we quoted a cost per hour for each configuration. We provide this information to help inform you about the comparative price of each configuration. This cost only calculates the price of the nodes used for the Bitbucket application, database, and NFS nodes. It does not include the cost of using other components of the application like the shared home and application load balancer.

    These figures are in USD, and were correct as of July 2019.

    Low-cost configuration

    We also found a low-cost configuration with good performance at $7.02 per hour:

    • Application nodes: m5.8xlarge x 4
    • Database node: m5.2xlarge
    • NFS node: m5.xlarge

    This low-cost configuration offered a lower git throughput (74,275 git hosting calls per hour) than the optimal configuration. However, this is still well above the defined threshold of 65,400 git hosting calls per hour. The trade-off for the price is fault tolerance: we observed timeouts and errors with m5.8xlarge x 3 nodes, so you may encounter performance degradation if an application node goes down.

    The following table shows all test configurations that passed our thresholds, that is, above 65,400 git hosting operations per hour and below 75% CPU utilization, with no node crashes. We sorted the configurations by descending throughput.

    Application nodes | Database node | NFS node | Git throughput (per hour) | Cost per hour (USD)
    m5.12xlarge x 4 | m5.2xlarge | m5.2xlarge | 75,860 | $10.31
    m5.8xlarge x 4 | m5.2xlarge | m5.xlarge | 74,275 | $7.02
    m5.4xlarge x 8 | m5.2xlarge | m5.2xlarge | 73,374 | $7.24
    m5.4xlarge x 6 | m5.2xlarge | m5.2xlarge | 71,872 | $5.70
    m5.12xlarge x 3 | m5.2xlarge | m5.2xlarge | 66,660 | $8.01

    The m5.4xlarge x 8 configuration also provides good performance with high fault tolerance. However, we recommend keeping deployments to four nodes or fewer for better stability.

    Application node test results

    Large-sized instances

    Our first test series focused on finding out which AWS virtual machine types to use (and how many) for the application node. For these tests, we used a single m4.4xlarge node for the database and a single m4.4xlarge node for the NFS server.

    Benchmarks show the best git throughput came from using m5.4xlarge (16 CPUs) and m5.12xlarge (48 CPUs) nodes. You will need at least three nodes for m5.4xlarge and two nodes for m5.12xlarge.

    CPU is underutilized at 30% for the following application node configurations:

    • m5.4xlarge x 6

    • m5.12xlarge x 2

    This demonstrates both configurations are overprovisioned. It would be more cost-effective to use three or four m5.4xlarge nodes for the application.

    However, on the three-node m5.4xlarge set-up, the CPU usage would be at ~85% if one of the nodes failed. For this reason, we recommend the four-node m5.4xlarge set-up for better fault tolerance.

    XLarge-sized instances

    Our first test series focused on finding out which AWS virtual machine types to use (and how many) for the application node. For these tests, we used a single m4.2xlarge node for the database and a single m4.2xlarge node for the NFS server.

    Benchmarks show the best git throughput came from using m5.12xlarge (48 CPUs) and m5.8xlarge nodes (32 CPUs). You will need four nodes for both instance types.

    We also carried out performance testing with two nodes (96 CPUs), but this resulted in poor performance that did not meet the threshold. Test results showed that two-node deployments are not suitable for XLarge load: during the two-node tests, time spent in the kernel was very high, which was not evident with four or more nodes.


    Database node test results

    Large-sized instances

    From the application node test series, we found using three m5.4xlarge nodes for the application yielded optimal performance (even if it wasn’t the most fault tolerant). For our second test series, we tested this configuration against the following virtual machine types for the database:

    • m4.large

    • m4.xlarge

    • m4.2xlarge

    • m4.4xlarge

    As expected, the more powerful the virtual machine, the better the performance. We saw the biggest gains in CPU utilization; git throughput also improved, but only marginally.


    Only m5.large failed the CPU utilization threshold. All other tested virtual machine types are acceptable, although m5.xlarge, at 60%, is fairly close to our CPU utilization threshold.

    XLarge-sized instances

    From the application node test series, we found using four m5.12xlarge nodes for the application yielded optimal performance. For our second test series, we tested this configuration against the following virtual machine types for the database:

    • m4.xlarge

    • m4.2xlarge

    • m4.4xlarge

    The m4.xlarge database was saturated at 100% CPU, while db.m4.4xlarge did not improve performance. For this reason, m4.2xlarge remains the recommended instance type for extra-large load. CPU utilization was at ~40% on m4.2xlarge.
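
    If you need to move an existing RDS database to a larger instance class based on results like these, the AWS CLI can do it in place. This is a minimal sketch; the database identifier is a placeholder, and changing the instance class causes a short outage unless you schedule it for a maintenance window.

    # Scale an RDS PostgreSQL instance to the db.m4.2xlarge class.
    # Replace bitbucket-db with your DB instance identifier; drop --apply-immediately
    # to defer the change to the next maintenance window.
    aws rds modify-db-instance \
        --db-instance-identifier bitbucket-db \
        --db-instance-class db.m4.2xlarge \
        --apply-immediately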


    NFS node test results

    Large-sized instances

    In previous tests (where we benchmarked different application and database node configurations), we used m5.4xlarge for the NFS node (NFS protocol v3). During each of those tests, NFS node CPU remained highly underutilized at under 18%. We ran further tests to see if we could downgrade the NFS server (and, by extension, find more cost-effective recommendations). Results showed identical git throughput, using the downsized m5.xlarge NFS node. This led to our low-cost recommendation.

    • Application nodes: m5.4xlarge x 3
    • Database node: m5.xlarge
    • NFS node: m5.xlarge

    As mentioned, this recommendation costs $3.044 per hour but offers lower fault tolerance.

    Based on other test results, we recommend using at least an m5.xlarge NFS node with more than 1,500 IOPS.

    XLarge-sized instances

    Benchmarks for the extra-large tests all used m5.2xlarge for the NFS instance. During each of those tests, the NFS node CPU remained highly underutilized at 25%. We ran further tests to see if we could downgrade the NFS server (and, by extension, find more cost-effective recommendations). Results showed identical git throughput with the downsized m5.xlarge NFS node, with CPU utilization at 60%.

    This led to our low-cost recommendation.


    • Application nodes: m5.8xlarge x 4
    • Database node: m5.2xlarge
    • NFS node: m5.xlarge

    Disk I/O


    Large-sized instances

    Disk I/O performance is often a limiting factor, so we also paid attention to disk utilization. Our tests revealed that the disk specifications we used for the NFS node were appropriate for our traffic. As mentioned, we initialized this volume at the start of each test.

    Please be aware this information is only a guideline, as IOPS requirements will depend on usage patterns.

    The table below shows the I/O impact of our tests on the NFS node’s disk:

    Metric | Value
    Total throughput (read + write) | 1,250 IOPS
    Read throughput | 700 IOPS
    Write throughput | 550 IOPS
    Read bandwidth | 100 MB/s
    Write bandwidth | 10 MB/s
    Average queue length | 1.3
    Average read latency | 1.5 ms/op
    Average write latency | 0.6 ms/op
    Disk utilization | 45%
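
    If you want to compare your own NFS node against figures like these, iostat (from the sysstat package) reports the same kinds of metrics per device: IOPS, bandwidth, average queue size, read/write latency, and utilization. This is just a sketch of one way to watch them; adjust the interval to taste and look at the device that backs Bitbucket's shared home.

    # Extended per-device statistics in MB/s, refreshed every 60 seconds.
    # Requires the sysstat package; find the shared home's device with lsblk or df.
    iostat -xm 60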

    XLarge-sized instances

    Disk I/O performance is often a limiting factor, so we also paid attention to disk utilization. Our tests revealed that the disk specifications we used for the NFS node were appropriate for our traffic. As mentioned, we initialized this volume at the start of each test.

    Please be aware this information is only a guideline, as IOPS requirements will depend on usage patterns.

    The table below shows the I/O impact of our tests on the NFS node's disk:

    Metric | Value
    Total throughput (read + write) | 9,00 IOPS
    Read throughput | 2,700 IOPS
    Write throughput |
    Read bandwidth | 113 MB/s
    Write bandwidth | 15 MB/s
    Average queue length | 3.5
    Average read latency | 1.0 ms/op
    Average write latency | 0.70 ms/op
    Disk utilization | 80%

    Although the average disk utilization was high at 80%, read and write latencies were low at under 1 ms/op. We recommend that the NFS server disk have 4,500 IOPS or more to ensure it doesn't become the bottleneck.
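
    If you provision the NFS volume up front, a Provisioned IOPS SSD volume sized for that budget can be created with the AWS CLI. The values below (size, IOPS, and Availability Zone) are illustrative only; pick ones that match your deployment.

    # Create a 900GiB Provisioned IOPS SSD (io1) volume with 4,500 IOPS for the NFS
    # server's shared home. Adjust size, IOPS, and Availability Zone for your environment.
    aws ec2 create-volume \
        --volume-type io1 \
        --iops 4500 \
        --size 900 \
        --availability-zone us-east-1a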


Bitbucket Server recommendations

Recommended EC2 and EBS instance sizes

This table lists the recommended EC2 and EBS configurations for operating a Bitbucket Server instance under typical workloads.

Active users | EC2 instance type | EBS optimized | EBS volume type | IOPS
0 – 250 | c3.large | No | General Purpose (SSD) | N/A
250 – 500 | c3.xlarge | Yes | General Purpose (SSD) | N/A
500 – 1000 | c3.2xlarge | Yes | Provisioned IOPS | 500 – 1000

The Amazon Elastic File System (EFS) is not supported for Bitbucket's shared home directory due to poor performance of git operations.

See Amazon EC2 instance types, Amazon EBS-Optimized Instances, and Amazon EBS Volume Types for more information.
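
As a rough sketch of how the table above maps onto an actual launch, the AWS CLI command below starts an EBS-optimized c3.2xlarge with a Provisioned IOPS data volume. The AMI ID, device name, volume size, and IOPS figure are placeholders; substitute values appropriate to your deployment.

# Launch an EBS-optimized instance with a Provisioned IOPS (io1) data volume.
# Replace ami-xxxxxxxx with a real AMI ID (for example the Bitbucket AMI) and
# adjust the device name, volume size, and IOPS for your workload.
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type c3.2xlarge \
    --ebs-optimized \
    --block-device-mappings 'DeviceName=/dev/sdf,Ebs={VolumeSize=100,VolumeType=io1,Iops=750}'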



Notes

In Bitbucket instances with a high hosting workload, I/O performance is often the limiting factor. It's recommended that you pay particular attention to EBS volume options, especially the following:

  • The size of an EBS volume also influences I/O performance. Larger EBS volumes generally have a larger slice of the available bandwidth and I/O operations per second (IOPS). A minimum of 100 GiB is recommended in production environments.
  • The IOPS that can be sustained by General Purpose (SSD) volumes is limited by Amazon's I/O credits. If you exhaust your I/O credit balance, your IOPS will be limited to the baseline level. You should consider using a larger General Purpose (SSD) volume or switching to a Provisioned IOPS (SSD) volume. See Amazon EBS Volume Types for more information. A quick way to check your remaining burst credits is shown after this list.
  • If your EBS volumes are restored from snapshots you may see reduced performance the first time each block is accessed. See Pre-Warming Amazon EBS Volumes for more information.
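
On the I/O credits point above, the BurstBalance CloudWatch metric reports the remaining burst credit balance of a gp2 volume as a percentage. The query below is a minimal sketch using the same AWS CLI call as the monitoring script later on this page; the volume ID and time range are placeholders.

# Report the minimum remaining burst balance (in %) per hour for a gp2 volume.
# Replace vol-xxxxxxxx with your EBS volume ID and adjust the time range.
aws cloudwatch get-metric-statistics \
    --namespace AWS/EBS \
    --metric-name BurstBalance \
    --dimensions Name=VolumeId,Value=vol-xxxxxxxx \
    --start-time 2019-07-01T00:00:00 \
    --end-time 2019-07-02T00:00:00 \
    --period 3600 \
    --statistics Minimum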

The above recommendations are based on a typical workload with the specified number of active users. The resource requirements of an actual Bitbucket instance may vary markedly with a number of factors, including:

  • The number of continuous integration servers cloning or fetching from Bitbucket Server: Bitbucket Server will use more resources if you have many build servers set to clone or fetch frequently from Bitbucket Server
  • Whether continuous integration servers are using push mode notifications or polling repositories regularly to watch for updates
  • Whether continuous integration servers are set to do full clones or shallow clones
  • Whether the majority of traffic to Bitbucket Server is over HTTP, HTTPS, or SSH, and the encryption ciphers used
  • The number and size of repositories: Bitbucket Server will use more resources when you work on many very large repositories
  • The activity of your users: Bitbucket Server will use more resources if your users are actively using the Bitbucket Server web interface to browse, clone and push, and manipulate pull requests
  • The number of open pull requests: Bitbucket Server will use more resources when there are many open pull requests, especially if they all target the same branch in a large, busy repository.

See Scaling Bitbucket Server and Scaling Bitbucket Server for Continuous Integration performance for more detailed information on Bitbucket Server resource requirements.



Other supported AWS instance sizes

The following Amazon EC2 instances also meet or exceed Bitbucket Server's minimum hardware requirements. These instances provide different balances of CPU, memory, and I/O performance, and can cater for workloads that are more CPU-, memory-, or I/O-intensive than the typical. 

Compute-optimized
  Currently supported models: c4.large, c4.xlarge, c4.2xlarge, c4.4xlarge, c4.8xlarge, c5.large, c5.xlarge, c5.2xlarge, c5.4xlarge, c5.8xlarge
  Previous generation models that exceed minimum requirements: c3.large, c3.xlarge, c3.2xlarge, c3.4xlarge, c3.8xlarge

Storage-optimized
  Currently supported models: i3.xlarge, i3.2xlarge, i3.4xlarge, i3.8xlarge
  Previous generation models that exceed minimum requirements: i2.xlarge, i2.2xlarge, i2.4xlarge, i2.8xlarge

General purpose
  Currently supported models: m4.large, m4.xlarge, m4.2xlarge, m4.4xlarge, m4.10xlarge, m4.16xlarge, m5.2xlarge, m5.4xlarge
  Previous generation models that exceed minimum requirements: m3.large, m3.xlarge, m3.2xlarge

Memory-optimized
  Currently supported models: r5.large, r5.xlarge, r5.2xlarge, r5.4xlarge, r5.8xlarge, x1.32xlarge
  Previous generation models that exceed minimum requirements: r3.large, r3.xlarge, r3.2xlarge, r3.4xlarge, r3.8xlarge


If your deployment uses any previous generation models that exceed minimum requirements, consider upgrading to a currently supported model. For related advice and recommendations from AWS, see Previous Generation Instances.

In all AWS instance types, Bitbucket Server only supports "large" and higher instances. "Micro", "small", and "medium" sized instances do not meet Bitbucket's minimum hardware requirements and aren't recommended in production environments. 

In any instance type with available Instance Store device(s), a Bitbucket instance launched from the Bitbucket AMI will configure one Instance Store to contain Bitbucket Server's temporary files and caches. Instance Store can be faster than an EBS volume but the data doesn't persist if the instance is stopped or rebooted. Use of Instance Store can improve performance and reduce the load on EBS volumes. See Amazon EC2 Instance Store for more information. 

Bitbucket does not support D2 instances or Burstable Performance (T2) instances.



Advanced: Monitoring Bitbucket to tune instance sizing

This section is for advanced users who wish to monitor the resource consumption of their instance and use this information to guide instance sizing. If performance at scale is a concern, we recommend deploying Bitbucket Data Center with elastic scaling, which alleviates the need to worry about how a single node can accommodate fluctuating or growing load. See the AWS Quick Start guide for Bitbucket Data Center for more details.

The above recommendations provide guidance for typical workloads. The resource consumption of every Bitbucket Server instance will vary with the mix of workload. The most reliable way to determine if your Bitbucket Server instance is under- or over-provisioned in AWS is to monitor its resource usage regularly with Amazon CloudWatch. This provides statistics on the actual amount of CPU, I/O, and network resources consumed by your Bitbucket Server instance. 

The following example Bash script uses the AWS CLI (aws cloudwatch get-metric-statistics), together with jq and gnuplot, to gather CPU, I/O, and network statistics and display them in simple charts that can be used to guide your instance sizing decisions. 

#!/bin/bash
# Example AWS CloudWatch monitoring script
# Usage:
#   (1) Install gnuplot and jq (minimum version 1.4)
#   (2) Install AWS CLI (http://docs.aws.amazon.com/cli/latest/userguide/installing.html) and configure it with
#       credentials allowing cloudwatch get-metric-statistics
#   (3) Replace "xxxxxxx" in volume_ids and instance_ids below with the ID's of your real instance
#   (4) Run this script

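# Note: "date -v-14d" below is BSD/macOS syntax. On GNU/Linux, use instead:
#   start_time=$(date -d '-14 days' +%Y-%m-%dT%H:%M:%S)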
export start_time=$(date -v-14d +%Y-%m-%dT%H:%M:%S)
export end_time=$(date +%Y-%m-%dT%H:%M:%S)
export period=1800
export volume_ids="vol-xxxxxxxx"    # REPLACE THIS WITH THE VOLUME ID OF YOUR REAL EBS VOLUME
export instance_ids="i-xxxxxxxx"    # REPLACE THIS WITH THE INSTANCE ID OF YOUR REAL EC2 INSTANCE

# Build lists of metrics and datafiles that we're interested in
ebs_metrics=""
ec2_metrics=""
cpu_datafiles=""
iops_datafiles=""
queue_datafiles=""
net_datafiles=""
for volume_id in ${volume_ids}; do
  for metric in VolumeWriteOps VolumeReadOps; do
    ebs_metrics="${ebs_metrics} ${metric}"
    iops_datafiles="${iops_datafiles} ${volume_id}-${metric}"
  done
done
for volume_id in ${volume_ids}; do
  for metric in VolumeQueueLength; do
    ebs_metrics="${ebs_metrics} ${metric}"
    queue_datafiles="${queue_datafiles} ${volume_id}-${metric}"
  done
done
for instance_id in ${instance_ids}; do
  for metric in DiskWriteOps DiskReadOps; do
    ec2_metrics="${ec2_metrics} ${metric}"
    iops_datafiles="${iops_datafiles} ${instance_id}-${metric}"
  done
done
for instance_id in ${instance_ids}; do
  for metric in CPUUtilization; do
    ec2_metrics="${ec2_metrics} ${metric}"
    cpu_datafiles="${cpu_datafiles} ${instance_id}-${metric}"
  done
done
for instance_id in ${instance_ids}; do
  for metric in NetworkIn NetworkOut; do
    ec2_metrics="${ec2_metrics} ${metric}"
    net_datafiles="${net_datafiles} ${instance_id}-${metric}"
  done
done

# Gather the metrics using AWS CLI
for volume_id in ${volume_ids}; do
  for metric in ${ebs_metrics}; do
    aws cloudwatch get-metric-statistics --metric-name ${metric} \
                                         --start-time ${start_time} \
                                         --end-time ${end_time} \
                                         --period ${period} \
                                         --namespace AWS/EBS \
                                         --statistics Sum \
                                         --dimensions Name=VolumeId,Value=${volume_id} | \
      jq -r '.Datapoints | sort_by(.Timestamp) | map(.Timestamp + " " + (.Sum | tostring)) | join("\n")' >${volume_id}-${metric}.data
  done
done

for metric in ${ec2_metrics}; do
  for instance_id in ${instance_ids}; do
    aws cloudwatch get-metric-statistics --metric-name ${metric} \
                                         --start-time ${start_time} \
                                         --end-time ${end_time} \
                                         --period ${period} \
                                         --namespace AWS/EC2 \
                                         --statistics Sum \
                                         --dimensions Name=InstanceId,Value=${instance_id} | \
      jq -r '.Datapoints | sort_by(.Timestamp) | map(.Timestamp + " " + (.Sum | tostring)) | join("\n")' >${instance_id}-${metric}.data
  done
done

cat >aws-monitor.gnuplot <<EOF
set term pngcairo font "Arial,30" size 1600,900
set title "IOPS usage"
set datafile separator whitespace
set xdata time
set timefmt "%Y-%m-%dT%H:%M:%SZ"
set grid
set ylabel "IOPS"
set xrange ["${start_time}Z":"${end_time}Z"]
set xtics "${start_time}Z",86400*2 format "%d-%b"
set output "aws-monitor-iops.png"
plot \\
EOF
for datafile in ${iops_datafiles}; do
  echo "  \"${datafile}.data\" using 1:(\$2/${period}) with lines title \"${datafile}\", \\" >>aws-monitor.gnuplot
done

cat >>aws-monitor.gnuplot <<EOF

set term pngcairo font "Arial,30" size 1600,900
set title "IO Queue Length"
set datafile separator whitespace
set xdata time
set timefmt "%Y-%m-%dT%H:%M:%SZ"
set grid
set ylabel "Queue Length"
set xrange ["${start_time}Z":"${end_time}Z"]
set xtics "${start_time}Z",86400*2 format "%d-%b"
set output "aws-monitor-queue.png"
plot \\
EOF
for datafile in ${queue_datafiles}; do
  echo "  \"${datafile}.data\" using 1:2 with lines title \"${datafile}\", \\" >>aws-monitor.gnuplot
done

cat >>aws-monitor.gnuplot <<EOF

set term pngcairo font "Arial,30" size 1600,900
set title "CPU Utilization"
set datafile separator whitespace
set xdata time
set timefmt "%Y-%m-%dT%H:%M:%SZ"
set grid
set ylabel "%"
set xrange ["${start_time}Z":"${end_time}Z"]
set xtics "${start_time}Z",86400*2 format "%d-%b"
set output "aws-monitor-cpu.png"
plot \\
EOF
for datafile in $cpu_datafiles; do
  echo "  \"${datafile}.data\" using 1:2 with lines title \"${datafile}\", \\" >>aws-monitor.gnuplot
done

cat >>aws-monitor.gnuplot <<EOF

set term pngcairo font "Arial,30" size 1600,900
set title "Network traffic"
set datafile separator whitespace
set xdata time
set timefmt "%Y-%m-%dT%H:%M:%SZ"
set grid
set ylabel "MBytes/s"
set xrange ["${start_time}Z":"${end_time}Z"]
set xtics "${start_time}Z",86400*2 format "%d-%b"
set output "aws-monitor-net.png"
plot \\
EOF
for datafile in $net_datafiles; do
  echo "  \"${datafile}.data\" using 1:(\$2/${period}/1000000) with lines title \"${datafile}\", \\" >>aws-monitor.gnuplot
done

gnuplot <aws-monitor.gnuplot

When run on a typical Bitbucket Server instance, this script produces four charts (aws-monitor-iops.png, aws-monitor-queue.png, aws-monitor-cpu.png, and aws-monitor-net.png) showing IOPS usage, I/O queue length, CPU utilization, and network traffic over time.

You can use the information in these charts to decide whether CPU, network, or I/O resources are over- or under-provisioned in your instance. 

If your instance is frequently saturating the maximum available CPU (taking into account the number of cores in your instance size), then this may indicate you need an EC2 instance with a larger CPU count. (Note that the CPU utilization reported by Amazon CloudWatch for smaller EC2 instance sizes may be influenced to some extent by the "noisy neighbor" phenomenon, if other tenants of the Amazon environment consume CPU cycles from the same physical hardware that your instance is running on.) 

If your instance is frequently exceeding the IOPS available to your EBS volume and/or is frequently queuing I/O requests, then this may indicate you need to upgrade to an EBS optimized instance and/or increase the Provisioned IOPS on your EBS volume. See EBS Volume Types for more information. 
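
If monitoring shows you do need more IOPS, an existing EBS volume can usually be converted to Provisioned IOPS in place with the AWS CLI. This is a minimal sketch; the volume ID and IOPS figure are placeholders, and you should confirm your volume supports Elastic Volumes modifications before relying on it.

# Convert an existing EBS volume to Provisioned IOPS SSD (io1) with 3,000 IOPS.
# Replace vol-xxxxxxxx with your volume ID and choose an IOPS value that suits your workload.
aws ec2 modify-volume \
    --volume-id vol-xxxxxxxx \
    --volume-type io1 \
    --iops 3000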

If your instance is frequently limited by network traffic, then this may indicate you need to choose an EC2 instance with a larger available slice of network bandwidth.
