Infrastructure recommendations for enterprise Confluence instances on AWS
The AWS Quick Start template as a method of deployment is no longer supported by Atlassian. You can still use the template, but we won't maintain or update it.
We recommend deploying your Data Center products on a Kubernetes cluster using our Helm charts for a more efficient and robust infrastructure and operational setup. Learn more about deploying on Kubernetes.
AWS now recommends switching from launch configurations, which our AWS Quick Start template uses, to launch templates. However, we won't be making this switch, as we've ended our support for the AWS Quick Start template. This means you're no longer able to create launch configurations using this template.
In Confluence Data Center load profiles, we presented simple guidelines for finding out if your instance was Small, Medium, Large, or XLarge. We based these size profiles on different Server and Data Center case studies, covering instances of varying infrastructure sizes and configurations. Knowing your load profile is useful for planning for your company's growth, looking for inflated metrics, or simply reviewing your infrastructure's suitability.
Recommendations are based on older versions of Confluence
The recommendations on this page are based on tests conducted on older versions of Confluence. We tested:
- XLarge data set on Confluence Data Center 6.13 and 6.15
- Large data set on Confluence Data Center 6.13 and 6.14
- Medium data set on Confluence Data Center 6.13
As your load grows closer to Large or XLarge, you should routinely evaluate your infrastructure. Once your environment starts to experience performance or stability issues, consider migrating to a clustered (or cluster-ready) infrastructure. When you do, keep in mind that it may not be always clear how to do that effectively – for example, adding more application nodes to a growing Medium-sized instance doesn't always improve performance (in fact, the opposite might happen).
To help you plan your infrastructure set-up or growth, we ran a series of performance tests on typical Medium, Large, and XLarge instances. We designed these tests to get useful, data-driven recommendations for your clustered deployment's application and database nodes. These recommendations can help you plan a suitable clustered environment, one that is adequate for the size of your projected content and traffic.
Executive summary
Medium
Recommendation | Application nodes | Database node | Cost per hour 1 | Apdex (6.13) |
---|---|---|---|---|
Performance | c5.xlarge x 2 | m4.large | $0.522 | 0.929 |
Stability | c5.large x 4 | m4.large | $0.522 | 0.905 |
The Performance option offers the best Apdex among all the configurations we tested. It can maintain an Apdex above 0.9 even when it loses one node.
The Stability option offers better fault tolerance at the same cost, but there is a slight drop in performance.
Confluence performed well in all tests, demonstrating an Apdex above 0.90.
Large
Recommendation | Application nodes | Database node | Cost per hour 1 | Apdex (6.13) | Apdex (6.14) |
---|---|---|---|---|---|
Performance | c5.4xlarge x 2 | m4.2xlarge | $2.09 | 0.852 | 0.874 |
Stability | c5.2xlarge x 4 | m4.xlarge | $1.72 | 0.817 | 0.837 |
Low cost | c5.2xlarge x 3 | m4.xlarge | $1.38 | 0.820 | 0.834 |
The Performance option offered the best Apdex among all the configurations we tested. It can maintain an Apdex above 0.8 even when it loses one node.
The Stability and Low cost options offer a good balance between price, fault tolerance, and performance. You'll notice that they both use the same virtual machine types – the Stability option just has an additional application node. The Stability option can afford to lose more nodes before going completely offline, but the Low cost option costs less. Any performance difference between the two is negligible.
XLarge
Configuration | Application nodes | Database node | Cost per hour 1 | Apdex (6.13) | Apdex (6.15) |
---|---|---|---|---|---|
Stability | c5.4xlarge x 4 | m4.2xlarge | $3.45 | 0.810 | 0.826 |
Low cost | c5.4xlarge x 3 | m4.2xlarge | $2.77 | 0.811 | 0.825 |
The Stability configuration can maintain acceptable performance (that is, an Apdex above 0.8) even if it loses one application node. At four application nodes, it is more fault tolerant overall than the Low cost configuration.
Both configurations are identical except for the number of application nodes, and their Apdex scores don't differ much either. The Low cost configuration is still fairly fault tolerant, in that it can afford to lose two of its three nodes before the service goes offline. However, our tests show that if the Low cost configuration loses one node, its Apdex dips below 0.8.
Important note
Performance results depend on many factors, such as third-party apps, data, traffic, and instance type. The performance we achieved may therefore not be replicable in your environment. Make sure you read through our test methodology to understand the details behind these recommendations.
Approach
We ran all of our tests in AWS environments. This allowed us to easily define and automate many tests, giving us a large and fairly reliable sample of test results.
Each part of our test infrastructure is a standard AWS component available to all AWS users. This means you can easily deploy our recommended configurations.
Since we used standard AWS components, you can look up their specifications in the AWS documentation. This lets you find equivalent components and configurations if your organization prefers to use a different cloud platform or bespoke clustered solution.
Some things to consider
To gather a large sample of benchmarks for analysis, we designed tests that could be easily set up and replicated. As such, when referencing our benchmarks and recommendations for your infrastructure plans, consider the following:
- We didn't install apps on our test instances, as our focus was finding the right configurations for the core product. When designing your infrastructure, you need to account for the performance impact of apps you want to install.
- We used PostgreSQL 9.4.15 with default AWS RDS settings across all our tests. This allowed us to get consistent results with minimal setup and tuning.
- Our test environment used dedicated AWS infrastructure hosted on the same subnet. This helped minimize network latency.
Analytics
We enabled Analytics on each test instance to collect usage data. For more information, see Data Collection Policy.
Disk I/O considerations
Our data set featured a limited amount of attachments, so the resulting traffic was mostly composed of write operations (compared to a typical production instance). On average, this traffic produced 1 kbps of reads and 2,500 kbps of writes on our shared home. This roughly equates to an average IOPS of 0.15 for reads and 200 for writes.
While we didn't set out to test disk I/O specifically, these results suggest that our shared home configuration (that is, a single m4.large node running NFS on gp2) was sufficient for our load. The disk load was stable throughout all our tests, and did not stress the NFS server.
Bear in mind, however, that the synthetic traffic we used here was mostly write traffic. This is not representative of typical production instances, which feature a higher proportion of reads.
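As a back-of-envelope check, the throughput and IOPS figures above imply the average size of each I/O operation. The following sketch assumes the quoted kbps figures are kilobytes per second:

```python
# Rough average I/O size implied by the observed throughput and IOPS.
# Assumes the quoted "kbps" figures are kilobytes per second.
read_kbps, write_kbps = 1, 2_500
read_iops, write_iops = 0.15, 200

avg_read_kb = read_kbps / read_iops     # roughly 6.7 KB per read
avg_write_kb = write_kbps / write_iops  # 12.5 KB per write
print(f"avg read: {avg_read_kb:.1f} KB, avg write: {avg_write_kb:.1f} KB")
```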
Methodology
We ran three separate test phases: one for Medium, one for Large, and another for XLarge. Each phase involved testing a specific volume of traffic to the same Confluence data set, but on a freshly provisioned AWS environment. Each environment was exactly like the last, except for the configuration of application and database nodes.
Our objective was to benchmark different configurations for Medium, Large, and XLarge size profiles. Specifically, we analyzed how different AWS virtual machine types affected the performance of the instance.
Benchmark
For all tests in Medium, Large, and XLarge size profiles, we used an Apdex of 0.8 as our threshold for acceptable performance. This Apdex assumes that a 1-second response time is our Tolerating threshold, while anything above 4 seconds is our Frustrated threshold.
By comparison, we target an Apdex of 0.7 for our own internal production Confluence Data Center instances (as we discussed in Confluence Data Center sample deployment and monitoring strategy). However, that 0.7 Apdex takes into account the performance impact of apps on those instances. We don't have any apps installed on our test instances, so we adjusted the target Apdex for our tests to 0.8.
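For reference, an Apdex score counts satisfied requests fully, tolerating requests at half weight, and frustrated requests not at all. A minimal sketch of the calculation using the 1-second and 4-second thresholds above (the function name is illustrative):

```python
def apdex(response_times, tolerating=1.0, frustrated=4.0):
    """Apdex: satisfied requests count fully, tolerating requests
    count half, and frustrated requests count zero."""
    satisfied = sum(1 for t in response_times if t <= tolerating)
    tolerated = sum(1 for t in response_times if tolerating < t <= frustrated)
    return (satisfied + tolerated / 2) / len(response_times)

# Example: 8 fast requests, 1 tolerable, 1 frustrated
print(apdex([0.4] * 8 + [2.5, 6.0]))  # 0.85
```

With these thresholds, a configuration clears our 0.8 benchmark only when the vast majority of requests complete within 1 second.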
Architecture
Recommendations for Medium-sized instances
The following table shows the data set and traffic we used on our performance tests for Medium-size instances:
Load profile metric | Value |
---|---|
Total Spaces | 1,700 |
Site Spaces | 1,600 |
Content (All Versions) | 1,520,000 |
Local Users | 9,800 |
Traffic (HTTP requests per hour) | 180,000 |
We analyzed the benchmarks and configurations from our Medium testing phase and came up with the following recommendations:
Recommendation | Application nodes | Database node | Apdex (6.13) | Cost per hour 1 |
---|---|---|---|---|
Performance | c5.xlarge x 2 | m4.large | 0.929 | $0.522 |
Stability | c5.large x 4 | m4.large | 0.905 | $0.522 |
The Performance option offers the best Apdex among all the configurations we tested. It can maintain an Apdex above 0.9 even when it loses one node.
The Stability option offers better fault tolerance at the same cost, but there is a slight drop in performance.
Confluence performed well in all tests, demonstrating an Apdex above 0.90.
Cost per hour
1 In our recommendations for Medium size profiles, we quoted a cost per hour for each configuration. We provide this information to help you compare the price of each configuration. This cost covers only the nodes used for the Confluence application and database. It does not include the cost of other components like Synchrony, the shared home, or the application load balancer.
These figures are in USD for deployments in US-EAST, and were correct as of April 2019.
Medium tests: results and analysis
We ran one type of test to determine optimal configurations for the Confluence application node. These tests sought to find out which AWS virtual machine types to use (and how many) for the application node. For these tests, we used a single m4.xlarge node for the database.
Recommendations for Large-sized instances
The following table shows the data set and traffic we used on our performance tests for Large-size instances:
Load profile metric | Value |
---|---|
Total Spaces | 6,550 |
Content (All Versions) | 16,000,000 |
Local Users | 12,300 |
Traffic (HTTP requests per hour) | 498,000 |
We analyzed the benchmarks and configurations from our Large testing phase and came up with the following recommendations:
Recommendation | Application nodes | Database node | Cost per hour 1 | Apdex (6.13) | Apdex (6.14) |
---|---|---|---|---|---|
Performance | c5.4xlarge x 2 | m4.2xlarge | $2.09 | 0.852 | 0.874 |
Stability | c5.2xlarge x 4 | m4.xlarge | $1.72 | 0.817 | 0.837 |
Low cost | c5.2xlarge x 3 | m4.xlarge | $1.38 | 0.820 | 0.834 |
The Performance option offered the best Apdex among all the configurations we tested. It can maintain an Apdex above 0.8 even when it loses one node.
The Stability and Low cost options offer a good balance between price, fault tolerance, and performance. You'll notice that they both use the same virtual machine types – the Stability option just has an additional application node. The Stability option can afford to lose more nodes before going completely offline, but the Low cost option costs less. Any performance difference between the two is negligible.
Cost per hour
1 In our recommendations for both Large and XLarge size profiles, we quoted a cost per hour for each configuration. We provide this information to help you compare the price of each configuration. This cost covers only the nodes used for the Confluence application and database. It does not include the cost of other components like Synchrony, the shared home, or the application load balancer.
These figures are in USD for deployments in US-EAST, and were correct as of April 2019.
Large tests: results and analysis
We ran two types of tests to get these recommendations: one to determine optimal configurations for the Confluence application node and another for the database node:
- Our first test type sought to find out which AWS virtual machine types to use (and how many) for the application node. For these tests, we used a single m4.xlarge node for the database.
- Our second test series benchmarked different virtual machine types for the database. Here, we tested different database virtual machine types against two application node configurations: two c5.4xlarge nodes and four c5.2xlarge nodes. These application node configurations yielded the highest Apdex in the previous test. Each test in this series also used only one database node.
Recommendations for XLarge-sized instances
The following table shows the data set and traffic we used on our performance tests for XLarge-size instances:
Load profile metric | Value |
---|---|
Total Spaces | 10,500 |
Content (All Versions) | 34,900,000 |
Local Users | 102,000 |
Traffic (HTTP requests per hour) | 1,000,000 |
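For perspective, the hourly traffic figures across the three load profiles translate into these sustained averages (a rough view; real-world traffic is bursty, so peak rates run higher):

```python
# Convert each load profile's hourly HTTP traffic into average requests/sec.
# These are sustained averages only; peaks will exceed them.
profiles = {"Medium": 180_000, "Large": 498_000, "XLarge": 1_000_000}
for name, per_hour in profiles.items():
    print(f"{name}: ~{per_hour / 3600:.0f} requests/sec")
```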
We analyzed the benchmarks and configurations from our XLarge testing phase and came up with the following recommendations:
Configuration | Application nodes | Database node | Cost per hour 1 | Apdex (6.13) | Apdex (6.15) |
---|---|---|---|---|---|
Stability | c5.4xlarge x 4 | m4.2xlarge | $3.45 | 0.810 | 0.826 |
Low cost | c5.4xlarge x 3 | m4.2xlarge | $2.77 | 0.811 | 0.825 |
The Stability configuration can maintain acceptable performance (that is, an Apdex above 0.8) even if it loses one application node. At four application nodes, it is more fault tolerant overall than the Low cost configuration.
Both configurations are identical except for the number of application nodes, and their Apdex scores don't differ much either. The Low cost configuration is still fairly fault tolerant, in that it can afford to lose two of its three nodes before the service goes offline. However, our tests show that if the Low cost configuration loses one node, its Apdex dips below 0.8.
XLarge tests: results and analysis
For our XLarge tests, we first tested different virtual machine types for the application node against a single m4.2xlarge node for the database in each. We ran these tests on Confluence 6.15, which was the latest available version at the time.
After checking which configurations produced the best performance, we re-tested them on Confluence 6.13.
Practical examples
If you'd like to see practical applications of these recommendations, check out these resources:
- How we make sure Confluence Data Center stays enterprise-ready describes the test environment we use to benchmark each Confluence release.
- Confluence Data Center sample deployment and monitoring strategy discusses our strategies for monitoring the performance of one of our internal Confluence Data Center instances. This is a real-life instance used by Atlassians worldwide, and also has apps installed.
Both are Large-sized Confluence Data Center instances hosted on AWS. They also use the virtual machine types from the Low cost node configuration. However, we use four application nodes for even better fault tolerance. In our production Confluence instance, this configuration still gives our users acceptable performance – even with our overall load and installed apps.
We're here to help
Over time, we may change our recommendations depending on new tests, insights, or improvements in Confluence Data Center. Follow this page in case that happens. Contact Atlassian Advisory Services for more guidance on choosing the right configuration for your Data Center instance.
Our Premier Support team performs health checks by meticulously analyzing your application and logs to ensure that your infrastructure configuration is suitable for your Data Center application. If the health check process reveals any performance gaps, Premier Support will recommend possible changes.