OpenSearch hardware recommendations for Confluence
OpenSearch needs dedicated infrastructure to function within a standard Confluence Data Center setup. Knowing how to deploy OpenSearch for your indexing requirements will help you plan and streamline operations that support your business needs.
Our hardware recommendations include performance test insights to help you determine the optimal size and number of application and database nodes. These recommendations are invaluable when planning a suitable environment or assessing the adequacy of your current instance based on content volume and traffic.
Identifying the most effective and efficient infrastructure for a growing instance isn't always straightforward. For example, adding application nodes may not necessarily improve performance (in fact, it could have the opposite effect).
To benefit from these test insights, we suggest that you:
determine your instance size profile
review the recommendations below
monitor your instance for bottlenecks (a minimal monitoring sketch follows this list)
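If you monitor through the OpenSearch REST API, a minimal sketch like the one below, using the opensearch-py Python client, can surface common bottleneck signals such as unassigned shards, CPU load, and JVM heap pressure. The host, port, and credentials are placeholders for your own environment; this is not part of the Confluence setup itself.

```python
from opensearchpy import OpenSearch

# Placeholder connection details for your OpenSearch cluster.
client = OpenSearch(
    hosts=[{"host": "opensearch.example.internal", "port": 443}],
    http_auth=("monitor-user", "change-me"),
    use_ssl=True,
)

# Cluster-level view: overall status, node counts, and unassigned shards.
health = client.cluster.health()
print(
    f"status={health['status']} "
    f"nodes={health['number_of_nodes']} "
    f"data_nodes={health['number_of_data_nodes']} "
    f"unassigned_shards={health['unassigned_shards']}"
)

# Node-level view: CPU and JVM heap pressure, two common indexing bottlenecks.
for node in client.nodes.stats(metric="os,jvm")["nodes"].values():
    print(
        f"{node['name']}: cpu={node['os']['cpu']['percent']}% "
        f"heap={node['jvm']['mem']['heap_used_percent']}%"
    )
```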
OpenSearch and Confluence
The table below outlines a typical setup for Confluence and how you could include OpenSearch in your system. It includes pricing, filesystem and database details.
Role | AWS service | Instance type | Nodes | vCPU | RAM (GiB) | Price/node* | Price | Total price* |
---|---|---|---|---|---|---|---|---|
Confluence | EC2 | m6i.2xlarge | 3 | 8 | 32 | $0.384/hour | $1.152/hour | $2,917/month |
NFS | EC2 | m6i.large | 1 | 2 | 8 | $0.096/hour | $0.096/hour | |
Database | RDS (PostgreSQL) | db.r5.xlarge | Single AZ | 4 | 32 | $0.5/hour | $0.5/hour | |
OpenSearch | Data node | r6g.large.search | 3 | 2 | 16 | $0.167/hour | $0.501/hour | $638/month |
OpenSearch | Master node | m6g.large.search | 3 | 2 | 8 | $0.128/hour | $0.384/hour | |
*Prices are from 2 July 2024 based on US East (Ohio), taken from Amazon's OpenSearch pricing guide and Amazon EC2 On-Demand Pricing
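If you provision Amazon OpenSearch Service with code, the sketch below shows one way to express the OpenSearch tier from the table above using boto3: three r6g.large.search data nodes and three m6g.large.search dedicated master nodes. The domain name, engine version, zone awareness, and EBS sizing are illustrative assumptions, not part of the recommendation.

```python
import boto3

# Amazon OpenSearch Service client for US East (Ohio), the region the prices above use.
opensearch = boto3.client("opensearch", region_name="us-east-2")

response = opensearch.create_domain(
    DomainName="confluence-search",                 # illustrative name
    EngineVersion="OpenSearch_2.11",                # assumed version; use one Confluence supports
    ClusterConfig={
        "InstanceType": "r6g.large.search",         # 3 data nodes, as in the table above
        "InstanceCount": 3,
        "DedicatedMasterEnabled": True,
        "DedicatedMasterType": "m6g.large.search",  # 3 dedicated master nodes
        "DedicatedMasterCount": 3,
        "ZoneAwarenessEnabled": True,
        "ZoneAwarenessConfig": {"AvailabilityZoneCount": 3},
    },
    EBSOptions={
        "EBSEnabled": True,
        "VolumeType": "gp3",
        "VolumeSize": 100,                          # GiB per data node; size to your index
    },
)
print(response["DomainStatus"]["ARN"])
```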
Considerations for OpenSearch recommendations
When reviewing our hardware recommendations, keep the following in mind:
Performance depends on a number of factors, such as third-party apps, large repositories, data, traffic, concurrency, customizations, and instance type, so our test results might not be fully replicable in your environment. We advise checking our test methodology to understand how the results were achieved.
Note that the cost per hour we provide doesn't include the cost of other components of the deployment, such as the shared home and the application load balancer.
We recommend a minimum of three nodes to avoid potential OpenSearch issues such as losing master quorum. If you have three dedicated master nodes, we still recommend a minimum of two data nodes for replication (see the sketch after this list).
Review the test details below to see the throughput achieved with the recommended instance configuration. This extra data can help you make an informed decision between the best-performing and the most cost-effective option.
For more details, refer to the AWS Documentation on Sizing Amazon OpenSearch Service domains.
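As a quick check against the node and replication minimums above, a sketch along these lines (again using the opensearch-py client, with placeholder connection details) flags clusters that fall short:

```python
from opensearchpy import OpenSearch

# Placeholder connection details; reuse the client configuration from the monitoring sketch above.
client = OpenSearch(hosts=[{"host": "opensearch.example.internal", "port": 443}], use_ssl=True)

# At least two data nodes are needed for replica shards to be allocated.
health = client.cluster.health()
if health["number_of_data_nodes"] < 2:
    print("warning: fewer than 2 data nodes - replica shards cannot be allocated")

# Flag any index without replicas; a single-copy index is lost if its node fails.
for index in client.cat.indices(format="json"):
    if int(index["rep"]) < 1:
        print(f"warning: index {index['index']} has no replicas")
```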
Testing approach
We ran all of our tests in AWS environments. This allowed us to easily define and automate many tests, giving us a large and fairly reliable sample of results.
Each part of our test infrastructure is a standard AWS component available to all AWS users. This means you can easily deploy our recommended configurations.
Since we used standard AWS components you can look up their specifications in the AWS documentation. This lets you find equivalent components and configurations if your organization prefers to use a different cloud platform or bespoke clustered solution.
Considerations when using our benchmarks
To gather a large sample of benchmarks for analysis, we designed tests that could be easily set up and replicated. As such, when referencing our benchmarks and recommendations for your infrastructure plans, consider the following:
We didn't install apps on our test instances, as our focus was finding the right configurations for the core product. When designing your infrastructure, you need to account for the performance impact of apps you want to install.
We used PostgreSQL with default AWS RDS settings across all our tests. This allowed us to get consistent results with minimal setup and tuning.
Our test environment used dedicated AWS infrastructure hosted on the same subnet. This helped minimize network latency.
The dataset
The following table shows the dataset and traffic we used in our performance tests for Confluence.
Item | Count |
---|---|
Total spaces | 5,004 |
Site spaces | 5,004 |
Personal spaces | 0 |
Content (all versions) | 9,543,645 |
Content (current versions) | 9,543,618 |
Local users | 5,005 |
Local groups | 28 |
The following table shows the content index sizes.
Search platform | Storage | Size |
---|---|---|
Lucene | Local index (per Confluence node) | 36 GiB |
OpenSearch | Primary store (cluster wide, excluding replicas) | 63 GiB |
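To compare your own cluster's footprint with these figures, you can read the primary store size from the OpenSearch index stats API. A small sketch using opensearch-py, with placeholder connection details:

```python
from opensearchpy import OpenSearch

# Placeholder connection details for your OpenSearch cluster.
client = OpenSearch(hosts=[{"host": "opensearch.example.internal", "port": 443}], use_ssl=True)

stats = client.indices.stats(metric="store")
primary = stats["_all"]["primaries"]["store"]["size_in_bytes"]  # cluster wide, excluding replicas
total = stats["_all"]["total"]["store"]["size_in_bytes"]        # including replicas
print(f"primary store: {primary / 2**30:.1f} GiB, with replicas: {total / 2**30:.1f} GiB")
```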
Performance testing: results and analysis
Performance tests for CQL search
We conducted performance testing using the DC App Performance Toolkit (available on GitHub), generating 20,000 actions per hour from 200 concurrent users on a Confluence instance configured with OpenSearch and on another Confluence instance configured with Lucene to serve as our baseline. The simulated actions consisted of 4% searches and 96% other actions, such as viewing and editing pages, blogs, comments, and attachments, and viewing the dashboard.
Search platform | Median response time (lower is better) |
---|---|
Lucene (baseline) | 2.34 seconds |
OpenSearch | 0.66 seconds |
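The toolkit drives the full mixed workload, but if you only want a rough spot check of CQL search latency on your own instance, a simple timing loop against Confluence's CQL search REST endpoint is enough. In the sketch below, the base URL, credentials, and CQL query are placeholders.

```python
import statistics
import time

import requests

BASE_URL = "https://confluence.example.com"    # placeholder Confluence base URL
AUTH = ("search-test-user", "change-me")       # placeholder credentials
CQL = 'type = page and text ~ "project plan"'  # placeholder CQL query

timings = []
for _ in range(20):
    start = time.monotonic()
    response = requests.get(
        f"{BASE_URL}/rest/api/search",
        params={"cql": CQL, "limit": 25},
        auth=AUTH,
        timeout=30,
    )
    response.raise_for_status()
    timings.append(time.monotonic() - start)

print(f"median CQL search response time: {statistics.median(timings):.2f} s")
```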
Performance tests for full reindexing
We conducted performance testing by manually triggering a full reindex on our instance and found that reindexing performance was better for Confluence with OpenSearch than with Lucene.
Search platform | Duration (lower is better) |
---|---|
Lucene | 4 hours 59 minutes |
OpenSearch | 4 hours 36 minutes |