OpenSearch hardware recommendations for Confluence

OpenSearch needs dedicated infrastructure to function within a standard Confluence Data Center setup. Knowing how to deploy OpenSearch for your indexing requirements will help you plan and streamline operations tailored to support your business needs.

Our hardware recommendations include performance test insights to help you determine the optimal size and number of application and database nodes. These recommendations are invaluable when planning a suitable environment or assessing the adequacy of your current instance based on content volume and traffic.

Identifying the most effective and efficient infrastructure for a growing instance isn't always straightforward. For example, adding application nodes may not necessarily enhance performance (in fact, it could have the opposite effect).

To benefit from these test insights, we suggest that you:

  • determine your instance size profile 

  • review the recommendations below

  • monitor your instance for bottlenecks

OpenSearch and Confluence 

The table below outlines a typical setup for Confluence and how you could include OpenSearch in your system. It includes pricing, filesystem and database details.

Role       | AWS service      | Instance type    | Nodes     | vCPU | RAM (GiB) | Price/node*  | Price        | Total price*
Confluence | EC2              | m6i.2xlarge      | 3         | 8    | 32        | $0.384/hour  | $1.152/hour  | $2,917/month
NFS        | EC2              | m6i.large        | 1         | 2    | 8         | $0.096/hour  | $0.096/hour  |
Database   | RDS (PostgreSQL) | db.r5.xlarge     | Single AZ | 4    | 32        | $0.5/hour    | $0.5/hour    |
OpenSearch | Data node        | r6g.large.search | 3         | 2    | 16        | $0.167/hour  | $0.501/hour  | $638/month
OpenSearch | Master node      | m6g.large.search | 3         | 2    | 8         | $0.128/hour  | $0.384/hour  |

*Prices are from 2 July 2024 based on US East (Ohio), taken from Amazon's OpenSearch pricing guide and Amazon EC2 On-Demand Pricing
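
If you provision OpenSearch through the AWS SDK, the node configuration above maps onto a domain definition fairly directly. The following is a minimal Python sketch using boto3; the domain name, engine version, EBS volume size, and region are illustrative assumptions, not part of the recommendation.

import boto3

# Illustrative sketch: provision an OpenSearch Service domain matching the
# node configuration in the table above (3 data nodes + 3 dedicated masters).
# Domain name, engine version, EBS volume size, and region are placeholders.
client = boto3.client("opensearch", region_name="us-east-2")

response = client.create_domain(
    DomainName="confluence-search",          # placeholder name
    EngineVersion="OpenSearch_2.11",         # choose a version supported in your environment
    ClusterConfig={
        "InstanceType": "r6g.large.search",  # data nodes, as in the table above
        "InstanceCount": 3,
        "DedicatedMasterEnabled": True,
        "DedicatedMasterType": "m6g.large.search",
        "DedicatedMasterCount": 3,
        "ZoneAwarenessEnabled": False,
    },
    EBSOptions={
        "EBSEnabled": True,
        "VolumeType": "gp3",
        "VolumeSize": 100,                   # GiB per data node; size for your index plus headroom
    },
)
print(response["DomainStatus"]["ARN"])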

Considerations for OpenSearch recommendations

When reviewing our hardware recommendations, there are a few things to keep in mind.

Performance depends on a number of factors, such as third-party apps, large repositories, data, traffic, concurrency, customizations, and instance type, so our test results might not be fully replicable in your environment. We advise checking our test methodology to understand how the results were achieved.

Note that the cost per hour that we provide does not include the cost of using other components of the application, such as the shared home directory and the application load balancer.

We recommend a minimum of three nodes to avoid potential OpenSearch issues. If you have three dedicated master nodes, we still recommend a minimum of two data nodes for replication.
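
Once your cluster is running, you can verify that the expected number of data nodes has joined and that the cluster is healthy through the standard _cluster/health API. Below is a minimal Python sketch; the endpoint and credentials are placeholders.

import requests

# Illustrative sketch: check that the OpenSearch cluster has the expected
# number of data nodes and a healthy status. Endpoint and credentials are placeholders.
OPENSEARCH_URL = "https://your-opensearch-endpoint:9200"

health = requests.get(
    f"{OPENSEARCH_URL}/_cluster/health",
    auth=("admin", "your-password"),
    timeout=10,
).json()

print("status:", health["status"])                    # expect "green" in steady state
print("data nodes:", health["number_of_data_nodes"])  # expect at least 2 with dedicated masters
print("total nodes:", health["number_of_nodes"])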

Assess the test details below to see what the throughput was for the recommended instance configuration. It might give you extra data to make an informed decision between the best-performing and the most cost-effective option.

For more details, refer to the AWS Documentation on Sizing Amazon OpenSearch Service domains.

Testing approach

We ran all of our tests in AWS environments. This allowed us to easily define and automate many tests, giving us a large and fairly reliable sample of results. 

Each part of our test infrastructure is a standard AWS component available to all AWS users. This means you can easily deploy our recommended configurations. 

Since we used standard AWS components you can look up their specifications in the AWS documentation. This lets you find equivalent components and configurations if your organization prefers to use a different cloud platform or bespoke clustered solution. 

Considerations when using our benchmarks

To gather a large sample of benchmarks for analysis, we designed tests that could be easily set up and replicated. As such, when referencing our benchmarks and recommendations for your infrastructure plans, consider the following:

  • We didn't install apps on our test instances, as our focus was finding the right configurations for the core product. When designing your infrastructure, you need to account for the performance impact of apps you want to install.

  • We used PostgreSQL with default AWS RDS settings across all our tests. This allowed us to get consistent results with minimal setup and tuning.

  • Our test environment used dedicated AWS infrastructure hosted on the same subnet. This helped minimize network latency. 

The dataset

The following table shows the dataset and traffic we used in our performance tests for Confluence.

Metric                     | Value
Total spaces               | 5,004
Site spaces                | 5,004
Personal spaces            | 0
Content (all versions)     | 9,543,645
Content (current versions) | 9,543,618
Local users                | 5,005
Local groups               | 28

The following table shows the content index sizes. 

Search platform | Storage                                          | Size
Lucene          | Local index (per Confluence node)                | 36 GiB
OpenSearch      | Primary store (cluster-wide, excluding replicas) | 63 GiB
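
To compare your own index footprint against these figures, the _cat/indices API reports per-index primary and total store sizes. A minimal sketch, assuming the same kind of placeholder endpoint and basic-auth credentials as above:

import requests

# Illustrative sketch: list index names with primary and total store size.
# Endpoint and credentials are placeholders.
OPENSEARCH_URL = "https://your-opensearch-endpoint:9200"

indices = requests.get(
    f"{OPENSEARCH_URL}/_cat/indices",
    params={"format": "json", "h": "index,pri.store.size,store.size"},
    auth=("admin", "your-password"),
    timeout=10,
).json()

for idx in indices:
    print(idx["index"], "primary:", idx["pri.store.size"], "total:", idx["store.size"])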

Performance testing: results and analysis

Performance tests for CQL search

We conducted performance testing using the DC App Performance Toolkit (available on GitHub), generating 20,000 actions per hour from 200 concurrent users on a Confluence instance configured with OpenSearch, and on another Confluence instance configured with Lucene to serve as our baseline. The simulated actions consisted of 4% searches; the remaining 96% were other actions such as viewing and editing pages, blogs, comments, and attachments, and viewing the dashboard.

Search platform   | Median response time (lower is better)
Lucene (baseline) | 2.34 seconds
OpenSearch        | 0.66 seconds
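
The search portion of this workload consists of CQL queries. As a rough illustration of the shape of such a request (not the exact queries issued by the toolkit), the sketch below calls the Confluence content search REST endpoint; the base URL, credentials, and query text are placeholders.

import requests

# Illustrative sketch: a CQL search against the Confluence REST API,
# similar in shape to the search actions simulated in the test.
# Base URL, credentials, and query text are placeholders.
CONFLUENCE_URL = "https://confluence.example.com"

resp = requests.get(
    f"{CONFLUENCE_URL}/rest/api/content/search",
    params={"cql": 'text ~ "project plan"', "limit": 25},
    auth=("admin", "your-password"),
    timeout=30,
)
for result in resp.json().get("results", []):
    print(result["type"], result["title"])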

Performance tests for full reindexing

We conducted performance testing by manually triggering a full reindex on our instance and found that reindexing performance was better for Confluence using OpenSearch.

Search platform | Duration (lower is better)
Lucene          | 4 hours 59 minutes
OpenSearch      | 4 hours 36 minutes



