Monitoring AWS OpenSearch Service in Confluence
AWS OpenSearch Service offers built-in integration with AWS CloudWatch, making it easy to monitor the health and performance of your managed clusters. CloudWatch automatically collects a wide range of metrics from your OpenSearch domains. You can use these metrics to create dashboards and set up alarms that match your operational needs.
Metrics exposed by AWS
AWS OpenSearch Service publishes a comprehensive set of metrics to CloudWatch, covering cluster health, resource utilization, and search performance. Commonly monitored metrics include:
ClusterStatus.green/yellow/red: Indicates overall cluster health.
FreeStorageSpace: Available disk space on data nodes.
CPUUtilization: CPU usage across nodes.
JVMMemoryPressure: JVM heap usage, which can impact performance.
MasterCPUUtilization: CPU usage on master nodes.
SearchLatency and IndexingLatency: Performance of search and indexing operations.
AutomatedSnapshotFailure: Status of automated snapshots.
Explore the full list of available metrics and their descriptions
Recommended CloudWatch alarms
For most OpenSearch monitoring needs, we recommend following the Recommended CloudWatch alarms for Amazon OpenSearch Service guide. This guidance covers essential metrics such as cluster health, storage, CPU, JVM memory pressure, and node availability. These standard alarms provide a strong foundation for maintaining the health and performance of your OpenSearch cluster.
In addition to the AWS recommendations, certain advanced metrics may be especially relevant for Confluence workloads or specialized use cases. Consider setting alarms on the following metrics:
CurrentPointInTime: Trigger an alarm if the maximum value is greater than 10 for 5 minutes (2 consecutive periods).
AvgPointInTimeAliveTime: Trigger an alarm if the average value exceeds 5 minutes for 5 minutes (2 consecutive periods).
ScrollCurrent: Trigger an alarm if the maximum value is greater than 5 for 5 minutes (2 consecutive periods). This is particularly relevant for operations such as delete_by_query.
These additional alarms can help you detect issues related to point-in-time (PIT) searches and scroll contexts, which are commonly used in Confluence and might not be covered by AWS’s default recommendations. The thresholds and durations above are suggested defaults. Adjust them to fit your domain, workload patterns, and operational SLOs.
Setting up alarms
You can set up CloudWatch alarms and dashboards using the AWS Console, AWS CLI, or automation tools such as CloudFormation. For standard alarms, you can use the OpenSearch CloudWatch Alarms guide, which covers the metrics recommended in the AWS documentation.
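As a rough illustration of the CloudFormation route for a standard alarm, the sketch below defines a single alarm on ClusterStatus.red (maximum of 1 or more for one 1-minute period). The parameter names DomainName and AlarmTopicArn and the resource name are placeholders chosen for this example, and the TreatMissingData setting is an assumption rather than part of any guide. Treat it as a starting point and follow the AWS guide for the full recommended set.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Illustrative sketch of one standard OpenSearch alarm

Parameters:
  DomainName:            # placeholder name: the OpenSearch domain to monitor
    Type: String
  AlarmTopicArn:         # placeholder name: ARN of an existing SNS topic
    Type: String

Resources:
  ClusterStatusRedAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: At least one primary shard and its replicas are unallocated
      Namespace: AWS/ES                  # namespace used by OpenSearch Service domains
      MetricName: ClusterStatus.red
      Dimensions:
        - Name: DomainName
          Value: !Ref DomainName
        - Name: ClientId                 # the AWS account ID that owns the domain
          Value: !Ref AWS::AccountId
      Statistic: Maximum
      Period: 60                         # seconds
      EvaluationPeriods: 1
      Threshold: 1
      ComparisonOperator: GreaterThanOrEqualToThreshold
      TreatMissingData: missing          # assumption: adjust to your own policy
      AlarmActions:
        - !Ref AlarmTopicArn
```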
For Confluence-specific custom alarms, such as CurrentPointInTime, AvgPointInTimeAliveTime, and ScrollCurrent, you'll need to define these alarms yourself. Below are example CloudFormation YAML snippets for these custom alarms.
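The following is a minimal sketch of how the three custom alarms might be written, not a definitive template. Several things here are assumptions: the parameter names are placeholders; "5 minutes (2 consecutive periods)" is read as a 300-second period evaluated twice, following the phrasing AWS uses for its own recommended alarms; TreatMissingData is set to notBreaching as one reasonable choice; and the AvgPointInTimeAliveTime threshold is left as a parameter because the unit the metric reports should be confirmed in CloudWatch before a value is chosen.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Illustrative sketch of PIT and scroll alarms for an OpenSearch domain

Parameters:
  DomainName:                 # placeholder: the OpenSearch domain to monitor
    Type: String
  AlarmTopicArn:              # placeholder: ARN of an existing SNS topic
    Type: String
  PitAliveTimeThreshold:      # 5 minutes, expressed in the unit the metric reports
    Type: Number

Resources:
  CurrentPointInTimeAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: More than 10 open point-in-time contexts
      Namespace: AWS/ES
      MetricName: CurrentPointInTime
      Dimensions: &domainDimensions      # YAML anchor, reused by the alarms below
        - Name: DomainName
          Value: !Ref DomainName
        - Name: ClientId                 # the AWS account ID that owns the domain
          Value: !Ref AWS::AccountId
      Statistic: Maximum
      Period: 300                        # assumption: one period = 5 minutes
      EvaluationPeriods: 2               # 2 consecutive periods
      Threshold: 10
      ComparisonOperator: GreaterThanThreshold
      TreatMissingData: notBreaching     # assumption: no data means no open contexts
      AlarmActions:
        - !Ref AlarmTopicArn

  AvgPointInTimeAliveTimeAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: PIT contexts stay alive longer than 5 minutes on average
      Namespace: AWS/ES
      MetricName: AvgPointInTimeAliveTime
      Dimensions: *domainDimensions
      Statistic: Average
      Period: 300
      EvaluationPeriods: 2
      Threshold: !Ref PitAliveTimeThreshold
      ComparisonOperator: GreaterThanThreshold
      TreatMissingData: notBreaching
      AlarmActions:
        - !Ref AlarmTopicArn

  ScrollCurrentAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: More than 5 open scroll contexts (for example, delete_by_query)
      Namespace: AWS/ES
      MetricName: ScrollCurrent
      Dimensions: *domainDimensions
      Statistic: Maximum
      Period: 300
      EvaluationPeriods: 2
      Threshold: 5
      ComparisonOperator: GreaterThanThreshold
      TreatMissingData: notBreaching
      AlarmActions:
        - !Ref AlarmTopicArn
```

If your tooling does not accept YAML anchors and aliases in templates, repeat the Dimensions list in each alarm instead.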
We recommend integrating alarms with SNS or other notification services for timely alerts. Note that CloudWatch dashboards and alarms might incur additional AWS charges.
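For the SNS integration, one possible shape is a topic with an email subscription defined alongside the alarms. This is a sketch under stated assumptions: the resource name OpenSearchAlarmTopic and the parameter NotificationEmail are hypothetical, and email is only one of the protocols SNS supports.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Illustrative sketch of an SNS topic for OpenSearch alarm notifications

Parameters:
  NotificationEmail:          # hypothetical parameter: address that receives alerts
    Type: String

Resources:
  OpenSearchAlarmTopic:       # hypothetical resource name
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Protocol: email
          Endpoint: !Ref NotificationEmail

Outputs:
  AlarmTopicArn:
    Description: Topic ARN to list under AlarmActions in each alarm
    Value: !Ref OpenSearchAlarmTopic   # Ref on an SNS topic returns its ARN
```

An alarm in the same template can point at the topic by listing `!Ref OpenSearchAlarmTopic` under its AlarmActions. Email subscriptions must be confirmed by the recipient before notifications are delivered.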