Monitoring AWS OpenSearch Service
Metrics exposed by AWS
AWS OpenSearch Service publishes a comprehensive set of metrics to CloudWatch, covering cluster health, resource utilization, and search performance. Commonly monitored metrics include:
- ClusterStatus.green/yellow/red: Indicates overall cluster health.
- FreeStorageSpace: Available disk space on data nodes.
- CPUUtilization: CPU usage across nodes.
- JVMMemoryPressure: JVM heap usage, which can impact performance.
- MasterCPUUtilization: CPU usage on master nodes.
- SearchLatency and IndexingLatency: Performance of search and indexing operations.
- AutomatedSnapshotFailure: Status of automated snapshots.
Explore the full list of available metrics and their descriptions.
Recommended CloudWatch alarms
For most OpenSearch monitoring needs, follow the AWS guide Recommended CloudWatch alarms for Amazon OpenSearch Service. It covers essential metrics such as cluster health, storage, CPU, JVM memory pressure, and node availability. These standard alarms provide a strong foundation for maintaining the health and performance of your OpenSearch cluster.
In addition to the AWS recommendations, certain advanced metrics may be especially relevant for Jira workloads or specialized use cases. Consider setting alarms on the following metrics:
- CurrentPointInTime: Trigger an alarm if the maximum value is greater than 10 for 2 consecutive 5-minute periods.
- AvgPointInTimeAliveTime: Trigger an alarm if the average value exceeds 5 minutes for 2 consecutive 5-minute periods.
- ScrollCurrent: Trigger an alarm if the maximum value is greater than 5 for 2 consecutive 5-minute periods. This is particularly relevant for operations such as delete_by_query.
These additional alarms can help you detect issues related to point-in-time (PIT) searches and scroll contexts, which are commonly used in Jira and might not be covered by AWS’s default recommendations. The thresholds and durations above are suggested defaults. Adjust them to fit your domain, workload patterns, and operational SLOs.
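If the alarms are managed with CloudFormation, one way to keep the suggested defaults adjustable is to expose them as template parameters. The fragment below is a minimal sketch under that assumption. The parameter names ScrollCurrentThreshold and AlarmEvaluationPeriods are hypothetical, and the ClientId dimension assumes the alarm is deployed in the account that owns the domain.

```yaml
Parameters:
  OpenSearchDomainName:
    Type: String
    Description: Name of the OpenSearch domain to monitor
  ScrollCurrentThreshold:        # hypothetical parameter name
    Type: Number
    Default: 5                   # suggested default from the list above
  AlarmEvaluationPeriods:        # hypothetical parameter name
    Type: Number
    Default: 2                   # 2 consecutive 5-minute periods

Resources:
  ScrollCurrentAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      MetricName: "ScrollCurrent"
      Namespace: "AWS/ES"
      Statistic: "Maximum"
      Period: 300
      EvaluationPeriods: !Ref AlarmEvaluationPeriods
      Threshold: !Ref ScrollCurrentThreshold
      ComparisonOperator: "GreaterThanThreshold"
      Dimensions:
        - Name: DomainName
          Value: !Ref OpenSearchDomainName
        - Name: ClientId         # assumption: alarm lives in the domain's account
          Value: !Ref AWS::AccountId
```

With this layout, a threshold can be tuned per environment at deploy time without editing the template.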
Setting up alarms
You can set up CloudWatch alarms and dashboards using the AWS Console, AWS CLI, or automation tools such as CloudFormation. For standard alarms, you can use the OpenSearch CloudWatch Alarms guide, which covers the metrics recommended in the AWS documentation.
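As an illustration of a standard alarm expressed in CloudFormation, the fragment below sketches a red-cluster-status alarm. The threshold (maximum of ClusterStatus.red at or above 1 for one 1-minute period) reflects our reading of the AWS recommendations, so check it against the current AWS guide before relying on it. OpenSearchDomainName and AlarmNotificationTopic are assumed to be declared elsewhere in the same template.

```yaml
# Sketch of a standard alarm; belongs under the template's Resources section
ClusterStatusRedAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: "ClusterStatus-Red"
    MetricName: "ClusterStatus.red"
    Namespace: "AWS/ES"
    Statistic: "Maximum"
    Period: 60                   # 1-minute period
    EvaluationPeriods: 1
    Threshold: 1
    ComparisonOperator: "GreaterThanOrEqualToThreshold"
    Dimensions:
      - Name: DomainName
        Value: !Ref OpenSearchDomainName      # assumed template parameter
      - Name: ClientId
        Value: !Ref AWS::AccountId            # assumes same-account deployment
    AlarmActions:
      - !Ref AlarmNotificationTopic           # assumed SNS topic resource
```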
For Jira-specific custom alarms, such as CurrentPointInTime, AvgPointInTimeAliveTime, and ScrollCurrent, you'll need to define these alarms yourself. Below are example CloudFormation YAML snippets for these custom alarms.
We recommend integrating alarms with SNS or other notification services for timely alerts. Note that CloudWatch dashboards and alarms might incur additional AWS charges.
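# Alarm for CurrentPointInTime
# Fires when the maximum number of open PIT contexts exceeds 10
# for 2 consecutive 5-minute periods.
CurrentPointInTimeAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: "CurrentPointInTime-High"
    MetricName: "CurrentPointInTime"
    Namespace: "AWS/ES"
    Statistic: "Maximum"
    Period: 300
    EvaluationPeriods: 2
    Threshold: 10
    ComparisonOperator: "GreaterThanThreshold"
    Dimensions:
      - Name: DomainName
        Value: !Ref OpenSearchDomainName
      # AWS/ES metrics are also keyed by the owning account ID
      - Name: ClientId
        Value: !Ref AWS::AccountId
    AlarmActions:
      - !Ref AlarmNotificationTopic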
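# Alarm for AvgPointInTimeAliveTime
# Fires when PIT contexts stay alive longer than 5 minutes on average
# for 2 consecutive 5-minute periods.
AvgPointInTimeAliveTimeAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: "AvgPointInTimeAliveTime-High"
    MetricName: "AvgPointInTimeAliveTime"
    Namespace: "AWS/ES"
    Statistic: "Average"
    Period: 300
    EvaluationPeriods: 2
    Threshold: 300  # 5 minutes, assuming the metric is reported in seconds
    ComparisonOperator: "GreaterThanThreshold"
    Dimensions:
      - Name: DomainName
        Value: !Ref OpenSearchDomainName
      # AWS/ES metrics are also keyed by the owning account ID
      - Name: ClientId
        Value: !Ref AWS::AccountId
    AlarmActions:
      - !Ref AlarmNotificationTopic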
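# Alarm for ScrollCurrent
# Fires when the maximum number of open scroll contexts exceeds 5
# for 2 consecutive 5-minute periods.
ScrollCurrentAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: "ScrollCurrent-High"
    MetricName: "ScrollCurrent"
    Namespace: "AWS/ES"
    Statistic: "Maximum"
    Period: 300
    EvaluationPeriods: 2
    Threshold: 5
    ComparisonOperator: "GreaterThanThreshold"
    Dimensions:
      - Name: DomainName
        Value: !Ref OpenSearchDomainName
      # AWS/ES metrics are also keyed by the owning account ID
      - Name: ClientId
        Value: !Ref AWS::AccountId
    AlarmActions:
      - !Ref AlarmNotificationTopic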
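
The snippets above reference OpenSearchDomainName and AlarmNotificationTopic without declaring them. The fragment below is one possible way to supply both, assuming the domain name is passed in as a template parameter and notifications go to an email subscription on an SNS topic. The AlarmEmail parameter is hypothetical. The alarm resources would sit alongside the topic under Resources.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Supporting declarations for the custom OpenSearch alarms (sketch)

Parameters:
  OpenSearchDomainName:
    Type: String
    Description: Name of the existing OpenSearch domain to monitor
  AlarmEmail:                    # hypothetical parameter name
    Type: String
    Description: Address that receives alarm notifications

Resources:
  AlarmNotificationTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Protocol: email
          Endpoint: !Ref AlarmEmail

  # Place CurrentPointInTimeAlarm, AvgPointInTimeAliveTimeAlarm,
  # and ScrollCurrentAlarm here, at the same level as the topic.
```

An email subscription must be confirmed by the recipient before SNS delivers notifications to it.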