How to monitor the Synchrony cluster health
Platform Notice: Data Center Only - This article only applies to Atlassian products on the data center platform.
Synchrony self-managed mode has some limitations in terms of system properties that can be loaded. This is covered in the following feature request:
Because of that limitation, the Synchrony cluster health has to be monitored by other means. This article covers some of them.
Hazelcast health monitor
Both Confluence and Synchrony clusters use Hazelcast, which includes the following feature:
This means extra diagnostics are written to the logs if either of the following conditions is met:
- Memory usage > 70%
- CPU usage > 70%
The thresholds can be configured using the system properties covered in that document. Alternatively, you can set the log level to NOISY to have the message printed every 20 seconds (the interval is also configurable):
- Edit the file <Confluence-local-home>/synchrony-args.properties
Add the following line at the bottom:
- Save the file and access the node you just modified
- Restart Synchrony on the Collaborative Editing management page
- Repeat on all nodes
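The per-node edit above can be scripted. The following is only a sketch under stated assumptions: the helper name append_property_once is hypothetical, the file is assumed to hold one key=value pair per line, and the property name in the usage comment (hazelcast.health.monitoring.level) comes from the Hazelcast 3.x system properties, not from this article, so verify it against the Hazelcast documentation for your version before using it.

```shell
#!/bin/sh
# Hypothetical helper: append "key=value" to a properties file unless a line
# starting with "key=" is already present. Assumes one key=value per line.
append_property_once() {
  file=$1; key=$2; value=$3

  # Literal (non-regex) prefix match, so dots in the key are not wildcards.
  if awk -v k="$key" 'index($0, k "=") == 1 { found = 1 } END { exit !found }' \
      "$file" 2>/dev/null; then
    return 0
  fi

  # If the file is non-empty and does not end in a newline, add one first
  # so the new property starts on its own line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi

  printf '%s=%s\n' "$key" "$value" >> "$file"
}

# Example usage (property name is an assumption taken from Hazelcast 3.x docs):
# append_property_once "<Confluence-local-home>/synchrony-args.properties" \
#   hazelcast.health.monitoring.level NOISY
```

Running the helper twice leaves a single entry in the file, which makes it safe to repeat across nodes. Synchrony still has to be restarted on each node for the change to take effect.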
If a default threshold is exceeded, or if you enabled the NOISY log level, messages like the following are written to the atlassian-synchrony.log file:
INFO [hz._hzInstance_1.HealthMonitor] [hazelcast.internal.diagnostics.HealthMonitor] [126.96.36.199]:5701 [confluence-Synchrony] [3.11.4] processors=8, physical.memory.total=0, physical.memory.free=0, swap.space.total=0, swap.space.free=0, heap.memory.used=1.6G, heap.memory.free=449.5M, heap.memory.total=2.0G, heap.memory.max=2.0G, heap.memory.used/total=78.03%, heap.memory.used/max=78.03%, minor.gc.count=264, minor.gc.time=20332ms, major.gc.count=0, major.gc.time=0ms, load.process=0.00%, load.system=0.00%, load.systemAverage=13.72, thread.count=109, thread.peakCount=233, cluster.timeDiff=0, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.client.query.size=0, executor.q.client.blocking.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operations.size=0, executor.q.priorityOperation.size=0, operations.completed.count=6869, executor.q.mapLoad.size=0, executor.q.mapLoadAllKeys.size=0, executor.q.cluster.size=0, executor.q.response.size=0, operations.running.count=0, operations.pending.invocations.percentage=0.00%, operations.pending.invocations.count=0, proxy.count=0, clientEndpoint.count=0, connection.active.count=2, client.connection.count=0, connection.count=2
The message includes useful data such as heap memory allocated and used, GC counts, and GC times. To find these messages, search for:
- hazelcast.internal.diagnostics.HealthMonitor if running Confluence 7+
- heap.memory.used or any other of the metrics printed if running Confluence 6
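Once the messages are located, individual metrics can be pulled out of them for trending or alerting. The sketch below assumes the message format shown above, that is, comma-separated name=value pairs with no spaces inside a value; the function name hz_metric is hypothetical and the log path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print the value of one
# health monitor metric (for example heap.memory.used/max) per matching line.
hz_metric() {
  awk -v key="$1" '{
    n = split($0, parts, /, /)
    for (i = 1; i <= n; i++) {
      # The first field still carries the log prefix; keep only the last word.
      sub(/^.* /, "", parts[i])
      # Literal prefix match, so heap.memory.used does not match
      # heap.memory.used/total or heap.memory.used/max.
      if (index(parts[i], key "=") == 1) {
        print substr(parts[i], length(key) + 2)
      }
    }
  }'
}

# Example usage (path is a placeholder):
# hz_metric heap.memory.used/max < /path/to/atlassian-synchrony.log
# hz_metric major.gc.count < /path/to/atlassian-synchrony.log
```

Because the match is on the metric name rather than on the logger class, the same helper works whether or not the HealthMonitor class name appears in the line.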
Java flight recording
Due to the same limitation that prevents enabling GC logging, it's not possible to start a recording using system properties. Instead, use jcmd, which ships with the JDK, so a JDK is required for this purpose. Example:
$ jcmd <Synchrony-pid> JFR.start
$ jcmd <Synchrony-pid> JFR.dump filename=recording.jfr
Running just jcmd on the command line lists all Java processes running on the server, which is useful to find the Synchrony one.
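Picking the Synchrony process out of that listing can also be scripted. This is a minimal sketch, assuming each listing line has the form "<pid> <main class and arguments>" and that the Synchrony process can be recognized by the string synchrony in its main class or arguments. Both the function name pids_matching and that pattern are assumptions, so check the raw jcmd output on your server first.

```shell
#!/bin/sh
# Hypothetical helper: read "<pid> <main class and args>" lines on stdin and
# print the PID of every line whose command part matches the regex in $1.
pids_matching() {
  awk -v pat="$1" '{
    pid = $1
    $1 = ""                 # drop the PID so only the command is matched
    if ($0 ~ pat) print pid
  }'
}

# Example usage (the pattern is an assumption, see above):
# jcmd | pids_matching 'synchrony'
```

If more than one PID is printed, refine the pattern before passing the result to the JFR commands.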
With JDK Mission Control, you can then review the recording.
If you are using Oracle JDK, Java Flight Recorder requires a commercial license for use in production.