How to monitor the Synchrony cluster health
Platform Notice: Data Center Only - This article only applies to Atlassian products on the data center platform.
Synchrony self-managed mode has some limitations in terms of system properties that can be loaded. This is covered in the following feature request:
In that scenario, we need to leverage different solutions in order to monitor the Synchrony cluster health. We'll cover some of them in this article.
Option 1. Hazelcast health monitor
Both Confluence and Synchrony clusters use Hazelcast, which includes the following feature:
This means some extra diagnostics will be printed on the logs if one of the following conditions is met:
- Memory usage > 70%
- CPU usage > 70%
The thresholds can be configured using system properties covered in the document. Alternatively, you can set the log level to NOISY to have the message printed every 20s (interval also configurable):
- Edit the file <Confluence-local-home>/synchrony-args.properties
Add the following line at the bottom:
- Save and access the node you just modified
- Restart Synchrony on the Collaborative Editing management page
- Repeat on all nodes
If a threshold is exceeded or the NOISY log level is enabled, messages like the following are written to the atlassian-synchrony.log file:
INFO [hz._hzInstance_1.HealthMonitor] [hazelcast.internal.diagnostics.HealthMonitor] [126.96.36.199]:5701 [confluence-Synchrony] [3.11.4] processors=8, physical.memory.total=0, physical.memory.free=0, swap.space.total=0, swap.space.free=0, heap.memory.used=1.6G, heap.memory.free=449.5M, heap.memory.total=2.0G, heap.memory.max=2.0G, heap.memory.used/total=78.03%, heap.memory.used/max=78.03%, minor.gc.count=264, minor.gc.time=20332ms, major.gc.count=0, major.gc.time=0ms, load.process=0.00%, load.system=0.00%, load.systemAverage=13.72, thread.count=109, thread.peakCount=233, cluster.timeDiff=0, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.client.query.size=0, executor.q.client.blocking.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operations.size=0, executor.q.priorityOperation.size=0, operations.completed.count=6869, executor.q.mapLoad.size=0, executor.q.mapLoadAllKeys.size=0, executor.q.cluster.size=0, executor.q.response.size=0, operations.running.count=0, operations.pending.invocations.percentage=0.00%, operations.pending.invocations.count=0, proxy.count=0, clientEndpoint.count=0, connection.active.count=2, client.connection.count=0, connection.count=2
The message includes useful data such as allocated memory, used memory, GC counts, and GC times. To find these messages, search the log for:
- hazelcast.internal.diagnostics.HealthMonitor if running Confluence 7+
- heap.memory.used or any other of the metrics printed if running Confluence 6
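To track a single metric over time rather than read each message in full, the value can be pulled out of every matching line. The following is a minimal sketch, assuming the message format shown in the sample above; the function name heap_used_pct and the log path in the usage comment are illustrative, not part of Confluence or Hazelcast.

```shell
#!/bin/sh
# Hypothetical helper: reads log lines on stdin and prints the value of
# heap.memory.used/max (without the % sign) for every line that contains it.
# Assumes the HealthMonitor message format shown in the sample above.
heap_used_pct() {
  sed -n 's/.*heap\.memory\.used\/max=\([0-9.]*\)%.*/\1/p'
}

# Example usage (the path is a placeholder, adjust it to your environment):
#   heap_used_pct < /path/to/atlassian-synchrony.log | tail -n 5
```

Matching on the metric name rather than the logger name means the same helper should work whichever of the two search strings above applies to your version.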
Option 2. Turning on GC logging at runtime
This option requires a JDK, because the JDK ships with jinfo, a utility that can change some JVM arguments while the Java process is running, so they take effect without a restart. You can enable Garbage Collection (GC) logging for the Synchrony service at runtime with jinfo by following the steps below:
This workaround only works on Java 8, not on Java 11.
First, identify the Process ID for Synchrony (referred here as the $SYNCHRONY_PID) using the command below:
SYNCHRONY_PID=`jcmd | grep synchrony.core | cut -d ' ' -f 1`
With $SYNCHRONY_PID set, run the following commands in a terminal:
jinfo -flag +PrintGC $SYNCHRONY_PID
jinfo -flag +PrintGCDetails $SYNCHRONY_PID
jinfo -flag +PrintGCDateStamps $SYNCHRONY_PID
jinfo -flag +PrintGCID $SYNCHRONY_PID
Unfortunately, we can't specify a dedicated file for the GC logs, because that parameter can't be changed at runtime. The GC logging will be appended into the
Next, confirm that the JVM flags have been applied to Synchrony's JVM by running the command below:
jcmd $SYNCHRONY_PID VM.flags
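The VM.flags output is long, so it can help to check for the four flags automatically. The following is a minimal sketch, assuming enabled flags appear in the output as -XX:+FlagName separated by whitespace; the function name check_gc_flags is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: reads `jcmd <pid> VM.flags` output on stdin and
# reports whether each of the four GC logging flags is enabled.
# Returns 0 if all four are present, 1 otherwise.
check_gc_flags() {
  # Collapse newlines and repeated blanks so every flag is space-delimited.
  flags=$(tr -s '[:space:]' ' ')
  missing=0
  for f in PrintGC PrintGCDetails PrintGCDateStamps PrintGCID; do
    case " $flags " in
      *" -XX:+$f "*) echo "$f: enabled" ;;
      *) echo "$f: NOT enabled"; missing=1 ;;
    esac
  done
  return $missing
}

# Example usage:
#   jcmd $SYNCHRONY_PID VM.flags | check_gc_flags
```

Each flag is matched with surrounding spaces so that PrintGC is not mistaken for PrintGCDetails.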
Please note that changes made via jinfo are not persistent: if you restart the application, the flags revert to the defaults set by your startup scripts. If you want the changes to survive a restart, modify your startup scripts accordingly.
Option 3. Java flight recording
Due to the same limitation that prevents enabling GC logging, it's not possible to create a recording with system properties. Instead, use jcmd, which means a JDK is needed for this purpose. Example:
$ jcmd <Synchrony-pid> JFR.start
$ jcmd <Synchrony-pid> JFR.dump filename=recording.jfr
Running just jcmd on the command line lists all Java processes running on the server, which is useful to find the Synchrony one.
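When scripting this lookup, it is worth guarding against the case where no Synchrony process, or more than one, shows up in the list. The following is a minimal sketch, assuming the Synchrony process appears in the jcmd listing with synchrony.core in its line and the PID as the first field; the function name find_synchrony_pid is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of plain `jcmd` on stdin and prints
# the PID of the Synchrony process. Fails if zero or several lines match.
find_synchrony_pid() {
  pids=$(awk '/synchrony\.core/ { print $1 }')
  # Count non-empty lines; grep -c exits non-zero on 0 matches, hence || true.
  count=$(printf '%s\n' "$pids" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one Synchrony process, found $count" >&2
    return 1
  fi
  printf '%s\n' "$pids"
}

# Example usage:
#   SYNCHRONY_PID=$(jcmd | find_synchrony_pid) && jcmd "$SYNCHRONY_PID" JFR.start
```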
With JDK Mission Control, you can then review the recording.
If you are using Oracle JDK, Java Flight Recorder requires a commercial license for use in production.