How to monitor the Synchrony cluster health
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Purpose
Synchrony self-managed mode has some limitations on which system properties can be loaded. This is covered in the following feature request:
In that scenario, we need to rely on other solutions to monitor the Synchrony cluster health. We'll cover some of them in this article.
This limitation was fixed in version 7.20, which allows GC log parameters to be passed through the synchrony-args.properties file. The strategies covered below are still valid for those newer versions, but they are most useful up to 7.19.
Workarounds
Option 1. Hazelcast health monitor
Both Confluence and Synchrony clusters use Hazelcast, which includes the following feature:
This means some extra diagnostics will be printed on the logs if one of the following conditions is met:
- Memory usage > 70%
- CPU usage > 70%
The thresholds can be configured using the system properties covered in the Hazelcast documentation (a sample configuration is shown after the steps below). Alternatively, you can set the log level to NOISY to have the message printed every 20s (the interval is also configurable):
- Edit the file <Confluence-local-home>/synchrony-args.properties
- Add the following line at the bottom:
hazelcast.health.monitoring.level=NOISY
- Save and access the node you just modified
- Restart Synchrony on the Collaborative Editing management page
- Repeat on all nodes
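For reference, below is a minimal sketch of what synchrony-args.properties could look like with the health monitor tuned. The threshold and interval property names are assumptions based on the Hazelcast 3.x documentation, so verify them against the Hazelcast version bundled with your Confluence release before relying on them:
# Print the health monitor line on every interval, regardless of thresholds
hazelcast.health.monitoring.level=NOISY
# Assumed Hazelcast 3.x tuning properties - check them against your bundled Hazelcast version
hazelcast.health.monitoring.delay.seconds=20
hazelcast.health.monitoring.threshold.memory.percentage=70
hazelcast.health.monitoring.threshold.cpu.percentage=70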
If the default threshold is met or if you enabled the NOISY log level, messages like the following are printed in the atlassian-synchrony.log file:
INFO [hz._hzInstance_1.HealthMonitor] [hazelcast.internal.diagnostics.HealthMonitor] [1.1.1.1]:5701 [confluence-Synchrony] [3.11.4] processors=8, physical.memory.total=0, physical.memory.free=0, swap.space.total=0, swap.space.free=0, heap.memory.used=1.6G, heap.memory.free=449.5M, heap.memory.total=2.0G, heap.memory.max=2.0G, heap.memory.used/total=78.03%, heap.memory.used/max=78.03%, minor.gc.count=264, minor.gc.time=20332ms, major.gc.count=0, major.gc.time=0ms, load.process=0.00%, load.system=0.00%, load.systemAverage=13.72, thread.count=109, thread.peakCount=233, cluster.timeDiff=0, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.client.query.size=0, executor.q.client.blocking.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operations.size=0, executor.q.priorityOperation.size=0, operations.completed.count=6869, executor.q.mapLoad.size=0, executor.q.mapLoadAllKeys.size=0, executor.q.cluster.size=0, executor.q.response.size=0, operations.running.count=0, operations.pending.invocations.percentage=0.00%, operations.pending.invocations.count=0, proxy.count=0, clientEndpoint.count=0, connection.active.count=2, client.connection.count=0, connection.count=2
The message includes a lot of useful data, such as allocated memory, used memory, GC counts, and GC times. To find these entries in the log, search for (see the example command after this list):
- hazelcast.internal.diagnostics.HealthMonitor if running Confluence 7+
- heap.memory.used or any other of the metrics printed if running Confluence 6
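For example, a simple grep like the sketch below can pull these entries out of the log on a given node; the log path assumes the default location mentioned later in this article and may differ in your environment (adjust the search string for Confluence 6 as noted above):
grep 'hazelcast.internal.diagnostics.HealthMonitor' <Confluence-Home>/logs/atlassian-synchrony.log | tail -n 20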
Option 2. Turning on GC logging at runtime
For this option in particular, we need a JDK, as it comes with a utility, jinfo, that can alter some of these arguments while the Java process is running, making them effective without a restart. You can enable Garbage Collection (GC) logging for the Synchrony service by using jinfo at runtime, with the steps below:
This workaround only works on Java 8; it does not work on Java 11.
First, identify the process ID for Synchrony (referred to here as $SYNCHRONY_PID) using the command below:
SYNCHRONY_PID=`jcmd | grep synchrony.core | cut -d ' ' -f 1`
Using the $SYNCHRONY_PID, we can then run the following commands in a terminal:
jinfo -flag +PrintGC $SYNCHRONY_PID
jinfo -flag +PrintGCDetails $SYNCHRONY_PID
jinfo -flag +PrintGCDateStamps $SYNCHRONY_PID
jinfo -flag +PrintGCID $SYNCHRONY_PID
Unfortunately, we can't specify a dedicated GC log file, as that parameter can't be changed at runtime. The GC logging will be appended to the <Confluence-Home>/logs/atlassian-synchrony.log file.
Next, you can confirm that the JVM flags have been applied to Synchrony's JVM by running the command below:
jcmd $SYNCHRONY_PID VM.flags
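Since VM.flags prints the flags on a single line, a small filter like the sketch below can make the GC flags easier to spot; this is purely a convenience and assumes the standard tr and grep utilities are available:
jcmd $SYNCHRONY_PID VM.flags | tr ' ' '\n' | grep -i printgc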
Please note that the changes made via jinfo are not persistent: if you restart the application, they will revert to the default values set by your startup scripts. If you want the changes to remain in effect after a restart, you will need to modify your startup scripts accordingly.
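If you later want to turn the extra GC logging back off without a restart, the same +/- flag convention should work in reverse. This is a sketch based on standard jinfo behaviour for manageable boolean flags, so verify it on your JDK build first:
jinfo -flag -PrintGC $SYNCHRONY_PID
jinfo -flag -PrintGCDetails $SYNCHRONY_PID
jinfo -flag -PrintGCDateStamps $SYNCHRONY_PID
jinfo -flag -PrintGCID $SYNCHRONY_PID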
Option 3. Java flight recording
Another option is to generate a Java Flight Recorder (JFR) recording, which can later be reviewed using the JDK Mission Control application. To create a recording, consult the appropriate Java vendor documentation:
Due to the same limitation that prevents enabling GC logging, it's not possible to start a recording through system properties. Instead, use jcmd, which means a JDK is needed for this purpose. Example:
$ jcmd <Synchrony-pid> JFR.start
$ jcmd <Synchrony-pid> JFR.dump filename=recording.jfr
Running just jcmd on the command line lists all Java processes running on the server, which is useful for finding the Synchrony process.
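Putting it all together, a minimal end-to-end sketch could look like the following. The output path is an illustrative choice, the VM.unlock_commercial_features step applies only to Oracle JDK 8 (see the licensing note below), and depending on your JDK version JFR.dump may additionally require a recording name or id:
SYNCHRONY_PID=`jcmd | grep synchrony.core | cut -d ' ' -f 1`
# On Oracle JDK 8 only: JFR is a commercial feature that must be unlocked first
jcmd $SYNCHRONY_PID VM.unlock_commercial_features
# Start a recording, check that it is running, then dump it to a file for JDK Mission Control
jcmd $SYNCHRONY_PID JFR.start
jcmd $SYNCHRONY_PID JFR.check
jcmd $SYNCHRONY_PID JFR.dump filename=/tmp/synchrony-recording.jfr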
With JDK Mission Control, you can then review the recording.
If you are using Oracle JDK, Java Flight Recorder requires a commercial license for use in production.