How to monitor the Synchrony cluster health

Platform Notice: Data Center Only - This article only applies to Atlassian products on the data center platform.

Purpose

Synchrony self-managed mode has some limitations in terms of system properties that can be loaded. This is covered in the following feature request:

Because of that limitation, monitoring the Synchrony cluster health requires other approaches. This article covers some of them.

Workarounds

Option 1. Hazelcast health monitor

Both Confluence and Synchrony clusters use Hazelcast, which includes the following feature:

This means extra diagnostics are printed to the logs when either of the following conditions is met:

  • Memory usage > 70%
  • CPU usage > 70%

The thresholds can be configured using the system properties covered in that documentation. Alternatively, you can set the log level to NOISY to have the message printed every 20 seconds (the interval is also configurable):

  1. Edit the file <Confluence-local-home>/synchrony-args.properties
  2. Add the following line at the bottom:

    hazelcast.health.monitoring.level=NOISY
  3. Save and access the node you just modified
  4. Restart Synchrony on the Collaborative Editing management page
  5. Repeat on all nodes
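The thresholds and interval mentioned above could be set in the same file. The helper below is a minimal sketch that prints the extra lines. The property names are the Hazelcast 3.x health monitor settings as best recalled, so check them against the Hazelcast documentation for your version. The function name `health_monitor_props` is hypothetical, and it is an assumption that Synchrony picks these properties up from synchrony-args.properties in the same way as hazelcast.health.monitoring.level.

```shell
# Print Hazelcast health monitor settings in properties format.
# Arguments (all optional): memory threshold %, CPU threshold %, interval in seconds.
# Defaults mirror the values described in this article (70%, 70%, 20s).
health_monitor_props() {
  mem_pct="${1:-70}"
  cpu_pct="${2:-70}"
  delay_s="${3:-20}"
  printf 'hazelcast.health.monitoring.threshold.memory.percentage=%s\n' "$mem_pct"
  printf 'hazelcast.health.monitoring.threshold.cpu.percentage=%s\n' "$cpu_pct"
  printf 'hazelcast.health.monitoring.delay.seconds=%s\n' "$delay_s"
}
```

For example, `health_monitor_props 80 80 60 >> <Confluence-local-home>/synchrony-args.properties` would append the three lines to the file. Restart Synchrony afterwards, as in the steps above.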

If a default threshold is exceeded, or if you enabled the NOISY log level, messages like the following are written to the atlassian-synchrony.log file:

INFO [hz._hzInstance_1.HealthMonitor] [hazelcast.internal.diagnostics.HealthMonitor] [1.1.1.1]:5701 [confluence-Synchrony] [3.11.4] processors=8, physical.memory.total=0, physical.memory.free=0, swap.space.total=0, swap.space.free=0, heap.memory.used=1.6G, heap.memory.free=449.5M, heap.memory.total=2.0G, heap.memory.max=2.0G, heap.memory.used/total=78.03%, heap.memory.used/max=78.03%, minor.gc.count=264, minor.gc.time=20332ms, major.gc.count=0, major.gc.time=0ms, load.process=0.00%, load.system=0.00%, load.systemAverage=13.72, thread.count=109, thread.peakCount=233, cluster.timeDiff=0, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.client.query.size=0, executor.q.client.blocking.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operations.size=0, executor.q.priorityOperation.size=0, operations.completed.count=6869, executor.q.mapLoad.size=0, executor.q.mapLoadAllKeys.size=0, executor.q.cluster.size=0, executor.q.response.size=0, operations.running.count=0, operations.pending.invocations.percentage=0.00%, operations.pending.invocations.count=0, proxy.count=0, clientEndpoint.count=0, connection.active.count=2, client.connection.count=0, connection.count=2

The message includes useful data such as allocated memory, used memory, GC counts, and GC times. To find these messages in the log, search for:

  • hazelcast.internal.diagnostics.HealthMonitor if running Confluence 7+
  • heap.memory.used or any other of the metrics printed if running Confluence 6
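Once the messages are in the log, individual metrics can be pulled out with standard text tools. The sketch below extracts the heap.memory.used/max percentage from each health monitor line read on standard input. The function name `heap_used_pct` is hypothetical, and the pattern assumes the message format shown in the sample above.

```shell
# Read log lines on stdin and print the heap.memory.used/max percentage
# (without the % sign) for every line that contains that metric.
heap_used_pct() {
  sed -n -E 's#.*heap\.memory\.used/max=([0-9.]+)%.*#\1#p'
}
```

For example, `heap_used_pct < atlassian-synchrony.log | tail -n 5` would show the five most recent readings. Lines without the metric are skipped.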

Option 2. Turning on GC logging at runtime

This option requires a JDK, because it relies on jinfo, a JDK utility that can change certain JVM flags while the Java process is running, so they take effect without a restart. To enable Garbage Collection (GC) logging for the Synchrony service at runtime, follow the steps below:

This workaround only works on Java 8, not on Java 11, where these GC flags were replaced by unified JVM logging.


  1. First, identify the process ID for Synchrony (referred to here as $SYNCHRONY_PID) using the command below:

    SYNCHRONY_PID=$(jcmd | grep synchrony.core | cut -d ' ' -f 1)
  2. With $SYNCHRONY_PID set, run the following commands in a terminal:

    jinfo -flag +PrintGC $SYNCHRONY_PID
    jinfo -flag +PrintGCDetails $SYNCHRONY_PID
    jinfo -flag +PrintGCDateStamps $SYNCHRONY_PID
    jinfo -flag +PrintGCID $SYNCHRONY_PID

    Unfortunately, it isn't possible to specify a dedicated GC log file, because that parameter can't be changed at runtime. The GC logging is appended to the <Confluence-Home>/logs/atlassian-synchrony.log file.

  3. Next, confirm that the JVM flags have been applied to Synchrony's JVM by running the command below:

    jcmd $SYNCHRONY_PID VM.flags
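The check in step 3 can also be scripted. The sketch below reads the VM.flags output on standard input and prints the name of each GC flag from step 2 that is not enabled. The function name `missing_gc_flags` is hypothetical, and it assumes the output lists flags as whitespace-separated `-XX:+Name` tokens.

```shell
# Read `jcmd <pid> VM.flags` output on stdin and print every GC logging flag
# from step 2 that does not appear as an enabled -XX:+<flag> token.
missing_gc_flags() {
  # Join all lines with spaces so every token ends with a space.
  flags_output=$(tr '\n' ' ')
  for flag in PrintGC PrintGCDetails PrintGCDateStamps PrintGCID; do
    case " $flags_output " in
      *"-XX:+$flag "*) ;;          # flag is enabled, nothing to report
      *) echo "$flag" ;;           # flag is missing or disabled
    esac
  done
}
```

For example, `jcmd $SYNCHRONY_PID VM.flags | missing_gc_flags` prints nothing when all four flags are active.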

Please note that changes made via jinfo are not persistent: if you restart the application, the flags revert to the values set by your startup scripts. If you want the changes to survive a restart, modify your startup scripts accordingly.

Option 3. Java flight recording

Another option is to generate a Java Flight recording, which can later be reviewed using the JDK Mission Control application. To create a recording, consult the appropriate Java vendor documentation:

Due to the same limitation that prevents enabling GC logging, it's not possible to create a recording with system properties. Instead, use jcmd, which means a JDK is needed for this purpose. Example:

$ jcmd <Synchrony-pid> JFR.start
$ jcmd <Synchrony-pid> JFR.dump filename=recording.jfr

(info) Running just jcmd on the command line lists all Java processes running on the server, which is useful to find the Synchrony one.
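The lookup and the recording can be combined in a small script. The sketch below is one possible approach, not an official procedure. The function names are hypothetical, the `synchrony.core` match follows the grep used in Option 2, and the recording name, duration, and output file are illustrative values passed to the standard `JFR.start` options.

```shell
# Read a `jcmd` process listing on stdin and print the PID of the first
# process whose description contains synchrony.core.
synchrony_pid_from_listing() {
  awk '/synchrony\.core/ { print $1; exit }'
}

# Start a time-limited flight recording on the Synchrony process.
# $1 is the output file for the recording (for example recording.jfr).
start_synchrony_jfr() {
  pid=$(jcmd | synchrony_pid_from_listing)
  if [ -z "$pid" ]; then
    echo "Synchrony process not found" >&2
    return 1
  fi
  # The recording is written to the file automatically when the duration ends.
  jcmd "$pid" JFR.start name=synchrony duration=5m filename="$1"
}
```

Calling `start_synchrony_jfr recording.jfr` would then record for five minutes without needing a separate JFR.dump step.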

With JDK Mission Control, you can then review the recording.

(warning) If you are using Oracle JDK, Java Flight Recorder requires a commercial license for use in production.


Last modified on Jul 13, 2022
