How to monitor the Synchrony cluster health

Platform Notice: Data Center Only - This article only applies to Atlassian products on the data center platform.

Purpose

Synchrony self-managed mode limits which system properties can be loaded; for example, GC logging cannot be enabled this way. This limitation is tracked in a feature request.

Because of this, other approaches are needed to monitor the Synchrony cluster health. This article covers some of them.

Solution

Hazelcast health monitor

Both Confluence and Synchrony clusters use Hazelcast, which includes a built-in health monitor. With it, extra diagnostics are printed to the logs when one of the following thresholds is exceeded:

  • Memory usage > 70%
  • CPU usage > 70%

The thresholds can be configured using the system properties described in the Hazelcast documentation. Alternatively, set the health monitoring level to NOISY so that the message is printed every 20 seconds regardless of the thresholds (the interval is also configurable):

  1. Edit the file <Confluence-local-home>/synchrony-args.properties
  2. Add the following line at the bottom:

    hazelcast.health.monitoring.level=NOISY
  3. Save the file and access Confluence on the node you just modified
  4. Restart Synchrony on the Collaborative Editing management page
  5. Repeat on all nodes
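Steps 1 and 2 above can also be scripted, which helps when the change has to be repeated on several nodes. The following is a minimal sketch, not an Atlassian-provided tool: the function name enable_noisy_health_monitor is hypothetical, and the path in the example call is the same placeholder used in step 1.

```shell
#!/bin/sh
# Hypothetical helper (not part of Confluence or Synchrony): append the
# health monitor level to a properties file unless a value is already set.
enable_noisy_health_monitor() {
  props_file="$1"
  key='hazelcast.health.monitoring.level'

  # Leave the file alone if the property is already present.
  if grep -q "^${key}=" "$props_file" 2>/dev/null; then
    return 0
  fi

  # Make sure the existing content ends with a newline before appending.
  if [ -s "$props_file" ] && [ -n "$(tail -c 1 "$props_file")" ]; then
    printf '\n' >> "$props_file"
  fi
  printf '%s=NOISY\n' "$key" >> "$props_file"
}

# Example call; replace the placeholder with your local home directory:
# enable_noisy_health_monitor "<Confluence-local-home>/synchrony-args.properties"
```

Synchrony still has to be restarted on each node afterwards, as described in step 4. The Hazelcast 3.x documentation also lists hazelcast.health.monitoring.delay.seconds, hazelcast.health.monitoring.threshold.memory.percentage, and hazelcast.health.monitoring.threshold.cpu.percentage for the interval and thresholds; whether your Synchrony version honors them when set in this file is an assumption to verify.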

If a threshold is exceeded, or if you enabled the NOISY level, messages like the following are written to the atlassian-synchrony.log file:

INFO [hz._hzInstance_1.HealthMonitor] [hazelcast.internal.diagnostics.HealthMonitor] [1.1.1.1]:5701 [confluence-Synchrony] [3.11.4] processors=8, physical.memory.total=0, physical.memory.free=0, swap.space.total=0, swap.space.free=0, heap.memory.used=1.6G, heap.memory.free=449.5M, heap.memory.total=2.0G, heap.memory.max=2.0G, heap.memory.used/total=78.03%, heap.memory.used/max=78.03%, minor.gc.count=264, minor.gc.time=20332ms, major.gc.count=0, major.gc.time=0ms, load.process=0.00%, load.system=0.00%, load.systemAverage=13.72, thread.count=109, thread.peakCount=233, cluster.timeDiff=0, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.client.query.size=0, executor.q.client.blocking.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operations.size=0, executor.q.priorityOperation.size=0, operations.completed.count=6869, executor.q.mapLoad.size=0, executor.q.mapLoadAllKeys.size=0, executor.q.cluster.size=0, executor.q.response.size=0, operations.running.count=0, operations.pending.invocations.percentage=0.00%, operations.pending.invocations.count=0, proxy.count=0, clientEndpoint.count=0, connection.active.count=2, client.connection.count=0, connection.count=2

The message includes useful data such as allocated memory, used memory, GC counts, and GC times. To find these messages in the log, search for:

  • hazelcast.internal.diagnostics.HealthMonitor if running Confluence 7+
  • heap.memory.used or any other of the metrics printed if running Confluence 6
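The search can be narrowed to a single metric. As a rough sketch, the helper below prints every heap.memory.used/max value at or above a chosen percentage. The function name heap_usage_above is hypothetical, and the pattern assumes the message layout shown in the sample log line above.

```shell
#!/bin/sh
# Hypothetical helper: print every heap.memory.used/max percentage found in
# the HealthMonitor lines of a log file that is at or above a threshold.
heap_usage_above() {
  log_file="$1"
  threshold="$2"

  # Keep only lines that carry the metric, strip everything except the
  # number, then compare numerically against the threshold.
  grep -F 'heap.memory.used/max=' "$log_file" |
    sed -E 's/.*heap\.memory\.used\/max=([0-9.]+)%.*/\1/' |
    awk -v t="$threshold" '$1 + 0 >= t + 0 { print $1 }'
}

# Example call (path relative to wherever your Synchrony log is written):
# heap_usage_above atlassian-synchrony.log 75
```

The same pattern should work for the other percentage metrics in the message, such as load.process, by changing the key in the grep and sed expressions.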

Java flight recording

Another option is to generate a Java Flight Recorder (JFR) recording, which can later be reviewed using the JDK Mission Control application. To create a recording, consult the documentation of your Java vendor.

Due to the same limitation that prevents enabling GC logging, it's not possible to start a recording with system properties. Instead, use jcmd, which ships with the JDK, so a JDK must be installed on the server. Example:

$ jcmd <Synchrony-pid> JFR.start
$ jcmd <Synchrony-pid> JFR.dump filename=recording.jfr

Note: Running jcmd without arguments lists all Java processes running on the server, which is useful for finding the Synchrony one.
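The PID lookup and the recording can be combined in a small script. This is a sketch under stated assumptions: the function names, the recording name, the duration, and the output path are illustrative, and it assumes the Synchrony JVM shows up in the jcmd listing with "synchrony" somewhere in its description. Check the plain jcmd output on your server first.

```shell
#!/bin/sh
# Read `jcmd -l` output on stdin and print the PID of the first process whose
# description mentions "synchrony" (assumption: the Synchrony JVM is listed
# that way on your server).
synchrony_pid() {
  awk 'tolower($0) ~ /synchrony/ { print $1; exit }'
}

# Start a named, time-boxed recording that is written to a file when it ends.
# The recording name, duration, and output path are illustrative values.
start_synchrony_recording() {
  pid="$(jcmd -l | synchrony_pid)"
  if [ -z "$pid" ]; then
    echo "Synchrony process not found" >&2
    return 1
  fi
  jcmd "$pid" JFR.start name=synchrony-health duration=10m \
    filename=/tmp/synchrony-health.jfr
  jcmd "$pid" JFR.check   # list the recordings known to that JVM
}

# Example call:
# start_synchrony_recording
```

jcmd has to be run as the same operating system user that owns the Synchrony process, otherwise it cannot attach to the JVM.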

You can then open the recording in JDK Mission Control to review it.

Warning: If you are using Oracle JDK, Java Flight Recorder requires a commercial license for use in production.


Last modified on Jul 12, 2021
