Confluence Diagnostics

When investigating a performance problem or outage, it's useful to know as much as possible about what was happening in your site in the lead-up to the problem. This is when diagnostics information can help.  

While often not individually actionable, diagnostic alerts can help you build up a detailed picture of your site's behaviour, and identify symptoms that may be contributing to the problem. 

This feature is still experimental in Confluence 6.11. We plan to fine-tune the thresholds and provide a UI for this diagnostic information in an upcoming Confluence release. Stay tuned!

About diagnostic alerts

The purpose of the diagnostics tool is to continuously check for symptoms or behaviours that we know may contribute to problems in your site. An alert is triggered when a set threshold is exceeded.

For example, if the free disk space for your local home (or shared home) directory falls below 8192MB, an alert is triggered. This is useful because if you run out of space, your users may not be able to upload new files, export spaces, or perform other tasks that rely on writing files to that directory.
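To make the idea concrete, here is a minimal sketch of a free disk space check, written in Python for illustration only. It is not how Confluence implements the alert. The function name `free_space_alert` and the message format are assumptions, and the threshold is passed in as a parameter rather than hard-coded.

```python
import shutil

MEGABYTE = 1024 * 1024


def free_space_alert(path, threshold_mb):
    """Return an alert message if free space at `path` is below `threshold_mb`.

    Returns None when there is enough free space. The name and message
    format are hypothetical, not Confluence's own.
    """
    # shutil.disk_usage reports total, used and free bytes for the
    # filesystem that contains `path`.
    free_mb = shutil.disk_usage(path).free // MEGABYTE
    if free_mb < threshold_mb:
        return (
            f"Low free disk space: {free_mb} MB free at {path} "
            f"(threshold {threshold_mb} MB)"
        )
    return None
```

You could run a check like this against your local home or shared home directory path to see how close you are to the threshold.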

It's important to note that a threshold is just the point at which the alert is triggered. It's not the same as a timeout or other hard limit. For example, a long-running task may trigger an alert after 5 minutes, and still complete successfully after 8 minutes.
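The difference between a threshold and a timeout can be sketched as follows. This is a simplified simulation in Python, not Confluence code. The function `run_with_alert` and the 60-second check interval are assumptions made for the example. Only the 300-second threshold and the 8-minute task come from the text above.

```python
def run_with_alert(duration_seconds, threshold_seconds=300, check_interval_seconds=60):
    """Simulate a task that is checked periodically against a threshold.

    Returns (alert_raised_at, finished_at). The alert is recorded once,
    at the first check where elapsed time exceeds the threshold. The task
    is never cancelled: it always runs for its full duration.
    """
    alert_raised_at = None
    elapsed = 0
    while elapsed < duration_seconds:
        # Advance simulated time to the next check, or to the end of the task.
        elapsed = min(elapsed + check_interval_seconds, duration_seconds)
        if alert_raised_at is None and elapsed > threshold_seconds:
            alert_raised_at = elapsed  # alert only, no interruption
    return alert_raised_at, elapsed
```

With these assumptions, `run_with_alert(480)` returns `(360, 480)`: the alert is recorded at the first check after the 5-minute mark, and the task still finishes at 8 minutes. A task that finishes in 2 minutes, `run_with_alert(120)`, returns `(None, 120)`.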

When an alert is triggered, a message is written to the atlassian-confluence.log file (your application log), and further details are written to the atlassian-diagnostics.log file. This information is also included in support zips.

Some behaviours trigger a single alert. For others, multiple alerts are possible. Diagnostic information is stored in the database and retained for 30 days. Old alerts are cleaned up automatically.
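As a rough illustration of the retention rule, the sketch below keeps only alerts raised within the retention period. It is written in Python and is not the actual cleanup job. The function `purge_old_alerts` and the `raised_at` field are hypothetical names. Only the 30-day retention period comes from the text above.

```python
from datetime import datetime, timedelta


def purge_old_alerts(alerts, now, retention_days=30):
    """Return only the alerts raised within the retention period.

    `alerts` is a list of dicts, each with a `raised_at` datetime
    (a hypothetical shape chosen for this example).
    """
    cutoff = now - timedelta(days=retention_days)
    return [alert for alert in alerts if alert["raised_at"] >= cutoff]
```

For example, with `now` set to 31 January, an alert raised on 15 January is kept, and one raised on 1 December of the previous year is dropped.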

Types of alerts

There are several types of alerts.

Alert and KB | Level | Default threshold | Configurable
Low free disk space | Critical | 8192 megabytes | Yes
Low free memory | Warn | 256 megabytes | Yes
Node left or joined the cluster | Info | N/A | No
Long running task exceeded time limit | Warn | 300 seconds | Yes
Garbage collection exceeded time limit | Warn | 10% (over the last 20 seconds) | Yes
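
The garbage collection threshold in the table above is a percentage over a time window. 10% of 20 seconds is 2 seconds of garbage collection time. The Python sketch below shows one way to read that rule. It is an interpretation, not Confluence's actual measurement: the function `gc_time_exceeded` and the shape of the pause data are assumptions.

```python
def gc_time_exceeded(gc_pauses, now, window_seconds=20, threshold_percent=10):
    """Check whether GC time in the most recent window exceeds the threshold.

    `gc_pauses` is a list of (end_time_seconds, duration_seconds) tuples,
    a hypothetical shape chosen for this example. Only pauses that ended
    inside the window (now - window_seconds, now] are counted.
    """
    window_start = now - window_seconds
    gc_seconds = sum(
        duration for end_time, duration in gc_pauses
        if window_start < end_time <= now
    )
    return 100 * gc_seconds / window_seconds > threshold_percent
```

For example, pauses of 0.5, 1.0 and 0.8 seconds within the same 20-second window add up to 2.3 seconds, or 11.5%, which exceeds the default threshold. Without the last pause, the total is 1.5 seconds, or 7.5%, and no alert would be raised.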

Availability

Some diagnostic alerts are disabled by default, because they may have a performance impact on your site, or are not designed to run continuously.

Our support team may ask you to enable one of the following alerts when troubleshooting a specific problem. They'll provide you with information on how to do this. 

Alert and KB | Level | Default threshold | Configurable
HTTP request exceeded time limit | Warn | 60 seconds | Yes
Macro rendering exceeded time limit | Warn | 30 seconds | Yes
Thread memory allocation rate exceeded limit | Warn | 5% (over the last 20 seconds) | Yes
Sandbox crashed or was terminated during document conversion | Info | N/A | No

Alert levels

There are three levels of diagnostic alerts:

  • Info - information that might be useful when troubleshooting a problem, for example a node joined the cluster.
  • Warning - a problem that may impact performance or availability in future, for example low memory.
  • Critical - a serious problem that is likely to impact system stability or availability, for example low disk space.

Most alerts don't require any immediate action. 

Change alert thresholds

Some alert thresholds are configurable. If you find you are seeing too many instances of an alert, you can change its threshold so that the alert is triggered less often.

Head to Recognized System Properties for a list of system properties for each alert. This info can also be found on the knowledge base article for each alert. 

Change diagnostics behaviour

You can also change the way the diagnostics framework itself behaves. For example, you might change how often checks are performed, or how long diagnostics information is retained. 

Head to Recognized System Properties for the full list of system properties. 

Last modified on Aug 14, 2018
