Confluence Diagnostics

When investigating a performance problem or outage, it's useful to know as much as possible about what was happening in your site in the lead-up to the problem. This is when diagnostics information can help.  

While often not individually actionable, diagnostic alerts can help you build up a detailed picture of your site's behaviour, and identify symptoms that may be contributing to the problem. 

This feature is still experimental in Confluence 6.11. We plan to fine-tune the thresholds and provide a UI for this diagnostic information in an upcoming Confluence release. Stay tuned!

About diagnostic alerts

The purpose of the diagnostics tool is to continuously check for symptoms or behaviours that we know may contribute to problems in your site. An alert is triggered when a set threshold is exceeded.

For example, if the free disk space for your local home (or shared home) directory falls below 8192MB, an alert is triggered. This is useful because if you run out of space, your users may not be able to upload new files, export spaces, or perform other tasks that rely on writing files to that directory. 
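To make the low disk space example concrete, here's a minimal sketch of the same kind of check, written in Python for illustration only. It is not how Confluence implements the alert. The function names are hypothetical, and we're assuming 1 MB means 1024 × 1024 bytes, which this page doesn't specify.

```python
import shutil

MB = 1024 * 1024              # assumption: 1 MB = 1024 * 1024 bytes
DEFAULT_THRESHOLD_MB = 8192   # default threshold quoted on this page


def is_low_disk_space(free_bytes: int, threshold_mb: int = DEFAULT_THRESHOLD_MB) -> bool:
    """Return True when free space has fallen below the threshold."""
    return free_bytes < threshold_mb * MB


def check_directory(path: str, threshold_mb: int = DEFAULT_THRESHOLD_MB) -> bool:
    """Check the filesystem that holds `path`, for example your local home directory."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (in bytes)
    return is_low_disk_space(usage.free, threshold_mb)
```

You could point `check_directory` at your local home or shared home directory to see whether it would currently be below the default threshold.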

It's important to note that a threshold is just the point at which the alert is triggered. It's not the same as a timeout or other hard limit. For example, a long-running task may trigger an alert after 5 minutes, and still complete successfully after 8 minutes.
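The difference between a threshold and a timeout can be sketched as a small simulation. This is an illustration under our own assumptions, not Confluence's implementation: `simulate_task` and its one-second check interval are hypothetical, and the 300 second default is the long running task threshold listed on this page.

```python
def simulate_task(duration_seconds: int, threshold_seconds: int = 300, tick_seconds: int = 1):
    """Simulate a task and return the events seen while it runs.

    The alert is recorded once, the first time the elapsed time exceeds the
    threshold. Nothing cancels the task, so it always runs to completion.
    """
    events = []
    elapsed = 0
    alerted = False
    while elapsed < duration_seconds:
        elapsed = min(elapsed + tick_seconds, duration_seconds)
        if not alerted and elapsed > threshold_seconds:
            events.append(("alert", elapsed))
            alerted = True
    events.append(("completed", elapsed))
    return events
```

For an 8 minute task (`simulate_task(480)`), the sketch records an alert just after the 5 minute mark and then a completion at 480 seconds. A 2 minute task records only the completion.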

When an alert is triggered, a message is written to the atlassian-confluence.log file (your application log), and further details are written to the atlassian-diagnostics.log file. This information is also included in support zips.

Some behaviours trigger a single alert; for others, multiple alerts are possible. Diagnostic information is stored in the database, and retained for 30 days. Old alerts are cleaned up automatically.
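The 30 day retention rule can be pictured with a short sketch. Confluence does this cleanup for you, so you'd never need to run anything like it. The alert records here are hypothetical dictionaries with a `timestamp` key, and do not reflect the real database schema.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # retention period quoted on this page


def prune_alerts(alerts, now):
    """Keep only the alerts raised within the retention period ending at `now`."""
    cutoff = now - RETENTION
    return [alert for alert in alerts if alert["timestamp"] >= cutoff]
```

An alert raised 31 days ago would be dropped by this sketch, while one raised 29 days ago would be kept.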

Types of alerts

There are several types of alerts.

Alert and KB                            | Level    | Default threshold              | Configurable
Low free disk space                     | Critical | 8192 megabytes                 | Yes
Low free memory                         | Warn     | 256 megabytes                  | Yes
Node left or joined the cluster         | Info     | N/A                            | No
Long running task exceeded time limit   | Warn     | 300 seconds                    | Yes
Garbage collection exceeded time limit  | Warn     | 10% (over the last 20 seconds) | Yes
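
One way to read the garbage collection threshold is as a share of wall-clock time: 10% of a 20 second window is 2 seconds spent in garbage collection. The sketch below works through that arithmetic. The interpretation is our assumption, and the function names are hypothetical rather than part of Confluence.

```python
def gc_time_fraction(gc_pause_seconds, window_seconds: float = 20.0) -> float:
    """Fraction of the window spent in garbage collection."""
    return sum(gc_pause_seconds) / window_seconds


def gc_alert(gc_pause_seconds, window_seconds: float = 20.0, threshold: float = 0.10) -> bool:
    """Return True when the GC share of the window exceeds the threshold."""
    return gc_time_fraction(gc_pause_seconds, window_seconds) > threshold
```

Under this reading, three pauses of 0.5 seconds in a window (1.5 seconds, or 7.5%) would not trigger the alert, while pauses of 1.0 and 1.5 seconds (2.5 seconds, or 12.5%) would.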

Availability

Some diagnostic alerts are disabled by default, because they may have a performance impact on your site, or are not designed to run continuously.

Our support team may ask you to enable one of the following alerts when troubleshooting a specific problem. They'll provide you with information on how to do this. 

Alert and KB                                                  | Level | Default threshold             | Configurable
HTTP request exceeded time limit                              | Warn  | 60 seconds                    | Yes
Macro rendering exceeded time limit                           | Warn  | 30 seconds                    | Yes
Thread memory allocation rate exceeded limit                  | Warn  | 5% (over the last 20 seconds) | Yes
Sandbox crashed or was terminated during document conversion  | Info  | N/A                           | No

Alert levels

There are three levels of diagnostic alerts:

  • Info - information that might be useful when troubleshooting a problem, for example, a node joined the cluster.
  • Warning - a problem that may impact performance or availability in future, for example, low memory.
  • Critical - a serious problem that is likely to impact system stability or availability, for example, low disk space.

Most alerts don't require any immediate action. 

Change alert thresholds

Some alert thresholds are configurable. If you find you are seeing too many instances of an alert, you can change the threshold so that it's triggered less often.

Head to Recognized System Properties for a list of system properties for each alert. This information can also be found in the knowledge base article for each alert.

Change diagnostics behaviour

You can also change the way the diagnostics framework itself behaves. For example, you might change how often checks are performed, or how long diagnostics information is retained. 

Head to Recognized System Properties for the full list of system properties. 

Last modified on Dec 14, 2020
