Confluence Diagnostics
When investigating a performance problem or outage, it's useful to know as much as possible about what was happening in your site in the lead-up to the problem. This is when diagnostics information can help.
While often not individually actionable, diagnostic alerts can help you build up a detailed picture of your site's behaviour, and identify symptoms that may be contributing to the problem.
This feature is still experimental in Confluence 6.11. We plan to fine-tune the thresholds and provide a UI for this diagnostic information in an upcoming Confluence release. Stay tuned!
About diagnostic alerts
The purpose of the diagnostics tool is to continuously check for symptoms or behaviours that we know may contribute to problems in your site. An alert is triggered when a set threshold is exceeded.
For example, if the free disk space for your local home (or shared home) directory falls below 8192MB, an alert is triggered. This is useful because if you run out of space, your users may not be able to upload new files, export spaces, or perform other tasks that rely on writing files to that directory.
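As a rough illustration of what a threshold check of this kind involves, here is a minimal Python sketch. It is not how Confluence implements the alert; the function names are hypothetical, and it assumes that "MB" means 2**20 bytes.

```python
import shutil

MB = 1024 * 1024  # assumption: one "MB" is treated as 2**20 bytes


def is_low_disk_space(free_bytes: int, threshold_mb: int = 8192) -> bool:
    """Return True when free space has fallen below the threshold."""
    return free_bytes < threshold_mb * MB


def check_directory(path: str, threshold_mb: int = 8192) -> bool:
    """Check the volume that holds `path`, for example a home directory."""
    free_bytes = shutil.disk_usage(path).free
    return is_low_disk_space(free_bytes, threshold_mb)
```

Calling `check_directory` with the path to your local home or shared home directory would tell you whether that volume is currently below the default threshold.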
A threshold is only the point at which an alert is triggered. It is not a timeout or any other hard limit. For example, a long-running task may trigger an alert after 5 minutes and still complete successfully after 8 minutes.
When an alert is triggered, a message is written to the atlassian-confluence.log file (your application log), and further details are written to the atlassian-diagnostics.log file. This information is also included in support zips.
Some behaviours trigger a single alert; for others, multiple alerts are possible. Diagnostic information is stored in the database and retained for 30 days. Old alerts are cleaned up automatically.
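The 30-day retention rule can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the alert records are hypothetical dictionaries with a `timestamp` key, and the real clean-up runs inside Confluence against its database.

```python
from datetime import datetime, timedelta


def prune_old_alerts(alerts, now, retention_days=30):
    """Keep only the alerts raised within the retention period."""
    cutoff = now - timedelta(days=retention_days)
    return [alert for alert in alerts if alert["timestamp"] >= cutoff]


# Hypothetical sample data, for illustration only.
sample_alerts = [
    {"name": "Low free memory", "timestamp": datetime(2018, 9, 1)},
    {"name": "Low free disk space", "timestamp": datetime(2018, 8, 20)},
]
recent = prune_old_alerts(sample_alerts, now=datetime(2018, 9, 30))
```

In this sample, only the alert from 1 September falls inside the 30-day window ending on 30 September, so it is the only one kept.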
Types of alerts
There are several types of alerts.
Alert and KB | Level | Default threshold | Configurable |
---|---|---|---|
Low free disk space | Critical | 8192 megabytes | Yes |
Low free memory | Warn | 256 megabytes | Yes |
Node left or joined the cluster | Info | N/A | No |
Long running task exceeded time limit | Warn | 300 seconds | Yes |
Garbage collection exceeded time limit | Warn | 10% (over the last 20 seconds) | Yes |
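To make the percentage threshold concrete, the Python sketch below works through the arithmetic. It assumes the percentage is garbage collection time as a share of the measurement window, so 10% of a 20-second window is 2 seconds. The function names are hypothetical.

```python
def gc_time_budget_seconds(window_seconds=20.0, threshold_percent=10.0):
    """Garbage collection time allowed in the window before an alert would fire."""
    return window_seconds * threshold_percent / 100.0


def gc_exceeds_threshold(gc_seconds_in_window, window_seconds=20.0,
                         threshold_percent=10.0):
    """Return True when time spent in garbage collection is over the budget."""
    budget = gc_time_budget_seconds(window_seconds, threshold_percent)
    return gc_seconds_in_window > budget
```

Under this reading, 2.5 seconds of garbage collection in the last 20 seconds would exceed the default threshold, while 1.5 seconds would not.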
Availability
Some diagnostic alerts are disabled by default, because they may have a performance impact on your site, or are not designed to run continuously.
Our support team may ask you to enable one of the following alerts when troubleshooting a specific problem. They'll provide you with information on how to do this.
Alert and KB | Level | Default threshold | Configurable |
---|---|---|---|
HTTP request exceeded time limit | Warn | 60 seconds | Yes |
Macro rendering exceeded time limit | Warn | 30 seconds | Yes |
Thread memory allocation rate exceeded limit | Warn | 5% (over the last 20 seconds) | Yes |
Sandbox crashed or was terminated during document conversion | Info | N/A | No |
Alert levels
There are three levels of diagnostic alerts:
- Info - information that might be useful when troubleshooting a problem, for example a node joined the cluster
- Warning (shown as Warn in the tables above) - a problem that may impact performance or availability in future, for example low memory
- Critical - a serious problem that is likely to impact system stability or availability, for example low disk space
Most alerts don't require any immediate action.
Change alert thresholds
Some alert thresholds are configurable. If an alert is triggered too often, you can adjust its threshold so that it is triggered less readily.
Head to Recognized System Properties for a list of the system properties for each alert. This information is also available in the knowledge base article for each alert.
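System properties are generally passed to the JVM as `-D<name>=<value>` startup arguments. The small Python helper below only formats such an argument. The property name `example.alert.threshold` is a hypothetical placeholder, not a real Confluence property; take the real names from Recognized System Properties.

```python
def jvm_system_property_arg(name: str, value) -> str:
    """Format a JVM system property as a -D startup argument."""
    if not name or "=" in name or any(ch.isspace() for ch in name):
        raise ValueError("property name must be non-empty, with no spaces or '='")
    return f"-D{name}={value}"


# Hypothetical property name, used purely for illustration.
example_arg = jvm_system_property_arg("example.alert.threshold", 16384)
```

The resulting string would then be added to the JVM startup options for your Confluence installation, using whichever method your setup normally uses for system properties.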
Change diagnostics behaviour
You can also change the way the diagnostics framework itself behaves. For example, you might change how often checks are performed, or how long diagnostics information is retained.
Head to Recognized System Properties for the full list of system properties.