Performance Troubleshooting Tools - Best Practice

Introduction

This best practice guide is for sysadmins who want a clearer view of what's happening in their instances and want to get the most out of Atlassian Support. We describe what we consider the best tools for monitoring and analyzing diagnostic data produced by Atlassian products, and show you the best way of getting this information to the Atlassian Support team. This guide is organized by the diagnostic artifact that is produced, and goes into detail on which tool is recommended and how best to use it.

The KB articles linked in this document may refer to a specific product, but the techniques described are universal.

This table is a rough guide to when these tools are most likely to be of use, based on the problem you are experiencing.

| Symptom | Thread Dumps | Heap Dumps | Garbage Collection Logs | Database Profiling | CPU Profiling | Client Side Diagnostics | Disk Speed | Access Logs | Page (or Application) Profiling |
|---|---|---|---|---|---|---|---|---|---|
| Whole instance is slow | ✓ | | ✓ | ✓ | ✓ | | ✓ | ✓ | ✓ |
| Whole instance is unresponsive | ✓ | | ✓ | ✓ | ✓ | | ✓ | ✓ | ✓ |
| Specific functions in the application are slow | ✓ | | ✓ | ✓ | ✓ | ✓ | | | ✓ |
| Application server is overloaded (but database server appears fine) | ✓ | | ✓ | | ✓ | | ✓ | ✓ | ✓ |
| Database server is overloaded (but application server appears fine) | ✓ | | | ✓ | | | | | |
| OutOfMemory errors or high memory usage | | ✓ | ✓ | | | | | | |

Tools

Thread dumps

When to generate thread dumps

Thread dumps are the most useful tool for investigating performance problems. If an instance is slow or experiencing outages, a thread dump gives us a snapshot of what is going on in the instance at exactly that time. A sequence of thread dumps lets us see what is going on in an instance over time. A thread dump reveals whether there is a performance issue in the application code itself, which is where performance issues are most often found. Thread dumps also often reveal which other diagnostic artifacts are required.

There's no hard and fast rule as to the quantity and timing of thread dumps that should be created, but as long as they are taken at regular intervals over a period when the instance is performing poorly, they will be useful. For example, a set of 6 thread dumps, taken 30 seconds apart, is a common format used in Atlassian Support.
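The 6-dumps-at-30-second-intervals pattern can be scripted. The following is a minimal sketch rather than the method from the linked KB articles. It assumes the JDK's jstack command is on the PATH and is run as the user that owns the JVM process. The function name, file naming scheme, and the injectable run and sleep parameters are illustrative choices, not part of any Atlassian tooling.

```python
import subprocess
import time
from pathlib import Path


def collect_thread_dumps(pid, out_dir, count=6, interval=30,
                         run=subprocess.run, sleep=time.sleep):
    """Write `count` thread dumps for JVM process `pid`, `interval` seconds apart.

    `run` and `sleep` are injectable so the loop can be exercised without a JVM.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(count):
        # jstack prints the thread dump of the target JVM to stdout.
        result = run(["jstack", str(pid)],
                     capture_output=True, text=True, check=True)
        path = out_dir / f"threaddump_{pid}_{i + 1:02d}.txt"
        path.write_text(result.stdout)
        written.append(path)
        if i < count - 1:  # no need to wait after the last dump
            sleep(interval)
    return written
```

Calling collect_thread_dumps(12345, "dumps") while the instance is performing poorly would leave six numbered files in the dumps directory, ready to attach to a support request (12345 is a placeholder process ID).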

Which tool to use

For generating thread dumps, please follow the instructions in Generating a Thread Dump. If you are using Linux, we recommend the method in Troubleshooting Performance Issues with thread dumps, as its per-thread CPU % output can be correlated with the threads in the dump.

For processing thread dumps, the Atlassian Support team uses and recommends TDA.

TDA can be used to identify long-running threads and lock contention, and to get an overall understanding of what the threads of an instance are doing.
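For a quick first look before opening a dump in TDA, the thread states can be tallied with a few lines of Python. This is only a sketch: it assumes a HotSpot-style thread dump in which each thread has a "java.lang.Thread.State:" line, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "   java.lang.Thread.State: BLOCKED (on object monitor)".
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State: (\w+)", re.MULTILINE)


def count_thread_states(dump_text):
    """Count threads per Java thread state in one thread dump."""
    return Counter(STATE_RE.findall(dump_text))
```

A large and growing number of BLOCKED threads across consecutive dumps is one hint that lock contention is worth investigating in more detail.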

Heap Dumps

When to generate a heap dump

A heap dump is useful when the instance is running out of memory and the common causes have already been addressed. A heap dump gives us a breakdown of all the objects that exist in the Java heap at the time it was taken. It's important to lower the size of the heap as much as possible before creating a heap dump, because analysing a very large heap dump requires significant resources. A heap dump can also be useful in conjunction with garbage collection logs.

Which tool to use

For generating heap dumps, please follow the instructions in Generating a Heap Dump. Heap dumps can be configured to be generated automatically when an instance runs out of memory, or can be generated manually. However, a heap dump that was not generated at the time an OutOfMemoryError was thrown is generally of little use.

For processing heap dumps, the Atlassian Support team uses and recommends Eclipse MAT.

In Eclipse MAT's dominator tree, right-clicking a thread that is using a large amount of heap and selecting Java Basics > Thread Stacks quickly shows you the stack trace of that thread. More information can be found in the MAT Documentation.

Garbage Collection (GC) Logs

When to generate GC logs

Is Confluence becoming completely unresponsive for a while, and then recovering without any changes being made? Is there general slowness that hasn't been attributed to the code itself (via thread dumps)? Is the instance consuming too much memory? If any of these apply, you may be experiencing garbage collection problems. Our Garbage Collection (GC) Tuning Guide covers this in detail, but in brief, garbage collection tuning has three performance goals:

  • Reduce Latency - Pauses induced by the JVM as it performs GC
  • Improve Throughput - The percentage of clock time the JVM has available for the actual application
  • Reduce Footprint - The heap size
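To make the first two goals concrete, the sketch below computes them from a list of GC pause durations. The function name and input shape are illustrative. In practice the pause times would come from the GC log, for example via GCViewer. Footprint is simply the heap size, so it is not calculated here.

```python
def gc_summary(pause_seconds, elapsed_seconds):
    """Summarise GC pauses observed over `elapsed_seconds` of wall-clock time."""
    total_pause = sum(pause_seconds)
    return {
        # Latency: the longest single pause the application experienced.
        "max_pause_s": max(pause_seconds, default=0.0),
        # Throughput: share of wall-clock time left for the application itself.
        "throughput_pct": 100.0 * (1 - total_pause / elapsed_seconds),
        "total_pause_s": total_pause,
    }
```

For example, pauses of 0.5, 1.5 and 1.0 seconds during one minute add up to 3 seconds of pause time, which is a maximum pause of 1.5 seconds and a throughput of roughly 95%.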

Which tool to use

To generate GC logs, please follow the instructions in How to Enable Garbage Collection (GC) Logging.

For processing GC logs, Atlassian Support uses and recommends GCViewer.

GCViewer displaying some partial GCs over the first minute after startup. For information on what the lines mean, click the View menu button.

Database Profiling

When to use database profiling

Is the CPU of your database server maxing out while your application server is running normally? Does a thread dump reveal many long-running threads waiting for a connection to the database? If so, we can use database profiling to examine where time is spent during query execution in the database.

Which tool to use

See the CPU profiling tools below. Additionally, JProfiler can be used as detailed in Using jProfiler to analyse a JIRA application performance.

CPU Profiling

When to use CPU profiling

Use CPU profiling when running the 'top' command shows high CPU usage on the application server and Confluence is running slowly. It is often used in conjunction with thread dumps to determine which threads in the dump are consuming the CPU.

Which tool to use

There are many options available, and the choice may depend on what your company already has configured. Some good options are:

For Linux-based instances, Troubleshooting Performance Issues with thread dumps includes a method of generating CPU profiling information without any paid tools.
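The correlation behind this kind of approach can be sketched in a few lines. On Linux, top -H reports CPU usage per thread using decimal thread IDs, while HotSpot thread dumps record the same native ID in hexadecimal as nid=0x.... The function names below are illustrative, and the CPU figures are assumed to have already been read from the top output.

```python
import re

# Captures the quoted thread name and the hexadecimal native thread id.
NID_RE = re.compile(
    r'^"(?P<name>[^"]+)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)', re.MULTILINE)


def threads_by_native_id(dump_text):
    """Map native thread id (as a decimal int) to Java thread name."""
    return {int(m.group("nid"), 16): m.group("name")
            for m in NID_RE.finditer(dump_text)}


def name_busy_threads(cpu_by_tid, dump_text):
    """Attach Java thread names to (tid, cpu%) pairs, busiest first.

    `cpu_by_tid` maps the decimal thread id shown by `top -H` to its CPU %.
    """
    names = threads_by_native_id(dump_text)
    ranked = sorted(cpu_by_tid.items(), key=lambda item: item[1], reverse=True)
    return [(tid, cpu, names.get(tid, "<not in dump>")) for tid, cpu in ranked]
```

The thread dump and the top output need to be captured at about the same time, otherwise the busiest threads may no longer be doing the work shown in the dump.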

JProfiler in action - here we can see that the permission checking is the most expensive CPU operation that was profiled.

Client Side Diagnostics

When to check the client side information

If your application is seeing performance issues that aren't attributable to the backend (as diagnosed by other tools such as thread dumps), it's useful to get information from your browser: namely, the network information and the console logs, which can reveal errors in the front end.

Which tool to use

Any browser's developer tools are capable of providing the information required, but Atlassian Support prefers Chrome due to the Chrome HAR viewer. How the information is generated differs between browsers, as described in Generating HAR files and analyzing web requests.

How to generate client side diagnostic information

  1. Open the Chrome developer toolbar (right-click on any empty space in the Chrome window and choose Inspect Element, or see other ways of opening the tool at this page).
  2. Go to the Network tab (see example here). If the Network tab already contains entries, clear them by clicking the remove icon next to the red circle icon at the top right corner of the developer tool window.
  3. Perform whichever operation is slow.
  4. Some requests will be logged in the Network tab. Save this information to a HAR file (instructions for this).
  5. Open the Console tab to see if there are any errors in the output (how to work with the console).
  6. Save the text output by the console to a file.

At the end of this you should have two files: one HAR file of the network requests and one log file of the console output.
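A HAR file is JSON, so the saved capture can also be inspected with a short script. The sketch below lists the slowest requests using the standard HAR fields log.entries[].time (in milliseconds), request.method, request.url and response.status. The function name is illustrative, and this is not a replacement for a HAR viewer.

```python
import json


def slowest_requests(har_text, top_n=10):
    """Return (time_ms, status, method, url) for the slowest HAR entries."""
    entries = json.loads(har_text)["log"]["entries"]
    rows = [
        (e["time"], e["response"]["status"],
         e["request"]["method"], e["request"]["url"])
        for e in entries
    ]
    # Longest total request time first.
    return sorted(rows, key=lambda row: row[0], reverse=True)[:top_n]
```

Passing in the text of the saved HAR file returns the requests that took longest, which is often enough to tell whether a slow page is dominated by one backend call or by many small ones.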

The information the Chrome network tab captured as I loaded the dashboard of Confluence

Disk Speed

When to check the disk speed

If your application is slow and the following areas of the application in particular are performing poorly, then it's advisable to check disk speed:

  • Reindexing
  • Searching
  • Attachments
  • The Issue Navigator (in JIRA)

Which tool to use

This is another internal tool. Although listed for JIRA, this will work for any Atlassian application.
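To illustrate the kind of measurement involved, the sketch below times small-file operations in a chosen directory. It is not the tool referred to above and its numbers are not comparable with that tool's output. The function name, payload size and iteration count are arbitrary assumptions.

```python
import os
import tempfile
import time


def time_small_file_ops(directory, iterations=100, payload=b"x" * 4096):
    """Average seconds per small-file write and per delete in `directory`."""
    write_total = delete_total = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            os.write(fd, payload)
            os.fsync(fd)  # push the data to disk, not just the OS cache
        finally:
            os.close(fd)
        write_total += time.perf_counter() - start

        start = time.perf_counter()
        os.remove(path)
        delete_total += time.perf_counter() - start
    return {"write_avg_s": write_total / iterations,
            "delete_avg_s": delete_total / iterations}
```

Running it against the directory that holds the application's index or attachments, and again against a known fast local disk, gives a rough comparison between the two.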

Disk speed for a local instance

Access Logs

When to check the access logs

Access logs are a great way of seeing general patterns in the usage of an instance. They can be a good way of finding out whether a particular REST endpoint is being heavily accessed, or whether some automation is causing too many requests.
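As a rough illustration of this kind of analysis, the sketch below counts requests per method and path. It assumes each log line contains a quoted request such as "GET /rest/api/2/search HTTP/1.1", as in the Common Log Format. The exact layout of each product's access log differs, so the pattern may need adjusting, and the function name is illustrative.

```python
import re
from collections import Counter

# Assumed request format: "METHOD /path?query HTTP/x.y" inside double quotes.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+"')


def top_paths(lines, top_n=10):
    """Count requests per (method, path), ignoring query strings."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            path = match.group("target").split("?", 1)[0]
            counts[(match.group("method"), path)] += 1
    return counts.most_common(top_n)
```

An open file object can be passed directly as lines, so a whole access log can be summarised without loading it into memory first.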

Enabling access logs

For JIRA: User Access Logging

For Fisheye/Crucible: Enabling Access Logging In Fisheye

For Confluence: How to Enable User Access Logging

For Bitbucket Server: N/A (always on)

Which tool to use

For JIRA: HTTP Requests Log Analyser

For Bitbucket Server (formerly known as Stash): Atlassian Stash and Bitbucket Server access log parser

Page Profiling

When to use page profiling

If a particular page or piece of functionality is performing poorly, page profiling can be used to help determine which method is responsible for this slowness, by displaying the time taken to execute each method. For general slowness, other methods are more useful.

Which tool to use

This is a built-in tool for JIRA and Confluence.

The General Configuration page of Confluence Admin


Last modified on Jul 31, 2018
