Capturing heap dumps before FullGCs to troubleshoot memory problems

Platform Notice: Server and Data Center Only - This article only applies to Atlassian products on the server and data center platforms.

Purpose

This page explains how to tell whether a heap dump is needed to diagnose performance issues, and describes the ways a heap dump can be collected.

Symptoms

  • Jira is performing slowly and more information is needed to determine the root cause.

  • GC logs reveal significant pausing and reduced throughput due to full garbage collection events.

To check for significant pausing due to Full GC events, run the following grep command from the directory that contains the GC logs:

$ grep "$(date +"%Y-%m-%d")" atlassian-jira-gc-* | grep 'Full GC'

If Full GC events are occurring frequently, the output may resemble the following:

atlassian-jira-gc-2019-01-29_18-36-59.log.0.current:2019-01-30T06:52:40.885-0800: 44141.480: [Full GC (Ergonomics) [PSYoungGen: 2653696K->60561K(2711040K)] [ParOldGen: 5592278K->5592087K(5592576K)] 8245974K->5652648K(8303616K), [Metaspace: 505008K->505008K(1552384K)], 6.6621633 secs] [Times: user=27.14 sys=1.06, real=6.66 secs] 
atlassian-jira-gc-2019-01-29_18-36-59.log.0.current:2019-01-30T06:52:47.938-0800: 44148.535: [Full GC (Ergonomics) [PSYoungGen: 2653696K->82890K(2711040K)] [ParOldGen: 5592087K->5592231K(5592576K)] 8245783K->5675121K(8303616K), [Metaspace: 505008K->505008K(1552384K)], 7.3033287 secs] [Times: user=29.64 sys=0.88, real=7.31 secs] 
atlassian-jira-gc-2019-01-29_18-36-59.log.0.current:2019-01-30T06:52:55.595-0800: 44156.187: [Full GC (Ergonomics) [PSYoungGen: 2653696K->42798K(2711040K)] [ParOldGen: 5592231K->5592340K(5592576K)] 8245927K->5635138K(8303616K), [Metaspace: 505008K->505002K(1552384K)], 2.5531892 secs] [Times: user=20.80 sys=0.06, real=2.55 secs] 
atlassian-jira-gc-2019-01-29_18-36-59.log.0.current:2019-01-30T06:52:58.588-0800: 44159.175: [Full GC (Ergonomics) [PSYoungGen: 2653696K->47447K(2711040K)] [ParOldGen: 5592340K->5592469K(5592576K)] 8246036K->5639916K(8303616K), [Metaspace: 505002K->505002K(1552384K)], 5.4595281 secs] [Times: user=22.88 sys=0.73, real=5.45 secs] 
atlassian-jira-gc-2019-01-29_18-36-59.log.0.current:2019-01-30T06:53:04.403-0800: 44164.991: [Full GC (Ergonomics) [PSYoungGen: 2653696K->70037K(2711040K)] [ParOldGen: 5592469K->5592519K(5592576K)] 8246165K->5662556K(8303616K), [Metaspace: 505015K->505015K(1552384K)], 6.5850514 secs] [Times: user=23.51 sys=0.98, real=6.58 secs] 
atlassian-jira-gc-2019-01-29_18-36-59.log.0.current:2019-01-30T06:53:11.320-0800: 44171.910: [Full GC (Ergonomics) [PSYoungGen: 2653696K->62430K(2711040K)] [ParOldGen: 5592519K->5592280K(5592576K)] 8246215K->5654710K(8303616K), [Metaspace: 505015K->505015K(1552384K)], 7.0314006 secs] [Times: user=23.47 sys=0.91, real=7.02 secs] 


Note the timestamps of the first three lines returned (06:52:40, 06:52:47 and 06:52:55) and their corresponding pause times (6.66, 7.31 and 2.55 secs). Together these three collections paused the application for about 16.5 seconds out of roughly 17 seconds of clock time. Note also that the old generation (ParOldGen) stays almost full after each collection (for example, 5592278K->5592087K out of 5592576K), so it has almost no room left and another Full GC follows within seconds. While a few seconds of pausing every hour or so due to garbage collection is likely to go unnoticed, frequent pausing like this causes noticeable performance problems. In this situation, a heap dump is needed to investigate further.
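Rather than adding up pause times by hand, the grep output above can be piped into awk. This is a minimal sketch that assumes the JDK 8 Parallel GC log format shown in the example, where each line ends with [Times: ... real=N.NN secs]; other collectors, or the unified logging format used from JDK 9 onward, would need a different pattern. sum_full_gc_pauses is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: sum the wall-clock ("real=") pause time of every
# "Full GC" line read from stdin.
# Assumes the JDK 8 Parallel GC log format, e.g.
#   ... [Times: user=27.14 sys=1.06, real=6.66 secs]
sum_full_gc_pauses() {
  awk '
    /Full GC/ && match($0, /real=[0-9.]+/) {
      # RSTART/RLENGTH locate "real=N.NN"; skip the 5-character "real=" prefix.
      total += substr($0, RSTART + 5, RLENGTH - 5)
      count++
    }
    END { printf "%d Full GC events, %.2f secs paused\n", count, total }
  '
}

# Usage, from the GC log directory:
#   grep "$(date +"%Y-%m-%d")" atlassian-jira-gc-* | sum_full_gc_pauses
```

Fed only the first three log lines from the example above, it prints "3 Full GC events, 16.52 secs paused".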

Solution

The following commands identify the process ID (PID) of Jira and then use jinfo to enable the HeapDumpBeforeFullGC flag in the running JVM:

# Run as the user that owns the Jira process so that jinfo can attach to the JVM.
JIRA_PID=$(ps aux | grep -i jira | grep -i java | grep -v grep | awk '{print $2}')
jinfo -flag +HeapDumpBeforeFullGC "$JIRA_PID"

jinfo is included in the JDK and is not available in the JRE.
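The ps | grep pipeline above assumes that exactly one process matches. If it matches none or several (for example, a second Jira-related Java process owned by the same user), JIRA_PID ends up empty or holding several PIDs, while jinfo expects a single PID. The sketch below is a slightly more defensive variant: a hypothetical helper, find_single_jira_pid, that reads ps aux output and prints a PID only when exactly one line mentions both "jira" and "java". It is still a plain text match, so the patterns may need adjusting for a particular install.

```shell
#!/bin/sh
# Hypothetical helper: read `ps aux` output on stdin and print the PID of the
# single line that mentions both "jira" and "java" (case-insensitive).
# Returns 1, with a message on stderr, if zero or several processes match.
find_single_jira_pid() {
  pids=$(grep -i jira | grep -i java | grep -v grep | awk '{print $2}')
  # Count non-empty lines; an empty result counts as 0.
  count=$(printf '%s\n' "$pids" | grep -c .)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one Jira JVM, found $count" >&2
    return 1
  fi
  printf '%s\n' "$pids"
}

# Usage (run as the user that owns the Jira process):
#   JIRA_PID=$(ps aux | find_single_jira_pid) && jinfo -flag +HeapDumpBeforeFullGC "$JIRA_PID"
#   jinfo -flag HeapDumpBeforeFullGC "$JIRA_PID"   # prints -XX:+HeapDumpBeforeFullGC when enabled
```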

With this flag enabled, the JVM writes a heap dump immediately before every Full GC, so a dump will be captured the next time the problem occurs. Keep in mind that collecting a heap dump extends the outage from a few seconds to a few minutes, because the JVM must write its heap memory to a file. The heap dump file is written to the directory defined with -XX:HeapDumpPath, or to Tomcat’s working directory if that option is not set (see Using Memory Dumps to Analyze OutOfMemoryErrors). The file can be roughly as large as the maximum heap size (-Xmx): for example, a maximum heap size of 28 GB can produce a heap dump file of around 28 GB. Make sure this directory has sufficient free disk space and that the user running Jira has write permission to it.
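Before enabling the flag, it may help to confirm that the dump directory can hold a file as large as the maximum heap. This is a minimal sketch: check_dump_space is a hypothetical helper, and it treats the configured maximum heap size as the expected dump size, which is an approximation.

```shell
#!/bin/sh
# Hypothetical helper: check that DIR has at least REQUIRED_BYTES of free
# space before enabling HeapDumpBeforeFullGC.
check_dump_space() {
  dir=$1
  required_bytes=$2
  # df -P prints POSIX-format output; -k reports 1024-byte blocks.
  # Field 4 of the data line is the space available to unprivileged users.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
  # df failed (e.g. the directory does not exist).
  [ -n "$avail_kb" ] || return 1
  avail_bytes=$((avail_kb * 1024))
  if [ "$avail_bytes" -ge "$required_bytes" ]; then
    echo "OK: $avail_bytes bytes free in $dir"
  else
    echo "WARNING: $avail_bytes bytes free in $dir, need $required_bytes" >&2
    return 1
  fi
}

# Usage (the directory is a placeholder; run as the Jira user):
#   MAX_HEAP=$(jinfo -flag MaxHeapSize "$JIRA_PID" | cut -d= -f2)
#   check_dump_space /path/to/heapdump/dir "$MAX_HEAP"
```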

After a heap dump is generated, restarting Jira is recommended. A flag set with jinfo only affects the running JVM, so the restart removes HeapDumpBeforeFullGC and reduces the risk of repeated heap dumps filling up the available disk space.

The JVM may be unresponsive and require a SIGKILL (kill -9) before Jira can be restarted. We recommend this approach if the HeapDumpBeforeFullGC flag has generated more than 3 heap dumps.
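To check whether that threshold has been reached, a small helper can count the dump files. This is a sketch under assumptions: count_heap_dumps is a hypothetical name, and it assumes the JVM's default dump file names (java_pid<pid>.hprof, with later dumps from the same JVM possibly getting a numeric suffix such as .hprof.1). It only reports; it does not kill any process.

```shell
#!/bin/sh
# Hypothetical helper: count heap dump files in DIR and warn (return 1) once
# the count exceeds THRESHOLD (default 3, matching the guidance above).
# Matches *.hprof and numbered follow-ups such as *.hprof.1, but not *.hprof.gz.
count_heap_dumps() {
  dir=$1
  threshold=${2:-3}
  count=0
  for f in "$dir"/*.hprof "$dir"/*.hprof.[0-9]*; do
    # An unmatched glob is left as a literal pattern; skip it.
    [ -e "$f" ] || continue
    count=$((count + 1))
  done
  echo "$count"
  if [ "$count" -gt "$threshold" ]; then
    echo "more than $threshold heap dumps in $dir: restart Jira (kill -9 if unresponsive)" >&2
    return 1
  fi
}

# Usage (the directory is a placeholder):
#   count_heap_dumps /path/to/heapdump/dir
```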

If Jira is not experiencing performance issues from sustained Full GC thrashing, you can turn off the flag without a restart by running the following commands:

JIRA_PID=$(ps aux | grep -i jira | grep -i java | grep -v grep | awk '{print $2}')
jinfo -flag -HeapDumpBeforeFullGC "$JIRA_PID"


Compress the heap dump with something like gzip and provide the file to Atlassian Support for root-cause analysis.
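A minimal sketch of that step, assuming gzip is installed and using a hypothetical helper name, compress_heap_dump. gzip replaces the original file with the .gz once compression succeeds, and gzip -t then tests the archive's integrity before it is uploaded.

```shell
#!/bin/sh
# Hypothetical helper: compress a heap dump in place and verify the archive.
compress_heap_dump() {
  dump=$1
  # Replaces $dump with $dump.gz when compression succeeds.
  gzip "$dump" || return 1
  # Test the compressed file's integrity before sending it anywhere.
  gzip -t "$dump.gz" || return 1
  ls -l "$dump.gz"
}

# Usage (the path is a placeholder):
#   compress_heap_dump /path/to/heapdump/dir/java_pid12345.hprof
```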

Notes

We strongly advise against leaving HeapDumpBeforeFullGC enabled. Doing so impacts performance and greatly increases the risk of running out of disk space.

  • While it’s possible to capture a heap dump manually using jmap, this is not recommended because the timing of the collection is crucial: a dump taken too early or too late will simply show normal operating behavior.

  • A heap dump is not captured automatically in this situation, even with HeapDumpOnOutOfMemoryError enabled, because garbage collection frees enough memory to prevent an OutOfMemoryError but not enough for the application to respond quickly, or at all, to users’ requests.

  • If installing the JDK is not an option, you may consider adding -XX:+HeapDumpBeforeFullGC to Jira’s startup parameters. However, this option should be removed immediately after a heap dump is generated, before Jira is restarted again.
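On Linux, Jira’s JVM startup arguments are typically added in <jira-install>/bin/setenv.sh using the JVM_SUPPORT_RECOMMENDED_ARGS variable. A minimal fragment, assuming that file layout; the heap dump directory is a placeholder, and any arguments already present in the variable should be kept:

```shell
# In <jira-install>/bin/setenv.sh. /path/to/heapdumps is a placeholder: point it
# at a volume with at least the maximum heap size (-Xmx) free.
# Remove -XX:+HeapDumpBeforeFullGC again as soon as a heap dump has been captured.
JVM_SUPPORT_RECOMMENDED_ARGS="-XX:+HeapDumpBeforeFullGC -XX:HeapDumpPath=/path/to/heapdumps"
```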


Description: How to generate heap dumps with the HeapDumpBeforeFullGC parameter to diagnose performance problems
Product: Jira

Last modified on Oct 7, 2019
