How to capture Java Heap dumps before Full GC in Confluence
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
This page provides how to identify if collecting a Heap dump is necessary to diagnose performance issues and the various ways in which a Heap dump can be collected.
Environment
Any version of Confluence running on Java 8 or higher.
Solution
Assessment
Confluence is performing slowly and more information is needed to determine the root cause.
GC logs reveal significant pausing and reduced throughput due to full garbage collection events.
To check for significant pausing due to full garbage collection events use the following grep command in the same directory of the GC logs:
$ grep "$(date +"%Y-%m-%d")" gc-* | egrep '(Full GC|Pause Full)'
Java 8 prints "Full GC" and Java 11+, "Pause Full".
If Full GC events are occurring frequently the output may resemble the following:
gc-2019-01-29_18-36-59.log.0.current:2019-01-30T06:52:40.885-0800: 44141.480: [Full GC (Ergonomics) [PSYoungGen: 2653696K->60561K(2711040K)] [ParOldGen: 5592278K->5592087K(5592576K)] 8245974K->5652648K(8303616K), [Metaspace: 505008K->505008K(1552384K)], 6.6621633 secs] [Times: user=27.14 sys=1.06, real=6.66 secs]
gc-2019-01-29_18-36-59.log.0.current:2019-01-30T06:52:47.938-0800: 44148.535: [Full GC (Ergonomics) [PSYoungGen: 2653696K->82890K(2711040K)] [ParOldGen: 5592087K->5592231K(5592576K)] 8245783K->5675121K(8303616K), [Metaspace: 505008K->505008K(1552384K)], 7.3033287 secs] [Times: user=29.64 sys=0.88, real=7.31 secs]
gc-2019-01-29_18-36-59.log.0.current:2019-01-30T06:52:55.595-0800: 44156.187: [Full GC (Ergonomics) [PSYoungGen: 2653696K->42798K(2711040K)] [ParOldGen: 5592231K->5592340K(5592576K)] 8245927K->5635138K(8303616K), [Metaspace: 505008K->505002K(1552384K)], 2.5531892 secs] [Times: user=20.80 sys=0.06, real=2.55 secs]
gc-2019-01-29_18-36-59.log.0.current:2019-01-30T06:52:58.588-0800: 44159.175: [Full GC (Ergonomics) [PSYoungGen: 2653696K->47447K(2711040K)] [ParOldGen: 5592340K->5592469K(5592576K)] 8246036K->5639916K(8303616K), [Metaspace: 505002K->505002K(1552384K)], 5.4595281 secs] [Times: user=22.88 sys=0.73, real=5.45 secs]
gc-2019-01-29_18-36-59.log.0.current:2019-01-30T06:53:04.403-0800: 44164.991: [Full GC (Ergonomics) [PSYoungGen: 2653696K->70037K(2711040K)] [ParOldGen: 5592469K->5592519K(5592576K)] 8246165K->5662556K(8303616K), [Metaspace: 505015K->505015K(1552384K)], 6.5850514 secs] [Times: user=23.51 sys=0.98, real=6.58 secs]
gc-2019-01-29_18-36-59.log.0.current:2019-01-30T06:53:11.320-0800: 44171.910: [Full GC (Ergonomics) [PSYoungGen: 2653696K->62430K(2711040K)] [ParOldGen: 5592519K->5592280K(5592576K)] 8246215K->5654710K(8303616K), [Metaspace: 505015K->505015K(1552384K)], 7.0314006 secs] [Times: user=23.47 sys=0.91, real=7.02 secs]
Note the timestamps of the first three lines returned, 6:52:40, 6:52:47, 6:52:55, and their corresponding pause times, 6.66 secs, 7.31 secs, 2.55 secs. This informs us that for 15 seconds of clock time the application was paused while performing garbage collection. While a few seconds of pausing every hour or so due to garbage collection is likely to go unnoticed frequent pausing in this manner will cause noticeable performance impacts. In this circumstance a heap dump will be needed to investigate further.
Creating the Heap Dump file immediately
If you are sure that the issue is happening, you can generate the heap dump before waiting for the Full GC to happen, as Confluence not always triggers it when a slowness due to high heap utilization happens.
You will need a Java JDK, and then you can collect using any of these 2 JDK utilities, while Confluence is running and presenting the problem:
Option 1
CONFLUENCE_PID=`ps aux | grep -i confluence | grep -i java | awk -F '[ ]*' '{print $2}'`;
jcmd $CONFLUENCE_PID GC.heap_dump filename=heap.bin
Option 2
CONFLUENCE_PID=`ps aux | grep -i confluence | grep -i java | awk -F '[ ]*' '{print $2}'`;
jmap -dump:[live,]format=b,file=confluence-heap.bin $CONFLUENCE_PID
The file should be saved on a disk with more space than the heap memory, and the command should be executed using the Confluence user.
Injecting the JVM flag to capture Heap dump before Full GC
The following script will identify the process ID of Confluence which we’ll use to enable the HeapDumpBeforeFullGC flag using jinfo (bundled with JDK):
$ CONFLUENCE_PID=`ps aux | grep -i confluence | grep -i java | awk -F '[ ]*' '{print $2}'`;
$ jinfo -flag +HeapDumpBeforeFullGC $CONFLUENCE_PID
Run jinfo as the same user that's running the Confluence process.
Also check if you have the HeapDumpPath flag pointing to some location with enough disk space:
$ jinfo -flag HeapDumpPath $CONFLUENCE_PID -XX:HeapDumpPath=
If jinfo prints the HeapDumpPath empty, like in the above example, you may set it, too, to make sure the JVM writes the file to a location with enough disk space:
$ jinfo -flag HeapDumpPath=/some-directory-with-enough-disk-space/ $CONFLUENCE_PID
jinfo is included in the JDK and is not available in the JRE
This will generate a heap dump when this problem occurs. Keep in mind that collecting a heap dump will increase outage time from a few seconds to a few minutes as the JVM will need to write its heap memory to a file. The heap dump file will be in the directory defined with -XX:HeapDumpPath or Tomcat’s working directory ( see Using Memory Dumps to Analyze OutOfMemoryErrors ). The maximum heap size will determine how large the file will be. For example, a maximum heap size of 28 Gigabytes would result in a 28 Gigabyte heap dump file. Be sure to check there is sufficient disk space in this directory and Confluence has read / write permissions.
After a heap dump is generated a restart of Confluence is recommended to remove the HeapDumpBeforeFullGC flag and reduce the risk of multiple heap dumps filling up available drive space.
The JVM may be unresponsive and require sending SIGKILL ( kill -9 ) in order to restart Confluence. We recommend this approach if there are more than 3 heap dumps generated by the HeapDumpBeforeFullGC flag
If you're not experiencing performance issues from consistent Full GC thrashing, you can turn off this flag by running the following command:
CONFLUENCE_PID=`ps aux | grep -i confluence | grep -i java | awk -F '[ ]*' '{print $2}'`;
jinfo -flag -HeapDumpBeforeFullGC $CONFLUENCE_PID
You may confirm that the heap dump was successfully generated if the following is present in catalina.out:
Dumping heap to /mnt/attachments/confluence/log/java_pid14333.hprof ...
Heap dump file created [58660820 bytes in 0.194 secs]
If the "file created" message is not present, then the heap might be incomplete and not provide sufficient information. Having the desired outcome, compress the heap dump with something like gzip and provide the file to Atlassian Support for root-cause analysis.
Notes
We strongly advise against leaving HeapDumpBeforeFullGC enabled. Doing so impacts performance and greatly increases the risk of running out of disk space.
While it’s possible to capture a heap manually using jmap it is not recommended as the timing of heap dump collection is crucial- too soon or too late will simply show normal operating behaviour.
A heap dump is not captured automatically in this circumstance even with HeapDumpOnOutOfMemory enabled as garbage collection routines free enough memory to prevent an out of memory error but not enough for the application to respond quickly, or at all, to user’s requests.
If installing the JDK is not an option you may consider setting...
-XX:+HeapDumpBeforeFullGC
...in Confluence's startup parameters, however this option should be removed immediately after a heap dump is generated and before Confluence is restarted.