Confluence unresponsive in virtual environment during garbage collection
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
Confluence becomes unresponsive for an extended period, apparently while the JVM is performing garbage collection.
Diagnosis
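In the garbage collection logs, lines similar to the following appear, where the 'real' time taken to complete a full GC is significantly longer than the combined 'user' and 'sys' time. In the example below, the full GC took about 1044 seconds of wall-clock time but used only about 13.6 seconds of CPU time (user + sys).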
604483.041: [Full GC [PSYoungGen: 6423K->0K(330816K)] [ParOldGen: 696445K->351646K(699072K)] 702869K->351646K(1029888K) [PSPermGen: 215670K->213733K(262144K)], 1044.1092830 secs] [Times: user=5.99 sys=7.58, real=1044.05 secs]
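The check described above can be sketched as a small script that scans a GC log for pauses whose wall-clock time dwarfs their CPU time. This is a minimal sketch, not an Atlassian tool: the function names, the 10-second minimum, and the 2x ratio are hypothetical choices for illustration, and the regular expression assumes the `[Times: user=… sys=…, real=… secs]` format shown in the sample line.

```python
import re

# Matches the trailing "[Times: user=... sys=..., real=... secs]" part of a GC log line.
TIMES_RE = re.compile(
    r"\[Times: user=(?P<user>[\d.]+) sys=(?P<sys>[\d.]+), real=(?P<real>[\d.]+) secs\]"
)


def parse_gc_times(line):
    """Return the user, sys and real times (in seconds) from a GC log line, or None."""
    match = TIMES_RE.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


def is_stalled_gc(line, min_real_secs=10.0, ratio=2.0):
    """Flag a GC event whose wall-clock time far exceeds its CPU time.

    min_real_secs and ratio are illustrative thresholds (assumptions), not
    values recommended by the article.
    """
    times = parse_gc_times(line)
    if times is None:
        return False
    cpu_secs = times["user"] + times["sys"]
    return times["real"] >= min_real_secs and times["real"] > ratio * cpu_secs


def find_stalled_gcs(lines):
    """Return every line in an iterable of log lines that looks like a stalled GC."""
    return [line for line in lines if is_stalled_gc(line)]
```

To scan a log, pass an open file to `find_stalled_gcs`, for example `find_stalled_gcs(open("gc.log"))`, where `gc.log` stands in for wherever the instance writes its GC log.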
Cause
This is most likely caused by memory ballooning in a virtualized environment. When the host runs short of physical memory, memory pages in use by the virtual machine are written out to disk so that the same physical memory can be given to another virtual machine. A full GC has to touch most of the Java heap, so Confluence waits while that data is read from disk back into physical memory before the collection can finish. The time is spent waiting on disk I/O rather than doing CPU work, which is why the 'real' time far exceeds the 'user' and 'sys' time.
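One way to look for the paging described above is to compare swap counters in the guest before and after a slow period. The sketch below assumes a Linux guest, where `/proc/vmstat` exposes cumulative `pswpin` and `pswpout` counters (pages swapped in and out); the function names are hypothetical. It only sees swapping done by the guest operating system, so a zero result does not rule out swapping performed by the hypervisor itself.

```python
def read_swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counters from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Return how many pages were swapped in and out between two samples."""
    return {name: after[name] - before[name] for name in ("pswpin", "pswpout")}


def sample_swap_counters(path="/proc/vmstat"):
    """Read the current swap counters from the guest (Linux only)."""
    with open(path) as handle:
        return read_swap_counters(handle.read())
```

Taking one sample with `sample_swap_counters()` before a long GC pause and another afterwards, then passing both to `swap_delta`, gives a rough indication of whether the guest was paging during that window.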
Resolution
Disable memory ballooning for the virtual machine hosting Confluence by following the steps in this VMware KB article: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002586