Confluence 4.3 has reached end of life
This document relates broadly to memory management with Oracle's Hotspot JVM. These are recommendations based on Support's successful experiences with customers and their large Confluence instances.
Please do not use the Concurrent Mark Sweep (CMS) Collector with Confluence, unless otherwise advised by Atlassian Support. It requires extensive manual tuning and testing, and is likely to result in degraded performance.
The information on this page does not apply to
Summary
- Set the Young space up to 30-40% of the overall heap:
-XX:NewSize=<between 30% and 40% of your Xmx value, eg, 384m>
- Use a parallel collector:
-XX:+UseParallelOldGC
(make sure this is the Old variant, -XX:+UseParallelOldGC, not -XX:+UseParallelGC)
- Limit the Tomcat connector's spare thread counts to minimize the impact of request backlogs
- Reduce the frequency of explicit garbage collections triggered by distributed (RMI) remote clients:
-Dsun.rmi.dgc.client.gcInterval=900000 -Dsun.rmi.dgc.server.gcInterval=900000
- Disable remote clients from triggering a full GC event
-XX:+DisableExplicitGC
- Set the minimum and maximum heap sizes (Xms and Xmx) to the same value (eg.
-Xms1024m -Xmx1024m
) to discourage address map swapping
- Turn on GC logging (add the flags
-verbose:gc -Xloggc:<confluence-home>/logs/gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
) and submit the logs in a support ticket
- (Optional) You can enable date stamps:
-XX:+PrintGCDateStamps
This makes the logs easier to read with the naked eye, but may cause problems if you use third-party applications to view the GC logs.
- Use Java 1.6
- Read below if heap > 2G
See Configuring System Properties for how to add these properties to your environment.
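Taken together, the flags above might look like the following when set via CATALINA_OPTS in Tomcat's startup script. This is a sketch only: the 1024m heap and 384m Young size are illustrative values that must be sized for your own instance.

```shell
CATALINA_OPTS="-Xms1024m -Xmx1024m -XX:NewSize=384m
-XX:+UseParallelOldGC -XX:+DisableExplicitGC
-Dsun.rmi.dgc.client.gcInterval=900000 -Dsun.rmi.dgc.server.gcInterval=900000
-verbose:gc -Xloggc:<confluence-home>/logs/gc.log
-XX:+PrintGCTimeStamps -XX:+PrintGCDetails $CATALINA_OPTS"
```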
Background
Performance problems in Confluence, and in rarer circumstances in JIRA, generally manifest themselves in one of the following ways:
- frequent or infrequent periods of severely sluggish responsiveness, which either require a manual restart or from which the application eventually, almost inexplicably, recovers
- some event or action triggering a non-recoverable memory debt, which in turn develops into an application-fatal death spiral (eg. the GC overhead limit is exceeded, or an Out-Of-Memory error is thrown)
- generally consistent poor overall performance across all Confluence actions
There is a wealth of simple tips and tricks that can be applied to Confluence, with significantly tangible benefits to the long-term stability, performance and responsiveness of the application.
Why Bad Things Happen
Confluence can be thought of like a gel or a glue, a tool for bringing things together. Multiple applications, data types, social networks and business requirements can be efficiently amalgamated, leading to more effective collaboration. The real beauty of Confluence, however, is its agility in moulding itself to your organization's DNA - your existing business and cultural processes - rather than the other way around, with your organization having to adapt to how the software product works.
The flip side of this flexibility is having many competing demands placed on Confluence by its users. The result is an extraordinarily broad and deep set of functions whose usage patterns practically cannot be predicted for individual use cases.
The best mechanism to protect the installation is to place Confluence on a foundation where it is fundamentally more resilient and able to react and cope with competing user requirements.
Appreciate how Confluence and the Java JVM use memory
The Java memory model is naive. Compared to a Unix process, which has four intensive decades of development behind its time-slicing, inter-process communication and intelligent deadlock avoidance, the Java thread model has ten years at best under its belt. Because Java runs on a virtual machine, idiosyncrasies of the platform Confluence runs on can also influence how the JRE reacts. As a result it is sometimes necessary to tune the JVM parameters to give it a "hint" about how it should behave.
There are circumstances where the Java JVM will take a mediocre option with respect to resource contention and allocation, and struggle along with oftentimes highly impractical goals. For example, the JRE will be quite happy to perform at 5 or 10% of optimum capacity if it means overall application stability and integrity can be ensured. This often translates into periods of extreme sluggishness, which effectively means the application is neither stable nor whole (as it cannot be accessed).
This is mainly because Java shouldn't make assumptions on what kind of runtime behavior an application needs, but it's plain to see that the charter is to assume 'business-as-usual' for a wide range of scenarios and really only react in the case of dire circumstances.
Memory is contiguous
The Java memory model requires that memory be allocated in a contiguous block. This is because the heap has a number of side data structures which are indexed by a scaled offset (ie n*512 bytes) from the start of the heap. For example, updates to references on objects within the heap are tracked in these "side" data structures.
Consider the differences between:
- Xms (the allocated portion of memory)
- Xmx (the reserved portion of memory)
Allocated memory is fully backed: a memory-mapped, physical allocation to the application. The application owns that segment of memory.
Reserved memory (the difference between Xms and Xmx) is memory which is reserved for use, but not physically mapped (or backed) by memory. This means that, for example, in the 4G address space of a 32bit system, the reserved memory segment can be used by other applications, but, because Java requires contiguous memory, if the reserved memory requested is occupied the OS must swap that memory out of the reserved space either to another non-used segment, or, more painfully, it must swap to disk.
Permanent Generation memory is also contiguous. The net effect is even if the system has vast quantities of cumulative free memory, Confluence demands contiguous blocks, and consequently undesirable swapping may occur if segments of requested size do not exist. See Causes of OutOfMemoryErrors for more details.
Please be sure to position Confluence within a server environment that can successfully balance all competing requirements (operating system, contiguous memory, other applications, swap, and Confluence itself).
Figure out which (default) collector implementation your vendor is using
Default JVM collector implementations differ subtly between vendors, but in production those differences can be enormous.
The Oracle JVM by default splits the heap into three spaces:
- Young (New, divided into Eden and Survivor)
- Tenured (Old)
- Permanent Generation (classes & library dependencies)
Objects are central to the operation of Confluence. When a request is received, the Java runtime will create new objects to fulfill the request in the Eden Space. If, after some time, those objects are still required, they may be moved to the Tenured (Old) space. But, typically, the overwhelming majority of objects created die young, within the Eden space. These are objects like method local references within a while or for loop, or Iterators for scanning through Collections or Sets.
In IBM's J9, by contrast, the default policy is a single, contiguous space - one large heap. The net effect is that for large WebSphere environments, garbage collection can be terribly inefficient, and capable of suffering outages during peak periods.
For larger instances with performance issues, it is recommended to tune Confluence such that there is a large Young space, at up to 50% the overall size of the heap.
The command line parameter is -XX:NewSize=XXXm, where XXX is the size in megabytes. -XmnXXXm can be used interchangeably, ie. -XX:NewSize=700m or -Xmn700m.
By setting a larger NewSize
, the net effect is that the JRE will spend less time garbage collecting, clearing dead memory references, compacting and copying memory between spaces, and more time doing actual work.
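As a worked sketch of the 30-40% guidance from the summary (the helper function below is hypothetical, purely to illustrate the arithmetic):

```python
# Hypothetical helper, not an Atlassian tool: suggest a NewSize range
# of 30-40% of the -Xmx heap size, per the guidance above.
def suggested_newsize_range(xmx_mb):
    """Return (low, high) -XX:NewSize values in MB for a given -Xmx in MB."""
    return int(xmx_mb * 0.30), int(xmx_mb * 0.40)

low, high = suggested_newsize_range(2048)
print(f"-XX:NewSize between {low}m and {high}m")  # for a 2G heap: 614m to 819m
```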
Use the Parallel Garbage Collector
Out of the box, Confluence with Oracle Java defaults to the serial garbage collector for the Tenured (full) heap. The Young space is collected in parallel, but the Tenured space is not. This means that if a full collection event occurs under load, all application threads other than the garbage collector thread are taken off the CPU, because the event is a 'stop-the-world' serial event. This can have severe consequences if requests continue to accrue during these 'outage' periods. As a rough guide, for every gigabyte of memory allocated, allow a full second (exclusive) to collect.
If we parallelize the collector on a multi-core/multi-cpu architecture instance, we not only reduce the total time of collection (down from whole seconds to fractions of a second) but we also improve the resiliency of the JRE in being able to recover from high-demand occasions.
Additionally, Oracle provides the Concurrent Mark-Sweep (CMS) collector (-XX:+UseConcMarkSweepGC), which is optimized for higher-throughput, server-grade instances. As a general rule, however, the Parallel Collector (-XX:+UseParallelOldGC) is the right choice for JIRA or Confluence installations, unless otherwise advised by Support.
Restrict ability of Tomcat to 'cache' incoming requests
Quite often the fatal blow is dealt by the 'backlog' of accumulated web requests while some critical resource (say, the index) is held hostage by a temporary, expensive job. Even if the instance is busy garbage collecting due to load, Tomcat will still accept new HTTP requests and queue them internally, and the operating system beneath it will also buffer incoming requests in the socket backlog for Tomcat to pick up the next time it gets the CPU.
<Connector port="8090" protocol="HTTP/1.1"
           maxHttpHeaderSize="8192"
           maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
           acceptCount="100"
           useBodyEncodingForURI="true" enableLookups="false"
           redirectPort="8443" connectionTimeout="20000"
           disableUploadTimeout="true"/>
Here the Tomcat Connector is configured with 150 "maxThreads" and an "acceptCount" of 100. This means up to 150 threads will awaken to accept (but, importantly, not to complete) web requests during performance outages, and 100 more will be cached in a queue for further processing when threads become available. That's 250 threads, many of which can be quite expensive in and of themselves. Java will attempt to juggle all these threads concurrently and become extremely inefficient at doing so, exacerbating the garbage collection performance issue.
Resolution: reduce maxThreads and acceptCount to something slightly higher than normal 'busy-hour' demands.
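As an illustration, a more conservative Connector for an instance whose measured busy-hour peak is around 40 concurrent requests might look like the following. The numbers are purely illustrative assumptions; derive yours from your own access logs.

```xml
<Connector port="8090" protocol="HTTP/1.1"
           maxHttpHeaderSize="8192"
           maxThreads="48" minSpareThreads="10" maxSpareThreads="20"
           acceptCount="10"
           useBodyEncodingForURI="true" enableLookups="false"
           redirectPort="8443" connectionTimeout="20000"
           disableUploadTimeout="true"/>
```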
Disable remote (distributed) garbage collection by Java clients
Many clients integrate third-party or their own custom applications to interrogate, or add content to Confluence via its RPC interface. The Distributed Remote Garbage Collector in the client uses RMI to trigger a remote GC event in the Confluence server. Unfortunately, as of this writing, a System.gc()
call via this mechanism triggers a full, serial collection of the entire Confluence heap (as it needs to remove references to remote client objects in its own deterministic object graph). This is a deficiency in the configuration and/or implementation of the JVM. It has the potential to cause severe impact if the remote client is poorly written, or operating within a constricted JVM.
This can be disabled by using the flag -XX:+DisableExplicitGC
at startup.
Virtual Machines are Evil
VMware virtual machines, whilst being extremely convenient and fantastic, also cause particular problems for Java applications: it is very easy for host operating system resource constraints, such as temporarily exhausted memory availability or I/O swapping, to cascade into the Java VM and manifest as extremely unusual, frustrating and seemingly illogical problems. We already document some disk I/O metrics with VMware images. Although we now officially support the use of virtual instances, we absolutely do not recommend them unless maintained correctly.
This is not to say that VMware instances cannot be used, but they must be used with due care, proper maintenance and configuration. Besides, if you are reading this document because of poor performance, the first action should be to remove any virtualization: emulation will never beat the real thing, and always introduces more black-box variability into the system.
Use Java 1.6
Java 1.6 is generally regarded in public discussion to have an approximate 20% performance improvement over 1.5, and our own internal testing found this figure credible. 1.6 is compatible with all supported versions of Confluence, and we strongly recommend that installations not yet on 1.6 migrate.
Use -server flag
The HotSpot server JVM has specific code-path optimizations which yield an approximate 10% gain over the client version. Most installations will already have this selected by default, but it is still wise to force it with -server, especially on some Windows machines.
If using 64bit JRE for larger heaps, use CompressedOops
For every JDK release, Oracle also builds a "Performance" branch in which specifically optimized performance features can be enabled; it is available on the Java SE page after a brief survey. These builds are certified production grade.
Some blogs have suggested a 25% performance gain and a reduction in heap size when using this parameter. The use and function of the -XX:+UseCompressedOops
parameter is more deeply discussed on Oracle's Official Wiki (which itself uses Confluence!)
Use NUMA if on SPARC, Opteron or recent Intel (Nehalem or Tukwila onwards)
-XX:+UseNUMA
flag enables the Java heap to take advantage of Non-Uniform Memory Architectures (NUMA). Java will place the data structures a thread owns and operates on in the memory locations closest to the processor running that thread. Depending on the environment, gains can be substantial. Intel markets its NUMA interconnect as QuickPath Interconnect™.
Use 32bit JRE if Heap < 2GB
Using a 64bit JRE when the heap is under 2GB will cause a substantial increase in heap usage and a degradation in performance. This is because nearly every object reference and internal pointer uses twice as much memory to be addressed.
A 64bit JRE/JDK is only recommended if heaps greater than 2GB are required. If so, use CompressedOops
.
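For example, a sketch of startup flags for a 64bit JRE with a 4GB heap (sizes illustrative; adjust for your own instance):

```shell
-server -Xms4096m -Xmx4096m -XX:+UseCompressedOops -XX:+UseParallelOldGC
```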
JVM core dumps can be instigated by memory pressures
If your instance of Confluence is throwing Java core dumps, memory pressure and space/generation sizings are known to influence the frequency and occurrence of this phenomenon.
If your Tomcat process completely disappears and the logs record similar to:
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# SIGSEGV (0xb) at pc=0xfe9bb960, pid=20929, tid=17
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_01-b08 mixed mode)
# Problematic frame:
# V [libjvm.so+0x1bb960]
#
--------------- T H R E A D ---------------
Current thread (0x01a770e0): JavaThread "JiraQuartzScheduler_Worker-1" [_thread_in_vm, id=17]
siginfo: si_signo=11, si_errno=0, si_code=1, si_addr=0x00000000
Registers:
O0=0xf5999882 O1=0xf5999882 O2=0x00000000 O3=0x00000000
O4=0x00000000 O5=0x00000001 O6=0xc24ff0b0 O7=0x00008000
G1=0xfe9bb80c G2=0xf5999a48 G3=0x0a67677d G4=0xf5999882
G5=0xc24ff380 G6=0x00000000 G7=0xfdbc3800 Y=0x00000000
PC=0xfe9bb960 nPC=0xfe9bb964
then you should upgrade the JVM. See SIGSEGV Segmentation Fault JVM Crash.
Artificial Windows memory limit
On Windows, the maximum heap allocatable to the 32bit Tomcat wrapper process is around 1400MB. If the instance is allocated too close to this limit, chronic garbage collection is likely to result, often producing Java core dumps similar to:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# java.lang.OutOfMemoryError: requested 8388608 bytes for GrET in C:\BUILD_AREA\jdk6_18\hotspot\src\share\vm\utilities\growableArray.cpp. Out of swap space?
#
# Internal Error (allocation.inline.hpp:39), pid=11572, tid=12284
# Error: GrET in C:\BUILD_AREA\jdk6_18\hotspot\src\share\vm\utilities\growableArray.cpp
#
# JRE version: 6.0_18-b07
# Java VM: Java HotSpot(TM) Server VM (16.0-b13 mixed mode windows-x86 )
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
--------------- T H R E A D ---------------
Current thread (0x002af800): GCTaskThread [stack: 0x00000000,0x00000000] [id=12284]
or,
#
# A fatal error has been detected by the Java Runtime Environment:
#
# java.lang.OutOfMemoryError: requested 123384 bytes for Chunk::new. Out of swap space?
#
# Internal Error (allocation.cpp:215), pid=10076, tid=4584
# Error: Chunk::new
#
# JRE version: 6.0_18-b07
# Java VM: Java HotSpot(TM) Server VM (16.0-b13 mixed mode windows-x86 )
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
--------------- T H R E A D ---------------
Current thread (0x6ca4d000): JavaThread "CompilerThread1" daemon [_thread_in_native, id=4584, stack(0x6cd10000,0x6cd60000)]
Workarounds include:
- changing the server OS to something other than Windows. For example, Linux
- switching to the 64 bit Tomcat wrapper (this is not supported)
- reducing memory allocation to the Tomcat process. Try backing off 100MB at a time and observe the results.
Instigate useful monitoring techniques
At all times the best performance tuning recommendations are based on current, detailed metrics. This data is easily available and configurable and helps us tremendously at Atlassian when diagnosing reported performance regressions.
- enable JMX monitoring
- enable Confluence Access logging
- enable Garbage Collection Logging
- Take Thread dumps at the time of regression. If you can't get into Confluence, you can take one externally.
- Jmap can take a memory dump of a running JVM with relatively little impact on the application. Syntax:
jmap -heap:format=b <process_id>
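For reference, capturing both a thread dump and a heap dump from a running instance might look like this (12345 is a placeholder PID; on Java 6 and later jmap uses the -dump option shown here):

```shell
# Find the Confluence/Tomcat JVM process id.
jps -l

# Thread dump to stdout, redirected to a file.
jstack 12345 > threads.txt

# Binary heap dump, readable by VisualVM and similar tools (Java 6+ syntax).
jmap -dump:format=b,file=heap.bin 12345
```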
Great tools available include:
- The excellent VisualVM, documentation.
- Thread Dump Analyzer - a great all-round thread debugging tool, particularly for identifying deadlocks.
- Samurai, an excellent alternative thread analysis tool, good for iterative dumps over a period of time.
- GC Viewer - getting a bit long in the tooth, but is a good mainstay for GC analysis.
- GChisto - A GC analysis tool written by members of the Sun Garbage Collection team.
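If you just need a quick feel for pause times before reaching for the tools above, a few lines of scripting can extract them from a -XX:+PrintGCDetails log. A minimal sketch, with illustrative sample lines baked in (real log formats vary between JVM versions and collectors, so the regular expression may need adjusting):

```python
import re

# Sample lines in the style produced by -verbose:gc -XX:+PrintGCTimeStamps
# -XX:+PrintGCDetails (illustrative only; real output varies by JVM version).
sample_log = """\
12.345: [GC [PSYoungGen: 393216K->12345K(458752K)] 600000K->220000K(983040K), 0.0412345 secs]
98.765: [Full GC [PSYoungGen: 12345K->0K(458752K)] [ParOldGen: 500000K->300000K(524288K)] 512345K->300000K(983040K), 2.3456789 secs]
"""

# Capture the timestamp, collection kind, and pause duration of each event.
pause_re = re.compile(r"^(\d+\.\d+): \[(Full GC|GC).*?, (\d+\.\d+) secs\]")

pauses = []
for line in sample_log.splitlines():
    m = pause_re.match(line)
    if m:
        pauses.append((float(m.group(1)), m.group(2), float(m.group(3))))

for ts, kind, secs in pauses:
    print(f"{ts:>10.3f}s  {kind:<8}  pause {secs:.4f}s")

# The long 'Full GC' pause is the stop-the-world event discussed above.
```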
Documentation:
- Sun's White Paper on Garbage Collection in Java 6.
- Sun's state-of-the-art JavaOne 2009 session on garbage collection (registration required).
- IBM stack: Java 5 GC basics for WebSphere Application Server.
- An Excellent IBM document covering native memory, thread stacks, and how these influence memory constricted systems. Highly recommended for additional reading.
- The complete list of JRE 6 options
- I strongly recommend viewing George Barnett's Summit 2010 performance presentation, Pulling a Rabbit from a Hat.
Atlassian recommends, at the very least, getting VisualVM up and running (you will need JMX), and adding Access and Garbage Collection logging.
Tuning the frequency of full collections
The JVM will generally only collect on the full heap when it has no other alternative, because of the relative size of the Tenured space (it is typically larger than the Young space), and the natural probability of objects within tenured not being eligible for collection, i.e. they are still alive.
Some installations can trundle along, only ever collecting in Young space. As time goes on, some object will survive the initial Young object collection and be promoted to Tenured. At some point, it will be dereferenced and no longer reachable by the deterministic, directed object graph. However, the occupied memory will still be held in limbo as "dead" memory until a collection occurs in the Tenured space to clear and compact the space.
It is not uncommon for moderately sized Confluence installations to reclaim as much as 50% of the current heap size on a full collection, because full collections occur so infrequently. Reducing the occupancy-fraction trigger on the heap means that more memory will be available at any given time, so fewer expensive collections occur during the busy hour.
Atlassian would classify frequency tuning on collections as an advanced topic for further experimentation, and is provided for informational purposes only. Unfortunately, it's impractical for Atlassian to support these kinds of changes in general.
Performance tuning works
Atlassian has a number of high-profile, and some extremely demanding, mission-critical clients who have successfully applied these recommendations to production instances, usually through trial and error, and significantly improved their performance. For more information, please file a support case at support.atlassian.com.