Garbage Collection (GC) Tuning Guide
What GC performance tuning can't do
Tuning GC performance will only get you so far. There will come a point where there are no further gains to be made by making changes, i.e. you have reached the performance limitations of the environment. If this happens and you are still short of your goals, you will need to consider changes beyond GC tuning, such as obtaining more powerful hardware, OS tuning and application tuning.
You should also be aware that by supplying explicit tunings you may actually degrade your performance. It is important to continually monitor your application and check that the assumptions you based your tuning on still hold.
1. Choosing Performance Goals
In order to tune GC performance, you first need to choose goals. You will set values for these goals in the next step, when you prepare your system for GC tuning.
GC performance goals
GC performance with the Oracle HotSpot VM can be described in terms of the following three goals:
- Latency — Pauses induced by the JVM as it performs garbage collection.
- Throughput — The percentage of wall-clock time the JVM has available for executing the application.
- Footprint — The allocated heap.
Latency forms part of the time that users of the application wait for responses to their requests, e.g. when visiting the Jira dashboard or searching for issues. For the purposes of GC tuning, it is specifically the time that the JVM is paused and unable to execute the application. There are two main measurements: the mean GC latency and the maximum GC latency. The mean latency indicates what a typical GC pause will be, whereas the maximum indicates the longest pause to expect. The motivation for this goal is usually client-perceived performance, or responsiveness. Another consideration is whether the application is queried by other systems, in which case GC pauses can cause connection timeouts or processing delays in those systems.
Latency is expressed in seconds, and for the purposes of tuning, the two major latency goals are:
- Mean pause time
- Maximum pause time
Throughput is the percentage of time available to the application for execution. The more time available to execute the application, the more processing time it has available to service requests. It should be noted that high throughput and low latency are not necessarily related — high throughput may accompany long, yet infrequent pause times.
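To make the last point concrete, here is a small Python sketch using two hypothetical pause patterns over a 600-second window; the numbers are illustrative, not measurements. Sixty 100 ms pauses and a single 6-second pause both consume 6 seconds, or 1% of the window, so both give 99% throughput, yet their maximum pause times differ by a factor of 60.

```python
import math


def throughput_pct(pauses_s, window_s):
    """Percentage of the window during which the application was not paused."""
    return 100.0 * (window_s - math.fsum(pauses_s)) / window_s


WINDOW_S = 600.0  # hypothetical 10-minute sample window

# Hypothetical pause patterns, chosen only for illustration.
patterns = {
    "frequent short pauses": [0.1] * 60,  # sixty 100 ms pauses
    "one long pause": [6.0],              # a single 6 s pause
}

for name, pauses in patterns.items():
    print(f"{name}: throughput {throughput_pct(pauses, WINDOW_S):.2f}%, "
          f"max pause {max(pauses):.1f} s")
```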
Footprint is the amount of memory that the JVM will consume in order to execute the application. This is usually important if your environment is memory constrained.
Oracle HotSpot VM garbage collection principles
To reach your goals, there are three principles that provide guidance on tuning the HotSpot VM for GC performance:
- Minor GC reclamation: lower cost for minor GC reclamation
- GC maximize memory: more memory is "better"
- Two out of three: pick two of the three performance goals
Minor GC reclamation
This is predicated on generational garbage collection, where it is assumed:
- Most memory allocations are short lived.
- Most old objects reference only old objects.
- Most old objects are long lived.
- GC collection cost is proportional to the set of live objects.
In the majority of applications, most garbage is created from short-lived recent object allocations.
This set of assumptions implies that it is more efficient to split GC work between a memory area for new allocations and one for old allocations. The new area (or generation) can be collected quickly because few of its objects are still live at the end of a collection, whereas the old generation generally has more live objects at the end of a collection.
What does this mean?
The fewer short-lived objects that are promoted (tenured) from the new generation to the old generation, and the more that the objects which are tenured are genuinely long-lived, the more efficient the overall garbage collection of the JVM. This leads to higher throughput.
GC maximize memory principle
If we have infinite memory, we don't need to collect garbage!
The more memory you give the JVM, the lower the collection frequency. In addition, it also means that the new generation can be sized appropriately to better cope with the rate of creation of short-lived objects. This reduces the number of allocations promoted to the old generation.
Two out of three principle
To make things easier, it is recommended that you pick only two of the performance goals to tune for and sacrifice the third. For greater ease, pick one. The goals often compete: for example, if you give the heap more memory for better throughput, the mean pause times are likely to be longer; conversely, if you give the heap less memory to reduce the mean pause time, pauses will likely become more frequent and throughput will drop. Similarly, sizing all generations and sub-generational areas appropriately for better latency and throughput usually comes at the expense of a larger JVM footprint.
In short, tuning GC is a balancing act. You may not be able to achieve all goals through GC tuning alone.
2. Preparing your environment for GC tuning
Once you have selected your goals, you will need to prepare your environment for GC tuning. An outcome of this step will be values for the goals you have selected. Together, the goals and values will become the systemic requirements of the environment that you are tuning for.
Load the application with work
Before you can measure the GC performance of the JVM executing a particular application, the application needs to perform work and reach a steady state. This is done by applying load to the application. Generating that load is not in the scope of this guide. We strongly recommend that the load be modeled on the steady state load you want to tune GC for, i.e. a load that reflects the usage patterns and volume of use you expect to see when the application is used in a production environment.
As you may need to make iterative changes to the JVM tuning parameters, we recommend doing this in an environment that is as close to the production environment as possible (e.g. similar hardware, OS version and load profile), primarily to reduce errors in your tunings and to minimize disruption to users. The process may require a number of iterations to achieve the desired results, so consider the disruption to users if the system is already in production.
Turn on GC logging
Oracle VM only
Please note that the JVM parameters in the rest of this guide apply to the Oracle 1.6 JVM only. Other implementations may not share these specific parameters.
See your JVM documentation for more details.
In order to measure the GC performance of the JVM, you need to turn on GC logging. This is done via command line flags passed to the JVM on startup, for example:
java -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:<file> …
We strongly suggest using a separate GC log file to separate the GC diagnostics from the application logs. If you would prefer timestamps with the date and time, as opposed to the offset since VM launch, use
Also, we recommend using the $JIRA_HOME/log directory (expanded to the absolute path for your installation) as the GC log file location, so that the log is automatically included in Support ZIPs.
Set data sampling windows
When reviewing sampling windows from the GC logs, it is important that these periods represent times when the application was operating at the target load you are tuning for. Additionally, the application and JVM may need some time to reach this steady state. For example, it may take a few minutes for the application to bootstrap before it is ready to service requests, or half an hour before all the long-lived caches are primed (the primed caches then become part of the memory footprint).
You will need to exercise your own judgment on what intervals represent a steady state of the application in the environment you are tuning for.
Determine memory footprint
In order to tune the JVM generation sizes, you need to have a good idea of what the steady state live data set size is. You can get this in one of two ways:
Derive it from GC logs
This presupposes that you are currently able to run the application in a VM and have it reach a steady state. If you are not able to do this, try the following:
- Give the JVM as large a maximum heap size as possible without causing memory to be swapped to disk.
- Use the throughput collector (-XX:+UseParallelOldGC) with default parameters.
- Run the JVM and check that there are no memory-related errors such as a java.lang.OutOfMemoryError. If there are, and assuming that as much memory as possible has been given to the heap, then you will need to change parameters external to the JVM, such as increasing the amount of system memory.
Once you have the application running in a steady state, you will need to estimate the memory footprint from the average (mean) old and permanent generation occupancy after full GCs. Here is an example Full GC log line; the old generation occupancy after GC is the value after the arrow in the ParOldGen field (6622879K), and the permanent generation occupancy after GC is the value after the arrow in the PSPermGen field (160673K):
773.192: [Full GC [PSYoungGen: 299756K->0K(6552704K)] [ParOldGen: 11486930K->6622879K(11657024K)] 11786686K->6622879K(18209728K) [PSPermGen: 161761K->160673K(247808K)], 43.4385540 secs] [Times: user=181.10 sys=1.68, real=43.44 secs]
Find the mean of these fields during any and all steady state periods. This is the mean memory footprint for the old and permanent generations.
For example, from the following Full GC lines using a single time window, offset at 300 seconds with a 3600 second duration:
287.223: [Full GC [PSYoungGen: 682071K->0K(9521152K)] [ParOldGen: 1709104K->862326K(1929728K)] 2391175K->862326K(11450880K) [PSPermGen: 164815K->164809K(262144K)], 1.3367950 secs] [Times: user=7.12 sys=0.07, real=1.34 secs]
1125.789: [Full GC [PSYoungGen: 1069485K->0K(10115264K)] [ParOldGen: 2026464K->1577765K(2659264K)] 3095949K->1577765K(12774528K) [PSPermGen: 165949K->165940K(262144K)], 2.9698460 secs] [Times: user=14.42 sys=0.04, real=2.97 secs]
2204.512: [Full GC [PSYoungGen: 1026559K->0K(10101568K)] [ParOldGen: 2710240K->1757747K(2980288K)] 3736799K->1757747K(13081856K) [PSPermGen: 166249K->166242K(262144K)], 3.2056200 secs] [Times: user=15.50 sys=0.02, real=3.21 secs]
2265.623: [Full GC [PSYoungGen: 410046K->0K(10115584K)] [ParOldGen: 2762176K->1926377K(3520000K)] 3172222K->1926377K(13635584K) [PSPermGen: 166243K->166242K(262144K)], 4.1453870 secs] [Times: user=20.11 sys=0.03, real=4.14 secs]
2312.878: [Full GC [PSYoungGen: 580541K->0K(10115520K)] [ParOldGen: 3448192K->2398541K(4246336K)] 4028733K->2398541K(14361856K) [PSPermGen: 166243K->166243K(262144K)], 5.0057740 secs] [Times: user=25.06 sys=0.03, real=5.01 secs]
2406.074: [Full GC [PSYoungGen: 442182K->0K(9997056K)] [ParOldGen: 4059865K->2220928K(4308544K)] 4502047K->2220928K(14305600K) [PSPermGen: 166245K->166244K(262144K)], 4.6421270 secs] [Times: user=22.88 sys=0.04, real=4.65 secs]
2492.633: [Full GC [PSYoungGen: 551513K->0K(10058560K)] [ParOldGen: 4035620K->2219019K(4526080K)] 4587134K->2219019K(14584640K) [PSPermGen: 166249K->166249K(251264K)], 4.4630140 secs] [Times: user=22.36 sys=0.01, real=4.46 secs]
2551.866: [Full GC [PSYoungGen: 757661K->0K(9941888K)] [ParOldGen: 4285737K->2455810K(4991616K)] 5043398K->2455810K(14933504K) [PSPermGen: 166259K->166258K(239424K)], 4.5891680 secs] [Times: user=23.50 sys=0.03, real=4.59 secs]
2587.593: [Full GC [PSYoungGen: 180557K->0K(10140096K)] [ParOldGen: 4554339K->2111979K(4954688K)] 4734896K->2111979K(15094784K) [PSPermGen: 166264K->166264K(229056K)], 4.5708050 secs] [Times: user=22.55 sys=0.02, real=4.57 secs]
2717.263: [Full GC [PSYoungGen: 286857K->0K(9503424K)] [ParOldGen: 4726275K->2537156K(5506560K)] 5013132K->2537156K(15009984K) [PSPermGen: 166271K->166271K(219712K)], 6.0445180 secs] [Times: user=28.51 sys=0.03, real=6.04 secs]
2794.947: [Full GC [PSYoungGen: 601040K->0K(9763968K)] [ParOldGen: 5454180K->2849912K(6176064K)] 6055221K->2849912K(15940032K) [PSPermGen: 166272K->166272K(211968K)], 6.1950910 secs] [Times: user=30.35 sys=0.05, real=6.20 secs]
2841.383: [Full GC [PSYoungGen: 544562K->0K(10043968K)] [ParOldGen: 5736088K->3054858K(6731904K)] 6280651K->3054858K(16775872K) [PSPermGen: 166274K->166273K(204800K)], 6.5399570 secs] [Times: user=32.23 sys=0.04, real=6.54 secs]
6663.915: [Full GC [PSYoungGen: 1080358K->0K(10062592K)] [ParOldGen: 6735458K->2460384K(5908544K)] 7815817K->2460384K(15971136K) [PSPermGen: 166399K->166398K(199168K)], 7.2638840 secs] [Times: user=29.06 sys=0.06, real=7.27 secs]
The old and permanent generation means are:
There is a caveat here — if the application being run within the JVM aggressively caches and this cache size is proportional to the heap or old generation size, then the mean old generation size is likely to increase when the heap or old generation increases. This will lead to inflated heap sizes due to caching.
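Averaging these fields can be scripted. The following Python sketch is one way to do it, assuming log lines in exactly the JDK 6 throughput collector format shown above (ParOldGen and PSPermGen fields, timestamps in seconds since VM launch from -XX:+PrintGCTimeStamps). The regular expression and the function name mean_footprint_kb are hypothetical, and other collectors or JVM versions will need different patterns. It averages the post-GC old and permanent generation occupancy of the Full GC events whose timestamps fall inside the sampling window (bounds inclusive).

```python
import re
from statistics import mean

# One Full GC record from the throughput collector (JDK 6 format, as above).
# Groups: timestamp (s since VM launch), old and perm occupancy after GC (K).
FULL_GC = re.compile(
    r"(?P<ts>\d+\.\d+): \[Full GC .*?"
    r"\[ParOldGen: \d+K->(?P<old>\d+)K\(\d+K\)\].*?"
    r"\[PSPermGen: \d+K->(?P<perm>\d+)K\(\d+K\)\],\s*\d+\.\d+ secs\]"
)


def mean_footprint_kb(log_text, offset_s, duration_s):
    """Mean post-GC (old, perm) occupancy in K for Full GCs inside the window."""
    end_s = offset_s + duration_s
    events = [
        (int(m["old"]), int(m["perm"]))
        for m in FULL_GC.finditer(log_text)
        if offset_s <= float(m["ts"]) <= end_s
    ]
    if not events:
        raise ValueError("no Full GC events in the sampling window")
    return mean(old for old, _ in events), mean(perm for _, perm in events)


# The first three records of the example log above.
SAMPLE = "\n".join([
    "287.223: [Full GC [PSYoungGen: 682071K->0K(9521152K)] [ParOldGen: 1709104K->862326K(1929728K)] "
    "2391175K->862326K(11450880K) [PSPermGen: 164815K->164809K(262144K)], 1.3367950 secs] "
    "[Times: user=7.12 sys=0.07, real=1.34 secs]",
    "1125.789: [Full GC [PSYoungGen: 1069485K->0K(10115264K)] [ParOldGen: 2026464K->1577765K(2659264K)] "
    "3095949K->1577765K(12774528K) [PSPermGen: 165949K->165940K(262144K)], 2.9698460 secs] "
    "[Times: user=14.42 sys=0.04, real=2.97 secs]",
    "2204.512: [Full GC [PSYoungGen: 1026559K->0K(10101568K)] [ParOldGen: 2710240K->1757747K(2980288K)] "
    "3736799K->1757747K(13081856K) [PSPermGen: 166249K->166242K(262144K)], 3.2056200 secs] "
    "[Times: user=15.50 sys=0.02, real=3.21 secs]",
])

old_kb, perm_kb = mean_footprint_kb(SAMPLE, offset_s=300, duration_s=3600)
print(f"old generation mean: {old_kb / 1024:.0f} MB, "
      f"permanent generation mean: {perm_kb / 1024:.0f} MB")
```

With the 300-second offset, the record at 287.223 falls outside the window and is ignored, just as the steady state guidance above suggests.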
Rules of thumb for generation sizes
There are a number of rules of thumb for sizing the different generations, which are based on the memory footprint:
- The maximum heap size should be between 3x and 4x the old generation mean.
- The old generation should not be less than 1.5x the old generation mean.
- The permanent generation should not be less than 1.5x the permanent generation mean.
- The new generation should not be less than 10% of the entire heap. This is only important if you manually set the size of the new generation.
- When switching to the mostly concurrent garbage collector, increase the size of the old and permanent generations by 20% (assuming that permanent generation CMS is enabled).
- Don't exceed the amount of physical memory available when resizing the JVM (the JVM consumes more memory than just the heap). This avoids virtual memory thrashing or, in the worst case, JVM process termination.
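The rules of thumb above can be expressed as a small helper. This Python sketch is illustrative only: the function names, the choice of working in MB, and the overhead_mb allowance for JVM memory outside the heap and permanent generation are assumptions, not part of any JVM tooling. It applies the 3x to 4x heap rule, the 1.5x floors for the old and permanent generations, the 10% floor for the new generation, and the optional 20% increase for the mostly concurrent collector.

```python
def size_generations(old_mean_mb, perm_mean_mb, heap_factor=4.0, concurrent=False):
    """Apply the rules of thumb to the mean old/perm footprints (MB)."""
    if not 3.0 <= heap_factor <= 4.0:
        raise ValueError("maximum heap should be 3x to 4x the old generation mean")
    heap_mb = heap_factor * old_mean_mb
    min_old_mb = 1.5 * old_mean_mb
    perm_mb = 1.5 * perm_mean_mb
    if concurrent:
        # Mostly concurrent collector: grow old and permanent generations by 20%.
        min_old_mb *= 1.2
        perm_mb *= 1.2
    min_new_mb = 0.10 * heap_mb
    max_new_mb = heap_mb - min_old_mb  # heap left over for the new generation
    if max_new_mb < min_new_mb:
        raise ValueError("new generation would be below 10% of the heap")
    return {"heap_mb": heap_mb, "min_old_mb": min_old_mb, "perm_mb": perm_mb,
            "min_new_mb": min_new_mb, "max_new_mb": max_new_mb}


def fits_in_physical_memory(sizes, physical_mb, overhead_mb):
    """In the 1.6 JVM the permanent generation is sized separately from the
    heap, so both are counted; overhead_mb is an assumed, user-estimated
    allowance for everything else the JVM process uses."""
    return sizes["heap_mb"] + sizes["perm_mb"] + overhead_mb <= physical_mb


# Hypothetical footprint means, for illustration only.
print(size_generations(old_mean_mb=2000, perm_mean_mb=160))
```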
Set the initial heap size
Now that you have calculated the memory footprint, you can then set the initial heap sizes for the tuning exercise:
4x Old Generation Mean
Old Generation Size:
1.5x Permanent Generation Mean
Using the example from memory footprint, here are the resulting initial heap size parameters:
Determine systemic requirements
Back in performance goals, the three performance tuning goals for VM GC tuning were discussed. You now need to determine values for these goals. They represent the systemic requirements of the environment that you are tuning GC performance for.
Some systemic requirements to determine are:
Acceptable mean minor GC pause time in seconds
Acceptable mean full GC pause time in seconds
Maximum tolerable full GC pause time in seconds
Acceptable minimum throughput expressed as a percentage of time
Generally, if you are focusing on the latency or footprint goals, you would choose small pause time values with maximum tolerances and a lower throughput value; conversely, if you are focusing on throughput, you would accept larger pause time values without maximum tolerances and set a higher throughput value.
It is okay if you don't have a good grasp as to what you need here. You can always perform the tuning exercise again after manually testing the application to determine the application's responsiveness within the tuning environment.
Following on from the example in initial heap size, here are two sets of synthetic systemic requirements that will be used in further examples:
Synthetic systemic requirements for throughput:
Acceptable mean minor GC pause time:
Acceptable mean full GC pause time:
Acceptable minimum throughput:
Synthetic systemic requirements for latency:
Acceptable mean minor GC pause time:
Maximum tolerable full GC pause time:
Acceptable minimum throughput:
3. Understanding the Throughput Collector and Behavior-based tuning
Generally, the majority of GC scenarios for Atlassian products can be resolved using the Oracle HotSpot VM's throughput collector (-XX:+UseParallelOldGC). However, when hard maximum pause time tolerances or very high throughput are required, manual tuning (beyond what is outlined in this guide) will be needed.
In order to solve the majority of GC scenarios, the HotSpot VM employs behavior-based tuning with the parallel collectors (this includes the mostly-concurrent mark-sweep collector). The parallel collectors are designed to keep three goals in check — these goals are, in descending order of priority:
- Maximum pause time
- Throughput
- Footprint
The intrepid will notice that these behaviors map to the performance goals. The goals are assessed at each collection in order of priority. Once a goal is found not to be met, the heap size is adapted to meet the failed goal, and the remaining goals are not assessed for that collection. If you are interested in observing this behavior, use the -XX:+PrintAdaptiveSizePolicy command line flag for the JVM.
Behavior-based tuning goals
Maximum pause time goal
With the maximum pause time goal, each generation (new and old) keeps track of its average pause time, and if this exceeds the goal, that generation's size is reduced. There is no default maximum pause time goal. To set it, the following flags can be used independently (set both if you have different pause time goals for minor and full collections):
|For both minor and full collections:|
|For just the minor collections:|
Note that this goal does not guarantee the maximum pause time of any individual GC event.
Throughput goal
The throughput goal is a single measurement covering collections of both the young and old generations. Throughput is the percentage of time the VM spends executing the application versus time spent inside the VM performing garbage collection. If this goal is not being met, the generation sizes are increased, on the assumption that larger generations take longer to fill and are therefore collected less often. The default goal is 99% application time and 1% GC time; see -XX:GCTimeRatio=<n>. The target fraction of time spent in GC is given by the following formula:
f(n) = 1 / (1 + n)
where n is the GCTimeRatio value, i.e. the target ratio of application time to GC time. For example, the default value of 99 evaluates to:
f(99) = 1 / (1 + 99) = 1 / (100) = 0.01 = 1%
Here is a graph representing the relationship between GCTimeRatio and the resulting throughput goal, and a Python script that prints the throughput goal percentage for every GCTimeRatio value from 1 to 99:
def f(x):
    # Throughput goal (fraction of application time) for -XX:GCTimeRatio=x
    return 1 - (1 / (1.0 + x))

for i in range(1, 100):
    print("%d\t%.02f %%" % (i, f(i) * 100))
As with pause times, the throughput goal is evaluated against a weighted average of throughput before the heap is resized.
Footprint goal
With the footprint goal, if the previous two goals are met, the garbage collector reduces the size of the heap until one of the previous goals is no longer met. This assumes that the heap size hasn't been fixed by explicitly setting the maximum and minimum heap sizes to the same value, e.g.
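To summarize the order in which the goals are assessed, here is a deliberately simplified Python model of a single decision. It is not HotSpot's adaptive size policy code: the GoalState fields, the returned strings, and passing in ready-made weighted averages are assumptions made for illustration. It only encodes the behavior described above: the pause time goal is checked first, then throughput, then footprint, and only the first unmet goal triggers an adjustment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GoalState:
    avg_pause_s: float             # weighted average pause time of the generation
    pause_goal_s: Optional[float]  # None: no maximum pause time goal (the default)
    avg_throughput: float          # weighted average fraction of time in the application
    throughput_goal: float         # e.g. 0.99 for the default GCTimeRatio of 99
    heap_fixed: bool               # True if minimum and maximum heap sizes are equal


def next_adjustment(s: GoalState) -> str:
    """Return the adjustment made for the first unmet goal, in priority order."""
    # 1. Maximum pause time goal: a generation whose average pause is too long shrinks.
    if s.pause_goal_s is not None and s.avg_pause_s > s.pause_goal_s:
        return "shrink generation"
    # 2. Throughput goal: larger generations take longer to fill.
    if s.avg_throughput < s.throughput_goal:
        return "grow generations"
    # 3. Footprint goal: both goals met, so shrink the heap unless it is fixed.
    if not s.heap_fixed:
        return "shrink heap"
    return "no change"


print(next_adjustment(GoalState(0.5, 0.2, 0.95, 0.99, heap_fixed=False)))
```

Because the throughput check always runs once the pause goal is satisfied, a generation that was just shrunk for pause time can immediately be grown again for throughput on a later collection.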
Interactions between goals
With behavior-based tuning, bear in mind that GC parameters are only adjusted if and when a goal is not met. For example, with the maximum pause time goal, a generation will not be resized until its weighted average pause time is larger than the goal. Hence, the maximum tolerable full GC pause time may be exceeded before behavior-based tuning begins reducing heap sizes.
As there is always a throughput goal, if the heap sizes have been reduced because the pause time was too high, it is likely (with the default throughput goal) that successive collections will then increase the heap sizes in an attempt to meet the throughput goal. This may eventually cause the maximum pause time goal to fail again, causing the throughput collector to oscillate between meeting these two goals.
4. Time to tune!
By this stage, you have done the following preparation:
- You can generate load.
- You know what you want to tune for.
- You have initial heap parameters.
- You have selected the Throughput Collector.
You are good to go, get tuning!
In general, the workflow for GC tuning works like this:
- Determine desired behavior.
- Measure behavior before change.
- Determine indicated change.
- Make change.
- Measure behavior after change.
- If behavior not met, try again.
In the instructions below, you will first measure GC performance (latency, throughput). You will then compare it to your systemic requirements. Finally, we'll give you some examples of how to re-tune to meet requirements.
Calculating minor GC latency
When using the throughput collector with the GC logging flags suggested above, a minor GC event will look like the following; the minor GC pause time is the duration reported just before the [Times: ...] block, 0.0840040 secs in this example:
37.668: [GC [PSYoungGen: 5245632K->167562K(6119872K)] 5245632K->167562K(17776832K), 0.0840040 secs] [Times: user=1.05 sys=0.11, real=0.08 secs]
Calculating full GC latency
Assuming the same GC logging flags, a full GC event will look like the following; the full GC pause time is the duration reported just before the [Times: ...] block, 43.4385540 secs in this example:
773.192: [Full GC [PSYoungGen: 299756K->0K(6552704K)] [ParOldGen: 11486930K->6622879K(11657024K)] 11786686K->6622879K(18209728K) [PSPermGen: 161761K->160673K(247808K)], 43.4385540 secs] [Times: user=181.10 sys=1.68, real=43.44 secs]
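Extracting these pause times by hand is tedious over a long log. Here is a Python sketch that pulls minor and full GC pause times out of a log and summarizes them for a sampling window. It assumes the exact line formats shown above (throughput collector, -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps); the regular expressions and function names are hypothetical, and for anything beyond a quick check a dedicated tool such as GCViewer may be more convenient.

```python
import re
from statistics import mean

# Minor GC: "<ts>: [GC [PSYoungGen: ...] <before>-><after>(<total>), <pause> secs]"
MINOR_GC = re.compile(
    r"(?P<ts>\d+\.\d+): \[GC \[PSYoungGen: [^\]]*\] [^,]*,\s*(?P<pause>\d+\.\d+) secs\]"
)
# Full GC: "<ts>: [Full GC ... [PSPermGen: ...], <pause> secs]"
FULL_GC = re.compile(r"(?P<ts>\d+\.\d+): \[Full GC .*?\],\s*(?P<pause>\d+\.\d+) secs\]")


def pause_times(log_text, pattern, offset_s, duration_s):
    """Pause durations (s) of events matching pattern inside the sampling window."""
    end_s = offset_s + duration_s
    return [
        float(m["pause"])
        for m in pattern.finditer(log_text)
        if offset_s <= float(m["ts"]) <= end_s
    ]


def latency_summary(pauses):
    """Count, mean and maximum pause time, in seconds."""
    if not pauses:
        return {"count": 0, "mean_s": None, "max_s": None}
    return {"count": len(pauses), "mean_s": mean(pauses), "max_s": max(pauses)}


# The two example events above.
SAMPLE = "\n".join([
    "37.668: [GC [PSYoungGen: 5245632K->167562K(6119872K)] 5245632K->167562K(17776832K), "
    "0.0840040 secs] [Times: user=1.05 sys=0.11, real=0.08 secs]",
    "773.192: [Full GC [PSYoungGen: 299756K->0K(6552704K)] [ParOldGen: 11486930K->6622879K(11657024K)] "
    "11786686K->6622879K(18209728K) [PSPermGen: 161761K->160673K(247808K)], 43.4385540 secs] "
    "[Times: user=181.10 sys=1.68, real=43.44 secs]",
])

print("minor:", latency_summary(pause_times(SAMPLE, MINOR_GC, 0, 900)))
print("full:", latency_summary(pause_times(SAMPLE, FULL_GC, 0, 900)))
```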
Measuring application throughput (the percentage of time the JVM has available for executing the application) is somewhat trickier, but it is essentially the percentage of a given sample window during which the VM was not paused. The GC logs do not give you this directly, but it can be calculated from the time taken by all the GC events (minor and full). With the throughput collector, the total time taken by each GC event indicates how long the VM was paused and not executing the application, so for a given sample window you sum the durations of all minor and full GC events, divide that sum by the length of the window, and subtract the result from 100%.
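As a worked sketch of that calculation, assuming you already have the minor and full GC pause durations for the sampling window (for example, read from log lines like those shown above), the arithmetic is only a few lines of Python. The function name is hypothetical, the pause values in the usage line are made up for illustration, and pauses that straddle the window boundaries are not trimmed.

```python
import math


def application_throughput_pct(minor_pauses_s, full_pauses_s, window_s):
    """Percentage of the sampling window the VM was not paused for GC.

    Assumes every pause lies entirely inside the window; pauses that straddle
    the window boundaries are not trimmed in this sketch.
    """
    if window_s <= 0:
        raise ValueError("window must be positive")
    paused_s = math.fsum(minor_pauses_s) + math.fsum(full_pauses_s)
    return 100.0 * (window_s - paused_s) / window_s


# Hypothetical pauses in a 900 s window: fifty 0.1 s minor GCs, two full GCs.
minor = [0.1] * 50
full = [4.0, 6.0]
print(f"throughput: {application_throughput_pct(minor, full, 900.0):.2f}%")
```

Here 15 seconds of pauses in a 900-second window leaves the application about 98.33% of the time.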
Comparing measurements to systemic requirements
Now that you have measured the GC performance characteristics, you can compare them to the systemic requirements. If your systemic requirements are not met with the default tuning of the throughput collector, set the tuning parameters appropriately and re-run a load test. Check again; if the results are good enough, you're done!
If, however, the results still do not meet your requirements, you have a few options:
- Review your requirements, change as necessary and re-test.
- Change factors external to JVM garbage collection, for example:
  - More/faster RAM.
  - More/faster CPU cores.
  - Newer supported versions of the JVM.
  - Changes to the application(s) being run in the JVM.
- Perform finer GC tuning, which we'll cover in another tuning guide in greater detail. In that in-depth guide, we'll aim to investigate further manipulation of heap and generation sizes, tenuring thresholds, the use of the mostly concurrent mark-sweep collector and the use of platform/environment specific tuning parameters.
Following on from the synthetic requirements above, here are the GC performance measurements from an example load test with the default behavior-based tuning parameters (using a sample window with an offset of 300 seconds and a duration of 900 seconds):
|Mean minor GC pause time:||0.13 seconds|
|Maximum full GC pause time:||3.58 seconds|
From these measurements, neither the latency nor the throughput goals are fully met. Hence, we need to change the behavior-based tuning parameters to suit what we need and then re-run the test.
Example re-tune for throughput
The throughput measured during the sampling window after re-tuning is less than required. Given that the default throughput goal is 99% application time and only 88.27% was reached during the sampling window, the goal could not be met with the configured heap parameters. We are now faced with options like the ones suggested in Comparing measurements to systemic requirements. Ideally, the manual tuning option should be exhausted first in a further attempt to reach the goal. As this is out of scope for this guide, we'll opt for reviewing the requirements.
As an example, while the load test was running at steady state, some manual experiments with the application under test showed that responsiveness and perceived throughput were good enough to satisfy the stakeholders of the system. Hence, the requirements were lowered, as they had been set higher than needed to achieve the required performance.
Example re-tune for latency
In this case, throughput was greater than required and the maximum full GC pause time was less than the maximum tolerable, but the mean minor GC pause time breached the required limit. To service this goal, we will set the pause time goals for both full and minor collections. To reduce oscillation between the pause time and throughput goals, the throughput goal will be clamped to the first ratio whose resulting throughput percentage is greater than the requirement. The resulting JVM options are:
-XX:MaxGCPauseMillis=10000 -XX:MaxGCMinorPauseMillis=100 -XX:GCTimeRatio=2
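The clamping step can be sketched in Python using the throughput goal formula from earlier, where the goal for GCTimeRatio n is 1 - 1/(1 + n). The function names are hypothetical, and exact fractions are used so that a requirement sitting exactly on a ratio boundary is compared correctly. The 60% requirement in the usage line is hypothetical; any requirement of at least 50% and below about 66.7% maps to a GCTimeRatio of 2.

```python
from fractions import Fraction


def throughput_goal(n):
    """Application-time fraction targeted by -XX:GCTimeRatio=n: 1 - 1/(1 + n)."""
    return 1 - Fraction(1, 1 + n)


def first_ratio_above(required, max_n=99):
    """Smallest GCTimeRatio whose throughput goal is greater than `required`."""
    required = Fraction(required)
    for n in range(1, max_n + 1):
        if throughput_goal(n) > required:
            return n
    raise ValueError(f"no GCTimeRatio up to {max_n} exceeds {float(required):.2%}")


# Hypothetical requirement of 60% application time.
n = first_ratio_above(Fraction(60, 100))
print(f"-XX:GCTimeRatio={n} (throughput goal {float(throughput_goal(n)):.2%})")
```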
After adding the new GC tuning parameters and re-running the load test, here are the performance measurements (using a sample window with an offset of 300 seconds and a duration of 900 seconds):
|Mean minor GC pause time:||0.06 seconds|
|Maximum full GC pause time:||5.33 seconds|
All three systemic requirements are now met.
There are some useful tools for interpreting GC performance with the Oracle HotSpot VM. We recommend the following two tools:
- Java VisualVM with the Visual GC plugin, for live GC telemetry. Check your JDK for jvisualvm, or visit the website.
- Chewiebug's GCViewer. This tool is good for post-hoc review of GC performance based on a GC log file.