High memory usage due to leaking ThreadLocal variables from GroovyCustomField
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
On large Jira instances under heavy load, GroovyCustomField can leak ThreadLocal variables. This causes high memory usage and can eventually lead to an OutOfMemoryError (OOM).
Diagnosis
Environment
- ScriptRunner for JIRA versions 4.0 to 5.0.14
Diagnostic Steps
A heap dump shows the following pattern:
Class Name | Shallow Heap | Retained Heap | Percentage
------------------------------------------------------------------------------------------------------------------
https-jsse-nio-10.35.136.254-8443-exec-115 Thread
org.apache.tomcat.util.threads.TaskThread @ 0x35d8687b8 | 128 | 110,127,096 | 0.61%
|- java.lang.ThreadLocal$ThreadLocalMap @ 0x6213b4820 | 24 | 110,121,168 | 0.61%
| '- java.lang.ThreadLocal$ThreadLocalMap$Entry[32768] @ 0x360a14568 | 131,088 | 110,121,144 | 0.61%
| |- java.lang.ThreadLocal$ThreadLocalMap$Entry @ 0x5937be410 | 32 | 109,711,184 | 0.61%
| | '- java.util.LinkedHashMap @ 0x5937be430 | 56 | 109,711,152 | 0.61%
| | |- java.util.HashMap$Node[524288] @ 0x38082b4a8 | 2,097,168 | 2,097,168 | 0.01%
| | |- java.util.LinkedHashMap$Entry @ 0x4b3af5040 | 40 | 298,720 | 0.00%
| | |- java.util.LinkedHashMap$Entry @ 0x393ac05e8 | 40 | 216,392 | 0.00%
....
| | |- java.util.LinkedHashMap$Entry @ 0x35d057238 | 40 | 97,344 | 0.00%
| | '- Total: 25 of 247,676 entries; 247,651 more | | |
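Note that the LinkedHashMap held by a single ThreadLocal entry contains about 247k entries and retains about 110 MB. The entry's referent (the ThreadLocal key) is com.onresolve.scriptrunner.customfield.GroovyCustomField$1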
Histogram:
Class Name | Objects | Shallow Heap | Retained Heap |
---|---|---|---|
char[] | 70,599,174 | 9,251,242,216 | >= 9,251,242,216 |
java.lang.String | 70,488,632 | 1,691,727,168 | >= 10,844,317,840 |
java.util.LinkedHashMap$Entry | 20,774,537 | 830,981,480 | >= 8,357,527,664 |
java.util.HashMap$Node | 18,619,919 | 595,837,408 | >= 4,209,784,280 |
java.util.HashMap$Node[] | 2,300,341 | 383,801,528 | >= 4,551,997,344 |
java.util.HashMap | 2,230,648 | 107,071,104 | >= 4,449,251,032 |
The histogram shows LinkedHashMap$Entry retaining more than 8 GB across roughly 20 million objects, with the same structure as above.
Checking ThreadLocalMap specifically shows that the thread locals retain more than 8 GB:
Class Name | Objects | Shallow Heap | Retained Heap
------------------------------------------------------------------------------------------------------------
java.lang.ThreadLocal$ThreadLocalMap | 870 | 20,880 | >= 8,695,572,600
|- java.lang.ThreadLocal$ThreadLocalMap$Entry[] | 870 | 8,828,768 |
| |- java.lang.Class | 1 | 0 |
| |- java.lang.ThreadLocal$ThreadLocalMap$Entry | 206,026 | 6,592,832 |
| | |- char[] | 468 | 632,624 | >= 632,624
| | |- com.atlassian.jira.web.filters.ThreadLocalQueryProfiler| 302 | 7,248 | >= 8,916,904
| | |- java.util.LinkedHashMap | 161 | 9,016 | >= 8,516,389,472
Cause
Bug in ScriptRunner:
Script field values are cached in thread locals. When a long-running thread processes many issues, for instance during a full reindex, the cached script field values are never cleared from the thread local, so the cache keeps growing.
See the related issues:
https://productsupport.adaptavist.com/browse/SRJIRA-2570
https://productsupport.adaptavist.com/browse/SRJIRA-551
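The leak pattern described above can be illustrated with a minimal Java sketch. This is not ScriptRunner's actual code: the class ThreadLocalCacheSketch, the CACHE field, and the methods computeLeaky, runWithCleanup, and cacheSize are hypothetical names chosen for illustration. It only shows how a per-thread map grows when nothing clears it, and how calling ThreadLocal.remove() in a finally block bounds its lifetime.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThreadLocalCacheSketch {

    // Hypothetical per-thread cache, standing in for the kind of
    // ThreadLocal-held LinkedHashMap visible in the heap dump.
    private static final ThreadLocal<Map<String, Object>> CACHE =
            ThreadLocal.withInitial(LinkedHashMap::new);

    // Leaky pattern: every call adds to the current thread's map and
    // nothing ever clears it, so a long-lived thread accumulates entries.
    public static void computeLeaky(String issueKey) {
        CACHE.get().put(issueKey, "value for " + issueKey);
    }

    // Safer pattern: run a unit of work and always drop the thread's
    // cache afterwards, even if the work throws.
    public static void runWithCleanup(Runnable work) {
        try {
            work.run();
        } finally {
            CACHE.remove();
        }
    }

    // Number of entries cached for the current thread.
    public static int cacheSize() {
        return CACHE.get().size();
    }

    public static void main(String[] args) {
        // Simulates a long-running thread touching many issues.
        for (int i = 0; i < 1000; i++) {
            computeLeaky("ISSUE-" + i);
        }
        System.out.println("entries after leaky loop: " + cacheSize());

        // The same work wrapped in cleanup leaves nothing behind.
        runWithCleanup(() -> {
            for (int i = 0; i < 1000; i++) {
                computeLeaky("ISSUE-" + i);
            }
        });
        System.out.println("entries after cleanup: " + cacheSize());
    }
}
```

In a pooled environment such as Tomcat, worker threads are reused rather than destroyed, so anything left in a thread local survives until it is explicitly removed. That is why the cleanup belongs in a finally block around each unit of work.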
Resolution
Upgrade ScriptRunner to version 5.2.2 or later. Note that it is compatible with Jira Server 7.2.0 and later.