Jira applications stall due to StackOverflowError exception

Problem

Jira applications can stall after a java.lang.StackOverflowError occurs, because the error can leave internal data structures in an inconsistent state.

Symptoms include:

The following appears in the atlassian-jira.log:

2017-11-28 08:21:19,815 http-nio-8080-exec-8 ERROR user 501x750485x1 13esc 10.0.16.1 /secure/AssignIssue.jspa [c.o.scriptrunner.runner.AbstractScriptListener] Script function failed on event: com.atlassian.jira.event.issue.IssueEvent, file: com.org.EpicLinkListener
java.lang.StackOverflowError

Note that the stack trace of the exception may vary depending on the underlying operation being performed in Jira.

Diagnosis

Environment

  • Any bug in code running inside Jira can trigger the problem. The chance of hitting a StackOverflowError is higher if you run third-party code or write your own code (for example, Groovy scripts in ScriptRunner).

Diagnostic Steps

  • Check the logs for a large number of StackOverflowError entries:


    grep -ic 'StackOverflowError' atlassian-jira.log* 
          63

Cause

The JVM running Jira applications has hit a StackOverflowError triggered by the code. StackOverflowError is an asynchronous exception that can be thrown by the Java Virtual Machine whenever the computation in a thread requires a larger stack than is permitted. The Java Language Specification permits a StackOverflowError to be thrown synchronously by method invocation. This mechanism is a clean way to report that a stack overflow has occurred while preserving the JVM's integrity, but it doesn't provide a safe way for the application to recover from this situation. A stack overflow could occur in the middle of a sequence of modifications which, if not completed, could leave a data structure in an inconsistent state.


Quote from http://openjdk.java.net/jeps/270

For instance, when a StackOverflowError is thrown in a critical section of the java.util.concurrent.locks.ReentrantLock class, the lock status can be left in an inconsistent state, leading to potential deadlocks. The ReentrantLock class uses an instance of AbstractQueuedSynchronizer to implement its critical section.

After a StackOverflowError, the Jira application will likely be in an unstable state, so it is essential to restart your Jira applications immediately.
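The following is a minimal sketch, not Jira code, of how an overflow in the middle of a sequence of modifications can leave state inconsistent even when the error is caught. The class `InconsistentStateDemo`, its fields, and the `from + to == 100` invariant are all hypothetical and exist only for illustration.

```java
public class InconsistentStateDemo {
    // Hypothetical shared state with the invariant: from + to == 100.
    static int from = 100;
    static int to = 0;

    // Unbounded recursion: each call adds a stack frame until the JVM
    // throws StackOverflowError.
    static int recurseForever(int depth) {
        return recurseForever(depth + 1) + 1;
    }

    // A two-step update. The overflow strikes between the two steps.
    static void transfer(int amount) {
        from -= amount;
        recurseForever(0); // stands in for a deep call chain in buggy code
        to += amount;      // never reached
    }

    // Resets the state, attempts a transfer, and reports whether the
    // invariant still holds afterwards.
    static boolean invariantHoldsAfterOverflow() {
        from = 100;
        to = 0;
        try {
            transfer(10);
        } catch (StackOverflowError e) {
            // Catching the error keeps the thread alive, but it does not
            // undo the half-finished update.
        }
        return from + to == 100;
    }

    public static void main(String[] args) {
        System.out.println("invariant holds: " + invariantHoldsAfterOverflow());
        System.out.println("from=" + from + ", to=" + to);
    }
}
```

In this sketch the first modification is applied and the second is not, so the invariant is broken after the error. Nothing in the running process repairs that kind of damage, which is why a restart is needed.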

Workaround

Restart Jira 

Resolution

  • Fix the problem in the code that caused the StackOverflowError. You may need to contact the add-on vendor about this.
  • It may sometimes be possible to identify the offending component by thoroughly examining the actual stack trace (the application vendor can help with that). While third-party applications are not supported by Atlassian, here is a best-effort example of a case where the issue was traced to a third-party JQL function, PreviousSprint:
    1. The stack trace referenced a class "PreviousSprint":

    at com.onresolve.jira.groovy.jql.plugins.PreviousSprint.getRelevantSprint(PreviousSprint.groovy:27)
    at com.onresolve.jira.groovy.jql.plugins.PreviousSprint$getRelevantSprint.callCurrent(Unknown Source)


    2. It was then identified that the JQL function "previousSprint" was used in a JQL filter on one of the newly created dashboards.
    3. The JQL function previousSprint was then disabled, and once the board reloaded, the issue was resolved.
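When examining a stack trace as in the example above, it can help to count which classes appear most often, since a StackOverflowError trace usually repeats the same frames many times. The sketch below is one possible way to do that and is not an Atlassian tool. The class `RepeatedFrameFinder` and its methods are hypothetical, and it assumes frames use the standard `at package.Class.method(...)` layout.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RepeatedFrameFinder {
    // Counts how often each class appears in the "at ..." frames of a trace.
    static Map<String, Integer> countClasses(List<String> traceLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : traceLines) {
            String t = line.trim();
            if (!t.startsWith("at ")) {
                continue; // skip the exception header and other log lines
            }
            int paren = t.indexOf('(');
            String method = paren >= 0 ? t.substring(3, paren) : t.substring(3);
            int lastDot = method.lastIndexOf('.');
            String cls = lastDot >= 0 ? method.substring(0, lastDot) : method;
            // Fold generated or nested classes (Outer$inner) into the outer
            // class so that Groovy call-site frames count together.
            int dollar = cls.indexOf('$');
            if (dollar >= 0) {
                cls = cls.substring(0, dollar);
            }
            counts.merge(cls, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the class with the most frames, or null if no frames were found.
    static String mostFrequentClass(List<String> traceLines) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : countClasses(traceLines).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The two frames from the example above.
        List<String> trace = Arrays.asList(
            "java.lang.StackOverflowError",
            "    at com.onresolve.jira.groovy.jql.plugins.PreviousSprint.getRelevantSprint(PreviousSprint.groovy:27)",
            "    at com.onresolve.jira.groovy.jql.plugins.PreviousSprint$getRelevantSprint.callCurrent(Unknown Source)");
        System.out.println(mostFrequentClass(trace));
    }
}
```

The class that dominates the count is a candidate for the offending component, but it still needs to be confirmed against the full trace and with the vendor.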

Note that a lock can be left held in the way described under Cause only if the thread encountered the StackOverflowError before it could execute unlock() on the ReadLock in the finally block.


Last modified on Sep 29, 2020
