Automation for Jira - The automation executor threads are stuck because of the usage of a 3rd party add-on
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
This article describes a few scenarios where all rules from Automation for Jira (A4J) stop being triggered due to a 3rd party add-on blocking the threads, how to identify them, and how to fix them.
The observed symptoms are the following:
- Various A4J rules are configured to be triggered when an issue is created or updated
- All these rules stopped being triggered at the same time. For example, the rules that should be triggered on the "Create Issue" event don't show anything in the Audit log even though new issues were recently created.
- Note that any A4J rule whose trigger has the option Execute this rule immediately when the rule is triggered ticked is still triggered. However, if this option is unticked (which is the default configuration), the rule is not triggered
Environment
- Jira Server/Data Center 8.0.0 and any higher version
- Automation for Jira (A4J) 7.0.0 and any higher version
Scenario 1
Diagnosis
When the following SQL query is executed to check the size of the A4J queue, we can see that the events keep piling up in the queue without being processed:
select count(*) from "AO_589059_AUTOMATION_QUEUE";

 count
-------
 47423
(1 row)
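To confirm that the queue is really growing rather than just large, the count query above can be run a few times in a row. The following is a minimal Python sketch of that idea, not an Atlassian tool: the helper names sample_queue_size and is_piling_up are made up for illustration, and the code assumes you already have a Python DB-API connection to the Jira database (an in-memory SQLite database is used below only as a stand-in so the example runs on its own).

```python
import sqlite3
import time

# Same count query as in the article.
QUEUE_SQL = 'select count(*) from "AO_589059_AUTOMATION_QUEUE"'


def sample_queue_size(conn, samples=3, interval_s=0.0):
    """Run the count query `samples` times and return the list of counts."""
    counts = []
    for i in range(samples):
        cur = conn.cursor()
        cur.execute(QUEUE_SQL)
        counts.append(cur.fetchone()[0])
        cur.close()
        if i < samples - 1:
            time.sleep(interval_s)
    return counts


def is_piling_up(counts):
    """True when the queue never shrinks and ends larger than it started."""
    never_shrinks = all(b >= a for a, b in zip(counts, counts[1:]))
    return len(counts) >= 2 and never_shrinks and counts[-1] > counts[0]


if __name__ == "__main__":
    # Stand-in database (assumption): replace with a connection to the Jira DB.
    conn = sqlite3.connect(":memory:")
    conn.execute('create table "AO_589059_AUTOMATION_QUEUE" (id integer)')
    print(sample_queue_size(conn, samples=2))
    print(is_piling_up([47000, 47200, 47423]))
```

In practice the interval between samples would be a minute or more, so that a healthy queue has time to drain.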
- When collecting thread dumps while the issue is happening, we can make the following observations:
- All the automation-rule-executor:thread-X threads, which are responsible for processing events from the queue and triggering the rules, are stuck in the BLOCKED state
- There is a deadlock
A4J rule executor threads 1, 2, 3, 4 and 5 are in the BLOCKED state, waiting for the same lock (<0x00000000154fe597> in the example below):
"automation-rule-executor:thread-1" daemon prio=5 tid=0x00000000000d3cf0 nid=0 waiting for monitor entry java.lang.Thread.State: BLOCKED (on object monitor) at java.base@11.0.13/java.util.Vector.hashCode(Unknown Source) - waiting to lock <0x00000000154fe597> (a java.util.Vector) owned by automation-rule-executor:thread-6 id=0x00000000000d3d20 "automation-rule-executor:thread-2" daemon prio=5 tid=0x00000000000d3cfa nid=0 waiting for monitor entry java.lang.Thread.State: BLOCKED (on object monitor) at java.base@11.0.13/java.util.Vector.listIterator(Unknown Source) - waiting to lock <0x00000000154fe597> (a java.util.Vector) owned by automation-rule-executor:thread-6 id=0x00000000000d3d20 at java.base@11.0.13/java.util.AbstractList.equals(Unknown Source) at java.base@11.0.13/java.util.Vector.equals(Unknown Source) - locked <0x000000005201b552> (a java.util.Vector) at org.apache.fop.util.CompareUtil.equal(CompareUtil.java:38) "automation-rule-executor:thread-3" daemon prio=5 tid=0x00000000000d3cfb nid=0 waiting for monitor entry java.lang.Thread.State: BLOCKED (on object monitor) at java.base@11.0.13/java.util.Vector.hashCode(Unknown Source) - waiting to lock <0x00000000154fe597> (a java.util.Vector) owned by automation-rule-executor:thread-6 id=0x00000000000d3d20 "automation-rule-executor:thread-4" daemon prio=5 tid=0x00000000000d3cfc nid=0 waiting for monitor entry java.lang.Thread.State: BLOCKED (on object monitor) at java.base@11.0.13/java.util.Vector.hashCode(Unknown Source) - waiting to lock <0x00000000154fe597> (a java.util.Vector) owned by automation-rule-executor:thread-6 id=0x00000000000d3d20 "automation-rule-executor:thread-5" daemon prio=5 tid=0x00000000000d3d1f nid=0 waiting for monitor entry java.lang.Thread.State: BLOCKED (on object monitor) at java.base@11.0.13/java.util.Vector.hashCode(Unknown Source) - waiting to lock <0x00000000154fe597> (a java.util.Vector) owned by automation-rule-executor:thread-6 id=0x00000000000d3d20
The 6th A4J rule executor thread (automation-rule-executor:thread-6) holds the lock <0x00000000154fe597> that the other A4J threads are waiting for. However, this 6th thread is itself waiting for a lock (<0x000000005201b552> in the example below) held by the 2nd A4J rule executor thread (automation-rule-executor:thread-2), which in turn is waiting for the lock held by the 6th thread:
"automation-rule-executor:thread-6" daemon prio=5 tid=0x00000000000d3d20 nid=0 waiting for monitor entry java.lang.Thread.State: BLOCKED (on object monitor) at java.base@11.0.13/java.util.Vector.listIterator(Unknown Source) - waiting to lock <0x000000005201b552> (a java.util.Vector) owned by automation-rule-executor:thread-2 id=0x00000000000d3cfa at java.base@11.0.13/java.util.AbstractList.equals(Unknown Source) at java.base@11.0.13/java.util.Vector.equals(Unknown Source) - locked <0x00000000154fe597> (a java.util.Vector) at org.apache.fop.util.CompareUtil.equal(CompareUtil.java:38) at org.apache.fop.fo.properties.ListProperty.equals(ListProperty.java:123) "automation-rule-executor:thread-2" daemon prio=5 tid=0x00000000000d3cfa nid=0 waiting for monitor entry java.lang.Thread.State: BLOCKED (on object monitor) at java.base@11.0.13/java.util.Vector.listIterator(Unknown Source) - waiting to lock <0x00000000154fe597> (a java.util.Vector) owned by automation-rule-executor:thread-6 id=0x00000000000d3d20 at java.base@11.0.13/java.util.AbstractList.equals(Unknown Source) at java.base@11.0.13/java.util.Vector.equals(Unknown Source) - locked <0x000000005201b552> (a java.util.Vector) at org.apache.fop.util.CompareUtil.equal(CompareUtil.java:38)
If we check the stack trace of any stuck thread, we can see that they are all very similar and that they all end up calling a class that belongs to a 3rd party add-on. Please note that, in this article, the name of the 3rd party add-on class is replaced by the generic name com.some3rdpartyaddon.jira.plugin, since various 3rd party add-ons (and not just a specific one) could lead to this situation.
Cause
The exact conditions that trigger this issue are unknown. All we can tell is that:
- the rule executor threads from A4J are making calls to classes coming from the 3rd party add-on (to be identified based on the class found in the thread dumps)
- this add-on somehow makes the rule executor threads wait for each other's lock (resource)
- ultimately, all the rule executor threads end up in a deadlock, causing all the automation rules to stop being triggered
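The thread dumps from the diagnosis show the classic lock-ordering pattern: each thread holds the monitor of one java.util.Vector inside Vector.equals and waits for the monitor of the other Vector inside listIterator. The following is a minimal Python sketch of that interleaving, offered as an analogy only: SyncList is a hypothetical stand-in for a synchronized list, and a timeout plus a barrier are used so that the demo terminates, whereas a JVM monitor would wait forever.

```python
import threading


class SyncList:
    """Toy stand-in for a synchronized list: methods take the instance lock."""

    def __init__(self, items):
        self.items = list(items)
        self.lock = threading.RLock()

    def equals(self, other, barrier, timeout_s):
        with self.lock:                  # own lock, like a synchronized equals()
            barrier.wait()               # both threads now hold their own lock
            acquired = other.lock.acquire(timeout=timeout_s)  # needs the peer's lock
            barrier.wait()               # keep own lock until the peer gave up too
            if not acquired:
                return None              # a JVM thread would stay BLOCKED here
            try:
                return self.items == other.items
            finally:
                other.lock.release()


def demo(timeout_s=0.2):
    """Run a.equals(b) and b.equals(a) concurrently and collect the results."""
    a, b = SyncList([1, 2]), SyncList([1, 2])
    barrier = threading.Barrier(2, timeout=5)
    results = {}

    def run(name, left, right):
        results[name] = left.equals(right, barrier, timeout_s)

    threads = [
        threading.Thread(target=run, args=("a.equals(b)", a, b)),
        threading.Thread(target=run, args=("b.equals(a)", b, a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(demo())
```

Both comparisons give up with None because each thread holds the lock the other one needs. Once two executor threads are stuck this way, every other thread that needs either lock queues up behind them, which matches threads 1, 3, 4 and 5 in the dump.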
A4J rules configured with the option Execute this rule immediately when the rule is triggered ticked are not impacted because such rules do not rely on the A4J queue or on the A4J rule executor threads to be triggered and executed. They bypass the queue and are executed immediately.
Scenario 2
Diagnosis
When the following SQL query is executed to check the size of the A4J queue, we can see that the events keep piling up in the queue without being processed:
select count(*) from "AO_589059_AUTOMATION_QUEUE";

 count
-------
 47423
(1 row)
- When collecting thread dumps while the issue is happening, we can make the following observations:
- Most A4J rule executor threads are stuck in the TIMED_WAITING state.
If we look at the stack trace of each A4J rule executor thread, we can see that they are all stuck trying to perform some database operation that was initiated by a class coming from a 3rd party add-on. Please note that, in this article, the name of the 3rd party add-on class is replaced by the generic name com.some3rdpartyaddon.jira.plugin, since various 3rd party add-ons (and not just a specific one) could lead to this situation.
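When a dump contains many threads, tallying the state of the A4J rule executor threads and listing the packages that appear in their stacks can help narrow down the add-on. The following is a minimal Python sketch under several assumptions: the thread names follow the automation-rule-executor:thread-N pattern shown in this article, KNOWN_PREFIXES is an illustrative and incomplete list that should be extended with the packages of Jira's and A4J's own classes as seen in your dumps, and the sample dump, including the HypotheticalDao frame, is invented for the demo.

```python
import re
from collections import Counter

EXECUTOR_PREFIX = "automation-rule-executor:thread-"
THREAD_HEADER = re.compile(r'"([^"]+)"')
STATE = re.compile(r'java\.lang\.Thread\.State: (\w+)')
# Captures the class name of a frame, skipping an optional "module@version/".
FRAME = re.compile(r'\bat (?:[\w.]+@[\w.+-]+/)?([\w.$]+)\.[\w$<>]+\(')

# Assumption: packages treated as "not an add-on"; adjust for your instance.
KNOWN_PREFIXES = ("java.", "javax.", "jdk.", "sun.", "com.sun.", "com.atlassian.")

SAMPLE_DUMP = '''
"automation-rule-executor:thread-1" daemon prio=5
   java.lang.Thread.State: TIMED_WAITING (parking)
        at java.base@11.0.13/jdk.internal.misc.Unsafe.park(Native Method)
        at com.some3rdpartyaddon.jira.plugin.HypotheticalDao.load(HypotheticalDao.java:10)
"automation-rule-executor:thread-2" daemon prio=5
   java.lang.Thread.State: TIMED_WAITING (parking)
        at com.some3rdpartyaddon.jira.plugin.HypotheticalDao.load(HypotheticalDao.java:10)
"http-nio-8080-exec-1" daemon prio=5
   java.lang.Thread.State: RUNNABLE
'''


def executor_sections(dump):
    """Yield (thread_name, section_text) for each A4J rule executor thread."""
    headers = list(THREAD_HEADER.finditer(dump))
    for i, header in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(dump)
        if header.group(1).startswith(EXECUTOR_PREFIX):
            yield header.group(1), dump[header.end():end]


def summarize(dump, known_prefixes=KNOWN_PREFIXES):
    """Return (thread states, suspect packages) for the executor threads."""
    states, suspects = Counter(), Counter()
    for _, section in executor_sections(dump):
        match = STATE.search(section)
        states[match.group(1) if match else "UNKNOWN"] += 1
        packages = set()
        for cls in FRAME.findall(section):
            if not cls.startswith(known_prefixes):
                # Keep the first three package components as a rough identifier.
                packages.add(".".join(cls.split(".")[:3]))
        suspects.update(packages)  # count each package once per thread
    return states, suspects


if __name__ == "__main__":
    states, suspects = summarize(SAMPLE_DUMP)
    print(dict(states))
    print(dict(suspects))
```

A package that shows up in the stack of every stuck executor thread is a candidate, not a proof; the full stack traces still need to be reviewed before an add-on is disabled.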
Cause
The exact conditions that trigger this issue are unknown. All we can tell is that:
- the rule executor threads from A4J are making calls to classes coming from the 3rd party add-on (to be identified based on the class found in the thread dumps)
- this add-on performs some database operation that never ends, putting the A4J executor threads into an indefinite TIMED_WAITING state
- ultimately, all or most of the rule executor threads end up in that same state, causing all the automation rules to stop being triggered
A4J rules configured with the option Execute this rule immediately when the rule is triggered ticked are not impacted because such rules do not rely on the A4J queue or on the A4J rule executor threads to be triggered and executed. They bypass the queue and are executed immediately.
Solution for all scenarios
The solution is to disable the problematic 3rd party add-on identified during the diagnosis steps and restart the Jira application by following the steps below:
- Schedule a maintenance window
- Go to ⚙ > Manage Apps > Manage Apps
- Look for the add-on and disable it
- Restart the Jira application
If being able to use the add-on is critical for your operations, we recommend reaching out to the add-on's support team for further assistance, since 3rd party add-ons are not supported by Atlassian.