Automation for Jira - Rules are not triggered for some Jira issues due to missing threads
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Automation rules are not triggered for some Jira issues (Jira issues are randomly skipped).
For example, let's say that an automation rule is configured to trigger whenever a new issue is created. When checking the audit log of this automation rule, we can see that there is no entry for the issues that were skipped by the automation.
Environment
- Jira Data Center 8.0.0 and any higher version, running on a cluster of at least 2 nodes
- Automation for Jira (A4J): observed on 8.0.0, but the issue might occur on lower and higher versions too
Diagnosis
- Jira is running on a cluster of at least 2 nodes
- The issues that are skipped by the automation rules were all created/updated from the same "unhealthy" node
- When generating thread dumps on each node, we can see that:
- On the "healthy" nodes (nodes where issues are not skipped), the A4J event serializer threads are up and running (the thread name has the format automation-event-serializer:thread-X)
- On the "unhealthy" nodes (nodes where issues are skipped), the A4J event serializer threads are all missing (no thread with a name of the format automation-event-serializer:thread-X appears in the thread dump)
- The A4J application is correctly enabled on each node, which can be verified using either of the methods below:
- By going to the ⚙ > Manage Apps > Manage Apps page after logging directly into each Jira node
- By generating a support zip from each Jira node, and making sure that A4J is showing as enabled in the application.xml file of each node:
<plugin>
  <key>com.codebarrel.addons.automation</key>
  <name>Automation for Jira</name>
  <version>8.0.0</version>
  <vendor>Atlassian</vendor>
  <status>ENABLED</status>
  <vendor-url>https://atlassian.com/</vendor-url>
  <framework-version>2</framework-version>
  <bundled>User installed</bundled>
</plugin>
- Note that if A4J is disabled on at least 1 node, this KB article is not relevant. Instead, please take a look at the other KB article Automation for Jira - Rules are not triggered for some Jira issues due the application being disabled
- When checking the Jira application logs right after a Jira restart, the following observations can be made:
- On the "healthy" nodes, we should see a getService bundle call for the bundle com.atlassian.jira.plugin.automation.for-jira (which is from A4J), followed shortly after by the initialization of the A4J threads:
2022-09-19 10:51:01,027-0700 localhost-startStop-1 DEBUG [c.a.activeobjects.osgi.ActiveObjectsServiceFactory] getService bundle [com.atlassian.jira.plugin.automation.for-jira]
...
2022-09-19 10:51:01,369-0700 localhost-startStop-1 WARN [c.c.j.p.automation.queue.JiraAutomationQueueExecutor] Initialising automation-rule-executor pool with 6 threads...
2022-09-19 10:51:01,371-0700 localhost-startStop-1 WARN [c.c.j.p.automation.queue.JiraAutomationQueueExecutor] automation-rule-executor pool running with 6 threads.
2022-09-19 10:51:01,371-0700 localhost-startStop-1 WARN [c.c.j.p.automation.queue.JiraAutomationQueueExecutor] Initialising automation-queue-claimer...
2022-09-19 10:51:01,374-0700 automation-queue-claimer:thread-1 WARN [c.c.j.p.automation.queue.JiraAutomationQueueExecutor] automation-queue-claimer is running.
- On the "unhealthy" node, there is no trace of the getService bundle call for the bundle com.atlassian.jira.plugin.automation.for-jira (which is from A4J), and no initialization of the A4J threads can be found
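The checks above can also be scripted when there are many nodes to compare. The following is a minimal sketch, not an official Atlassian tool: the function names are hypothetical, the layout of application.xml around the <plugin> element is an assumption (only the excerpt above is known), and the log markers are simply the start-up messages copied from the "healthy" node excerpt above.

```python
import re
import xml.etree.ElementTree as ET

# Thread name format quoted in this article.
SERIALIZER_THREAD = re.compile(r"automation-event-serializer:thread-\d+")
# Plugin key taken from the application.xml excerpt in this article.
A4J_PLUGIN_KEY = "com.codebarrel.addons.automation"
# Start-up messages copied from the "healthy" node log excerpt in this article.
INIT_MARKERS = (
    "Initialising automation-rule-executor pool",
    "automation-queue-claimer is running",
)


def serializer_threads(thread_dump_text):
    """Return the distinct A4J event serializer thread names found in a thread dump."""
    return sorted(set(SERIALIZER_THREAD.findall(thread_dump_text)))


def a4j_status(application_xml_text):
    """Return the <status> text of the A4J <plugin> element, or None if it is not listed."""
    root = ET.fromstring(application_xml_text)
    # iter() walks the whole tree, so the nesting around <plugin> does not matter.
    for plugin in root.iter("plugin"):
        if (plugin.findtext("key") or "").strip() == A4J_PLUGIN_KEY:
            return (plugin.findtext("status") or "").strip() or None
    return None


def missing_init_markers(log_text):
    """Return the A4J start-up messages that do NOT appear in the log text."""
    return [marker for marker in INIT_MARKERS if marker not in log_text]
```

To use it, read each node's thread dump, application.xml and atlassian-jira.log from disk and pass their contents to these functions. A node for which a4j_status returns ENABLED while serializer_threads returns an empty list matches the symptom described in this article.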
Cause
Some Jira issues are skipped by the automation rules because the A4J threads responsible for listening to Jira events (threads whose names have the format automation-event-serializer:thread-X) are not running at all on a specific node. As a result, whenever a Jira issue is created/updated on that particular node, the events fired for that issue are not detected by A4J and are lost, so the rules don't get triggered.
The exact reason why the A4J threads are not running on the "unhealthy" node is unclear. The suspicion is that, under some circumstances, A4J fails to properly initialize these threads at startup. The conditions that lead to this situation have not been identified, especially since the symptom is intermittent and does not happen at every Jira restart.
Solution
There are various ways to bring A4J back into a stable state on the unhealthy node. The methods are equivalent, since the key point is to force A4J to be re-initialized:
- Workaround 1 - Restart the "unhealthy" node
- Workaround 2 - Upgrade A4J to a more recent version
- Workaround 3 - Uninstall A4J and reinstall the same version
For any of the workarounds listed above, we recommend scheduling a maintenance window, as they can cause downtime (workaround 1) or have some temporary performance impact (workarounds 2 and 3).