Troubleshooting slow SLA and Automation processing in Jira Service Management Data Center
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
In Jira Service Management, Automation and SLA processing can become slow because of the characteristics of PSMQ, the Jira subsystem that processes these events.
- Automation processing includes anything configured in the JSM Project Settings → Automation (now called Legacy Automation) section of Jira. It does not include Automation for Jira-based rules.
- SLA calculation refers to displaying the "SLAs" section on the right-hand side of a ticket
You may notice one or more of the following items are true:
- Automation processing or SLA calculation is slow. For example, it takes a long time for customers to receive updates via email, or, when a ticket is transitioned, it takes a long time for the SLAs on the right-hand side of the ticket to update
- High database activity against the database tables AO_319474_QUEUE and AO_319474_MESSAGE
- When you take thread dumps, you notice groups of threads whose names start with the following and that are mostly RUNNABLE over time: SdOffThreadEventJobRunner, SdSerialisedOffThreadProcessor, PsmqAsyncExecutors-job
Environment
- Jira Service Management (Server or Data Center)
Cause
PSMQ is a database-backed message system, so several variables affect its performance:
- The number of events (such as issue edit, comment added, etc.) occurring in Jira. This is primarily based on user load
- The performance capacity of the database to store messages for these events and provide them back to Jira
- The performance capacity of Jira to consume events
This knowledge base article explores how to determine why PSMQ is not performing to expectation and some potential avenues to resolve it.
SLA calculation performance also depends on the JQL complexity of the SLAs and on the number of SLAs present in the system. Before proceeding, consider whether your environment has exceeded any of the documented Jira Service Management guardrails.
Diagnosis
The following indicators can be used to determine if there is contention with the PSMQ system:
On the database, check the size trend of the AO_319474_MESSAGE table. The rows in this table correspond to Jira events that need processing, and are removed once processing is completed. It should normally trend towards 0. Check whether the row count is very high and trending upwards:
SELECT count(*) FROM "AO_319474_MESSAGE";
On the database, check the count of pending messages within each "queue":
SELECT count("ID") queue_count, "QUEUE_ID" FROM "AO_319474_MESSAGE" GROUP BY "QUEUE_ID" ORDER BY queue_count DESC;
If any queues show high counts that don't trend downward, specific user activity on the corresponding issue may be causing excess load. To determine which Jira issue corresponds to a queue, use:
SELECT q."ID", q."NAME", m."ID" FROM "AO_319474_QUEUE" q JOIN "AO_319474_MESSAGE" m on q."ID" = m."QUEUE_ID" WHERE q."ID" = <ID FROM FIRST QUERY>;
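To repeat the count queries above on a schedule, they can be wrapped in a small script. The sketch below is illustrative only: it assumes a Python DB-API connection to the Jira database (the driver depends on your database engine), the function name `snapshot_psmq` is hypothetical, and the in-memory SQLite table is a stand-in containing just the two columns the queries touch.

```python
import sqlite3  # stand-in for the demo; a real Jira database needs its own DB-API driver

# The two count queries from this article.
TOTAL_SQL = 'SELECT count(*) FROM "AO_319474_MESSAGE"'
PER_QUEUE_SQL = (
    'SELECT count("ID") queue_count, "QUEUE_ID" '
    'FROM "AO_319474_MESSAGE" '
    'GROUP BY "QUEUE_ID" ORDER BY queue_count DESC'
)


def snapshot_psmq(conn, top_n=5):
    """Return (total pending messages, [(queue_id, count), ...]) for the busiest queues."""
    cur = conn.cursor()
    cur.execute(TOTAL_SQL)
    total = cur.fetchone()[0]
    cur.execute(PER_QUEUE_SQL)
    # Rows arrive as (queue_count, QUEUE_ID); swap them for readability.
    busiest = [(queue_id, count) for count, queue_id in cur.fetchmany(top_n)]
    cur.close()
    return total, busiest


# Demo table: hypothetical data, only the columns used by the queries.
demo = sqlite3.connect(":memory:")
demo.execute(
    'CREATE TABLE "AO_319474_MESSAGE" ("ID" INTEGER PRIMARY KEY, "QUEUE_ID" INTEGER)'
)
demo.executemany(
    'INSERT INTO "AO_319474_MESSAGE" ("QUEUE_ID") VALUES (?)',
    [(7,), (7,), (7,), (3,)],
)
total, busiest = snapshot_psmq(demo)
print(total, busiest)  # prints: 4 [(7, 3), (3, 1)]
```

Logging each snapshot with a timestamp makes it easier to see whether the total and the busiest queues are shrinking or growing between runs.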
If your JSM version is lower than 5.2.0, known performance and stability bugs could be in effect.
Solution
The following avenues can be pursued:
- Reduce SLA count and complexity, if limits in Jira Service Management guardrails are exceeded
- Upgrade to JSM version 5.2.0 or later
- If the database is not contended, you may increase the SLA processing thread pool size. For JSM 5.1.0 and above, see "Configure thread processing" in this article. For older versions, adjust the properties sd.event.processing.async.thread.pool.count and sd.event.processing.serialised.thread.pool.count using the solution in this article
- Identify the issues with the highest queue counts using the queries above and, if the activity continues, reduce the load on them, for example, in the case of an aggressive REST API integration
- Increase the performance of the database
- Monitor the queue counts using the queries above and wait. The load may reduce by itself
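When monitoring and waiting, it helps to decide up front what counts as the load reducing. The sketch below is one simple way to label a series of row counts taken at a fixed interval. The function name `classify_trend` and the 5% tolerance are illustrative assumptions, not Atlassian guidance, and comparing only the first and last sample is deliberately crude.

```python
def classify_trend(samples, tolerance=0.05):
    """Label chronological row counts as 'growing', 'draining', or 'flat'.

    samples: counts of pending messages, oldest first.
    tolerance: relative change (of the first sample) treated as noise.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first, last = samples[0], samples[-1]
    # Use at least 1 as the base so an empty starting queue still has a threshold.
    threshold = tolerance * max(first, 1)
    if last - first > threshold:
        return "growing"
    if first - last > threshold:
        return "draining"
    return "flat"


print(classify_trend([1000, 1500, 2200]))  # prints: growing
print(classify_trend([2200, 900, 40]))     # prints: draining
```

A queue that stays "growing" across several sampling windows is a sign that waiting alone is unlikely to help and one of the other avenues should be pursued.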
For assistance you may reach Atlassian Support at https://support.atlassian.com/contact