Troubleshooting slow SLA and Automation processing in Jira Service Management Data Center


Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

In Jira Service Management, Automation and SLA processing might become slow due to the characteristics of PSMQ, the Jira subsystem that processes these events.

You may notice that one or more of the following are true:

  1. Automation processing or SLA calculation is slow. For example, it takes a long time for customers to receive updates via email, or, after an issue is transitioned, the SLAs on the right-hand side of the issue view take a long time to update.
  2. There is high database activity against the tables AO_319474_QUEUE and AO_319474_MESSAGE.
  3. When you take a thread dump, you notice groups of threads with names starting with SdOffThreadEventJobRunner, SdSerialisedOffThreadProcessor, or PsmqAsyncExecutors-job that remain mostly RUNNABLE over time (a command-line check is sketched below this list).
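
As a quick check, you can capture a thread dump and count these threads from the command line. The sketch below is illustrative: jstack ships with the JDK, and <jira-pid> and the output path are placeholders.

    # Capture a thread dump from the Jira JVM (replace <jira-pid> with the process ID)
    jstack <jira-pid> > /tmp/jira-threads.txt

    # Count threads in the PSMQ-related pools named above
    grep -cE 'SdOffThreadEventJobRunner|SdSerialisedOffThreadProcessor|PsmqAsyncExecutors-job' /tmp/jira-threads.txt

Taking a few dumps several seconds apart shows whether the same threads stay RUNNABLE, rather than just appearing busy in a single snapshot.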

Environment

  • Jira Service Management (Server or Data Center)

Cause

PSMQ is a database-backed message system, so a number of variables affect its performance:

  • The number of events (such as issue edits and added comments) occurring in Jira, which is driven primarily by user load
  • The capacity of the database to store messages for these events and provide them back to Jira (a rough timing sketch follows this list)
  • The capacity of Jira to consume events
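
For a rough read on database capacity, you can time a representative read against the PSMQ tables directly. This is a PostgreSQL-only sketch (EXPLAIN ANALYZE), and the query shape is illustrative rather than the exact read PSMQ performs:

    -- Time a small read against the message table to gauge database latency
    EXPLAIN ANALYZE
    SELECT m."ID", m."QUEUE_ID"
    FROM "AO_319474_MESSAGE" m
    ORDER BY m."ID"
    LIMIT 100;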

This knowledge base article explores how to determine why PSMQ is not performing as expected, along with some potential avenues to resolve it.

SLA calculation performance also depends on the JQL complexity of SLAs and on the number of SLAs present in the system. Before proceeding, consider whether your environment has exceeded any of the documented Jira Service Management guardrails.

Diagnosis

The following indicators can be used to determine whether there is contention in the PSMQ system:

  • On the database, check the size trend of the AO_319474_MESSAGE table. The rows in this table correspond to Jira events that need processing and are removed once processing is complete, so the count should normally trend towards 0. Check whether the row count is very high and trending upwards.

    SELECT count(*) FROM "AO_319474_MESSAGE";
  • On the database, check the count of pending messages within each "queue":

    SELECT count("ID") queue_count, "QUEUE_ID"
    FROM "AO_319474_MESSAGE"
    GROUP BY "QUEUE_ID"
    ORDER BY queue_count DESC;

    If any queues are listed with high counts that don't trend downward, specific user activity on the corresponding issue may be causing excess load. To determine which Jira issue each queue belongs to, use the query below (a combined per-queue summary is also sketched after this list):

    SELECT q."ID", q."NAME", m."ID"
    FROM "AO_319474_QUEUE" q
    JOIN "AO_319474_MESSAGE" m on q."ID" = m."QUEUE_ID"
    WHERE q."ID" = <ID FROM FIRST QUERY>;
  • If your Jira Service Management version is earlier than 5.2.0, there are known performance and stability bugs that could be in effect, including:

    Deadlocking in Jira Service Management when frequently updating the same issue:

      • JSDSERVER-5736
      • JSDSERVER-5732
      • JSDSERVER-5730
      • JSDSERVER-6717
      • JSDSERVER-6715
      • JSDSERVER-8502
      • JSDSERVER-8504
      • JSDSERVER-8635
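
To make monitoring easier, the two diagnosis queries above can be combined so that each queue's name appears next to its pending message count. This sketch uses only the columns shown above, with the same PostgreSQL-style quoted identifiers as the earlier queries:

    -- Pending message count per queue, with the queue name, hottest first
    SELECT q."ID" AS queue_id, q."NAME" AS queue_name, count(m."ID") AS pending_messages
    FROM "AO_319474_QUEUE" q
    JOIN "AO_319474_MESSAGE" m ON q."ID" = m."QUEUE_ID"
    GROUP BY q."ID", q."NAME"
    ORDER BY pending_messages DESC;

If you are on PostgreSQL, running this statement in psql and following it with \watch 60 re-executes it every minute, which makes it easy to see whether the hottest queues are draining.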

Solution

The following avenues can be pursued:

  • Reduce SLA count and complexity if the limits in the Jira Service Management guardrails are exceeded
  • Upgrade to JSM version 5.2.0 or later
    • Significant performance increases were shipped in 4.21.0, and again in 5.2.0
  • If the database is not contended, you may increase the SLA processing thread pool size. For JSM 5.1.0 and above, see "Configure thread processing" in this article. For older versions, adjust the properties sd.event.processing.async.thread.pool.count and sd.event.processing.serialised.thread.pool.count using the solution in this article (a configuration sketch follows this list)
  • Identify the issues with the highest queue counts using the queries above and reduce the load on them if the activity continues, for example in the case of an aggressive REST API integration
  • Increase the performance of the database
  • Monitor the queue counts using the queries above and wait; the load may reduce by itself
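
For the older-version property change mentioned above, the properties are JVM system properties. A minimal sketch for a Linux installation follows, assuming the standard setenv.sh mechanism for passing -D arguments (Windows uses setenv.bat); the pool sizes shown are placeholders, not recommendations, and a Jira restart is required for them to take effect:

    # In <jira-install>/bin/setenv.sh, add the properties to the existing
    # JVM_SUPPORT_RECOMMENDED_ARGS line. The values below are illustrative;
    # size the pools to your hardware and database headroom, then restart Jira.
    JVM_SUPPORT_RECOMMENDED_ARGS="-Dsd.event.processing.async.thread.pool.count=4 -Dsd.event.processing.serialised.thread.pool.count=4"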

For further assistance, you can contact Atlassian Support at https://support.atlassian.com/contact




