Deadlocking in Jira Service Management when frequently updating the same issue

If you have an automated process that keeps updating the same issue many times, it might lead to deadlocks after you upgrade to Jira Service Management 4.3 or later. Read on to identify whether your Jira Service Management instance is affected.

Context

In Jira Service Management 4.3, we fixed two issues to improve overall performance. One of the changes bounded the thread pools, which limits the number of concurrent threads.

Our tests have shown significant performance improvements across the whole of Jira Service Management. However, bounded thread pools can lead to problems in some cases.

Problem

We’ve noticed that a bounded thread pool can result in a deadlock in the following scenario:

  1. An instance has OffThreadExecution enabled

  2. An instance has an automated process that keeps updating the same issue (many times in one minute)

Related ticket: JSDSERVER-6717

Diagnosis

To check if your Jira Service Management is affected, you can run the following query periodically during peak times: 

PostgreSQL

select p.pkey, i.issuenum, issueid, count(*) count_updated
from (
         SELECT g.issueid, g.created as date
         FROM changegroup g -- all issue edits
         UNION ALL
         SELECT a.issueid, a.updated as date
         FROM jiraaction a -- all comments
     ) as all_events
         join jiraissue i on i.id = issueid
         join project p on p.id = i.project
WHERE date > now() - interval '1 minute'
group by 1, 2, 3
order by 4 desc;

Oracle

select p.pkey, i.issuenum, issueid, count(*) count_updated
from (
         SELECT g.issueid, g.created as ddate
         FROM changegroup g -- all issue edits
         UNION ALL
         SELECT a.issueid, a.updated as ddate
         FROM jiraaction a -- all comments
     ) all_events
         join jiraissue i on i.id = issueid
         join project p on p.id = i.project
WHERE ddate > CURRENT_DATE - interval '1' minute
group by p.pkey, i.issuenum, issueid
order by 4 desc;


If the query shows that your Jira Service Management is updating any issue many times per minute, your instance may be affected by this issue. Tests have shown that up to 60 updates per minute on a single issue shouldn’t be a problem.

A sudden spike in the number of updates for an issue, which exceeds the number of threads in the thread pool, might also result in a deadlock. Such a deadlock will be resolved eventually, but some issues might end up with a corrupted SLA.
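If you export the query results, a small script can flag the issues that exceed the 60 updates per minute figure mentioned above. This is a sketch only: the `hot_issues` function and the `IssueActivity` type are hypothetical names, and the rows are assumed to arrive as tuples in the same column order as the query (pkey, issuenum, issueid, count_updated).

```python
from typing import Iterable, List, NamedTuple, Tuple

SAFE_UPDATES_PER_MINUTE = 60  # threshold quoted in this article


class IssueActivity(NamedTuple):
    project_key: str   # p.pkey
    issue_number: int  # i.issuenum
    issue_id: int      # issueid
    update_count: int  # count_updated


def hot_issues(rows: Iterable[Tuple[str, int, int, int]],
               threshold: int = SAFE_UPDATES_PER_MINUTE) -> List[str]:
    """Return issue keys whose per-minute update count exceeds the threshold."""
    flagged = []
    for row in rows:
        activity = IssueActivity(*row)
        if activity.update_count > threshold:
            flagged.append(f"{activity.project_key}-{activity.issue_number}")
    return flagged
```

For example, `hot_issues([("SD", 12, 10012, 75), ("SD", 7, 10007, 60)])` flags only `SD-12`, because exactly 60 updates is still within the tested limit.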

Alternate Diagnosis

Another query that may indicate your Jira Service Management is affected by this issue is the following:

select *
from "AO_319474_MESSAGE"
where "CLAIMANT" is null and "CLAIM_COUNT" > 0;


If the query returns a small number of unclaimed events with a high claim count, it may indicate that your instance is affected by this issue.

Solution

Jira Service Management 4.9 and above

In Jira Service Management 4.9, we've improved the reliability of SLA processing. These changes are hidden behind a feature flag, so if this problem occurs, enable the feature flag sd.internal.base.db.backed.completion.events by following the steps in this KB article: https://confluence.atlassian.com/jirakb/enable-dark-feature-in-jira-959286331.html.

Jira Service Management 4.3 - 4.8

If you are on a Jira Service Management version between 4.3 and 4.8, you can fix this issue by making changes in the database.

  1. Run the following query against your database to check if the sd.event.processing.async.thread.pool.count property exists.

    select * from propertyentry where property_key='sd.event.processing.async.thread.pool.count'
  2. Complete one of these steps, depending on whether you have this property or not.


    1. If the property doesn’t exist, use these queries. Note that the default value is 5.

      -- This gives the id to use in the next queries.
      select max(id) + 1 from propertyentry;

      insert into propertyentry(id, entity_name, entity_id, property_key, propertytype) values (<id from the first query>, 'sd-internal-base-plugin', 1, 'sd.event.processing.async.thread.pool.count', 3);

      insert into propertynumber values (<id from the first query>, <new pool size value>);
    2. If the property exists, use this query. 

      update propertynumber set propertyvalue=<new pool size value> where id=<id present in the propertyentry table>;
  3. Setting sd.event.processing.async.thread.pool.count to a value no greater than the number of available threads on a node should improve throughput. A larger value is very unlikely to yield further performance improvements.
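The branching in step 2 can be sketched as a small helper that prints the statements to run. This is an illustration under assumptions: `pool_size_statements` is a hypothetical name, and it only builds the SQL text shown above from values you have already looked up; it does not connect to the database.

```python
from typing import List, Optional

PROPERTY_KEY = "sd.event.processing.async.thread.pool.count"


def pool_size_statements(pool_size: int,
                         existing_id: Optional[int] = None,
                         next_id: Optional[int] = None) -> List[str]:
    """Build the SQL for step 2 from the result of the step 1 lookup."""
    pool_size = int(pool_size)
    if pool_size < 1:
        raise ValueError("pool size must be a positive integer")
    if existing_id is not None:
        # Step 2.2: the property exists, so only its value changes.
        return [
            f"update propertynumber set propertyvalue={pool_size} "
            f"where id={int(existing_id)};"
        ]
    if next_id is None:
        raise ValueError("next_id (max(id) + 1 from propertyentry) is required")
    # Step 2.1: the property is missing, so create the entry and its value.
    next_id = int(next_id)
    return [
        "insert into propertyentry(id, entity_name, entity_id, property_key, "
        f"propertytype) values ({next_id}, 'sd-internal-base-plugin', 1, "
        f"'{PROPERTY_KEY}', 3);",
        f"insert into propertynumber values ({next_id}, {pool_size});",
    ]
```

Pass `existing_id` when the step 1 query returned a row, or `next_id` (the result of `select max(id) + 1 from propertyentry`) when it returned nothing, then review the generated statements before running them.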


Additional steps

Increasing the OffThreadEventJobRunner thread pool to a large number can reintroduce one of the problems that we were trying to solve in the first place, so you’ll also need to increase the number of available database connections.

Related ticket: JSDSERVER-5732

To increase available database connections, see Tuning database connections.

Last modified on Jul 12, 2021
