Jira Batched Notifications stop being sent from any project after adding a large number of watchers to a ticket

Platform Notice: Server and Data Center Only. This article only applies to Atlassian products on the server and data center platforms.

    

Summary

After a very large number of users (for example, hundreds of thousands) are accidentally added to the watcher list of a ticket, Jira Batched Notifications stop being sent to users, or are sent hours late.

(warning) Note that this KB article only covers Jira batched notifications. It does not apply if you are using Jira Service Management and the problem affects Customer Notifications.

Environment

Any Jira version from 8.0.0.

Diagnosis

  • Any type of Jira notification (issue created, issue updated...) for any user and from any Jira ticket is impacted
  • Jira Notifications are sent successfully only when batching is disabled in ⚙ > System > Batching email notifications
  • The problem started to occur after a huge number of watchers (for example, hundreds of thousands) were added to at least 1 Jira ticket
  • Re-starting the Jira application does not help resolve this issue
  • Running the following SQL query returns a high count (for example, a few million), showing that a large number of batched notification events still need to be processed

    select count(*) from "AO_733371_EVENT_RECIPIENT" WHERE "STATUS" = 'NEW' AND "CONSUMER_NAME"='mailEventConsumer';
  • Thread dumps collected while the issue is happening show a long-running thread that is busy processing batched notification events:
    • Long running thread (from the 1st dump)

      "Caesium-1-1" daemon prio=5 tid=0x0000000000000ba9 nid=0 runnable 
         java.lang.Thread.State: RUNNABLE
      	at java.util.Arrays.hashCode(Arrays.java:4146)
      	at java.util.Objects.hash(Objects.java:128)
      	at com.atlassian.jira.plugins.inform.api.events.dto.RecipientDTO.hashCode(RecipientDTO.java:103)
      	at java.util.AbstractList.hashCode(AbstractList.java:541)
      	at java.util.Arrays.hashCode(Arrays.java:4146)
      	at java.util.Objects.hash(Objects.java:128)
      	at com.atlassian.jira.plugins.inform.api.events.dto.EventDTO.hashCode(EventDTO.java:127)
      	at java.util.HashMap.hash(HashMap.java:339)
      	at java.util.HashMap$HashIterator.remove(HashMap.java:1462)
      	at java.util.AbstractSet.removeAll(AbstractSet.java:178)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$removeProcessedEvents$6(BatchNotificationJob.java:229)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob$$Lambda$5739/870912907.accept(Unknown Source)
      	at java.util.ArrayList.forEach(ArrayList.java:1257)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.removeProcessedEvents(BatchNotificationJob.java:228)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.processEventBatch(BatchNotificationJob.java:149)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$notifyUsers$1(BatchNotificationJob.java:114)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob$$Lambda$5688/903887977.apply(Unknown Source)
      	at com.atlassian.jira.plugins.inform.performance.MeasurementWorkerFactory$1.measure(MeasurementWorkerFactory.java:39)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.notifyUsers(BatchNotificationJob.java:109)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$runJob$0(BatchNotificationJob.java:86)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob$$Lambda$5684/1174080777.apply(Unknown Source)
      	at com.atlassian.jira.plugins.inform.performance.MeasurementWorkerFactory$1.measure(MeasurementWorkerFactory.java:39)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.runJob(BatchNotificationJob.java:84)
      	at com.atlassian.jira.plugins.inform.batching.cron.ConditionalJobRunner.runJob(ConditionalJobRunner.java:33)
      	at com.atlassian.jira.plugins.inform.batching.cron.ConditionalJobRunner.runJob(ConditionalJobRunner.java:33)
      	at com.atlassian.jira.plugins.inform.batching.cron.OncePerClusterJobRunner.runJob(OncePerClusterJobRunner.java:46)
      	at com.atlassian.scheduler.core.JobLauncher.runJob(JobLauncher.java:134)
      	at com.atlassian.scheduler.core.JobLauncher.launchAndBuildResponse(JobLauncher.java:106)
      	at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:90)
      	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.launchJob(CaesiumSchedulerService.java:435)
      	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:430)
      	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:454)
      	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:382)
      	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$$Lambda$2370/1703800892.accept(Unknown Source)
      	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
      	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
      	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
      	at java.lang.Thread.run(Thread.java:748)
      
         Locked ownable synchronizers:
    • Same long running thread (from the 2nd dump)

      "Caesium-1-1" daemon prio=5 tid=0x0000000000000ba9 nid=0 runnable 
         java.lang.Thread.State: RUNNABLE
      	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
      	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
      	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
      	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
      	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
      	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
      	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
      	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
      	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
      	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
      	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.getRecipientIds(BatchNotificationJob.java:211)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.processBatches(BatchNotificationJob.java:165)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.processEventBatch(BatchNotificationJob.java:150)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$notifyUsers$1(BatchNotificationJob.java:114)
      
      ...
      
      	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
      	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
      	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
      	at java.lang.Thread.run(Thread.java:748)
      
         Locked ownable synchronizers:
      	- None
    • Same long running thread (from the 3rd dump)

      "Caesium-1-1" daemon prio=5 tid=0x0000000000000ba9 nid=0 runnable 
         java.lang.Thread.State: RUNNABLE
      	at java.util.Arrays.hashCode(Arrays.java:4146)
      	at java.util.Objects.hash(Objects.java:128)
      	at com.atlassian.jira.plugins.inform.api.events.dto.RecipientDTO.hashCode(RecipientDTO.java:103)
      	at java.util.AbstractList.hashCode(AbstractList.java:541)
      	at java.util.Arrays.hashCode(Arrays.java:4146)
      	at java.util.Objects.hash(Objects.java:128)
      	at com.atlassian.jira.plugins.inform.api.events.dto.EventDTO.hashCode(EventDTO.java:127)
      	at java.util.HashMap.hash(HashMap.java:339)
      	at java.util.HashMap.remove(HashMap.java:799)
      	at java.util.HashSet.remove(HashSet.java:236)
      	at java.util.AbstractSet.removeAll(AbstractSet.java:174)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$removeProcessedEvents$6(BatchNotificationJob.java:229)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob$$Lambda$5739/870912907.accept(Unknown Source)
      	at java.util.ArrayList.forEach(ArrayList.java:1257)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.removeProcessedEvents(BatchNotificationJob.java:228)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.processEventBatch(BatchNotificationJob.java:149)
      	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$notifyUsers$1(BatchNotificationJob.java:114)
      
      ...
      
      	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
      	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
      	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
      	at java.lang.Thread.run(Thread.java:748)
      
         Locked ownable synchronizers:
      	- None
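
To see whether the backlog is concentrated in a handful of events, the count query above can be grouped by "EVENT_ID". The sketch below is only an illustration: it runs that grouped query against a throwaway in-memory SQLite table that keeps just the three columns named in this article, and the event IDs and row counts are made up. On a real instance, the query string would be run directly against the Jira database (read-only).

```python
import sqlite3

# Grouped variant of the diagnostic count query: pending rows per event.
BACKLOG_BY_EVENT = """
SELECT "EVENT_ID", count(*) AS pending
FROM "AO_733371_EVENT_RECIPIENT"
WHERE "STATUS" = 'NEW' AND "CONSUMER_NAME" = 'mailEventConsumer'
GROUP BY "EVENT_ID"
ORDER BY pending DESC
"""

# Assumption: simplified stand-in table with only the columns used in this article.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "AO_733371_EVENT_RECIPIENT" '
    '("EVENT_ID" INTEGER, "STATUS" TEXT, "CONSUMER_NAME" TEXT)'
)

# Made-up sample data: event 7 has a huge recipient list, event 8 a normal one,
# and event 9 has already been processed.
sample = (
    [(7, "NEW", "mailEventConsumer")] * 2000
    + [(8, "NEW", "mailEventConsumer")] * 4
    + [(9, "PROCESSED", "mailEventConsumer")] * 50
)
conn.executemany('INSERT INTO "AO_733371_EVENT_RECIPIENT" VALUES (?, ?, ?)', sample)

top_events = conn.execute(BACKLOG_BY_EVENT).fetchall()
print(top_events)
```

An event with a pending count far above the others is a likely candidate for one raised on a ticket with an oversized watcher list.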

Cause

When using Jira Batched Notifications, whenever an action occurs in a Jira ticket, new events are added to the table AO_733371_EVENT_RECIPIENT with the status "NEW". If multiple users are supposed to receive a notification from that event, then for each event, there will be as many rows added to the table AO_733371_EVENT_RECIPIENT as recipients.

Let's assume that 100k users were added by accident to the watcher list of a Jira ticket. In this case, if 10 actions happen in the ticket and the 100k users need to be notified about them, then 100k * 10 = 1 million events (rows) will be added to the table AO_733371_EVENT_RECIPIENT with the status "NEW". As a result, the scheduled job responsible for processing the events stored in this table might take a very long time to work through them, and could potentially get stuck.
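
The row growth described above can be reproduced at a smaller scale. The following is a minimal sketch, not Jira's actual code: it inserts one row per (event, recipient) pair into an in-memory SQLite table and then runs the same count query used in the Diagnosis section. The "RECIPIENT" column and the helper name pending_rows_after are hypothetical and exist only for this illustration.

```python
import sqlite3


def pending_rows_after(watchers: int, actions: int) -> int:
    """Insert one NEW row per (event, recipient) pair and return the NEW row count."""
    conn = sqlite3.connect(":memory:")
    # Assumption: simplified stand-in schema; "RECIPIENT" is a hypothetical column.
    conn.execute(
        'CREATE TABLE "AO_733371_EVENT_RECIPIENT" '
        '("EVENT_ID" INTEGER, "RECIPIENT" TEXT, "STATUS" TEXT, "CONSUMER_NAME" TEXT)'
    )
    rows = (
        (event_id, f"user{n}", "NEW", "mailEventConsumer")
        for event_id in range(actions)
        for n in range(watchers)
    )
    conn.executemany(
        'INSERT INTO "AO_733371_EVENT_RECIPIENT" VALUES (?, ?, ?, ?)', rows
    )
    (count,) = conn.execute(
        """
        SELECT count(*) FROM "AO_733371_EVENT_RECIPIENT"
        WHERE "STATUS" = 'NEW' AND "CONSUMER_NAME" = 'mailEventConsumer'
        """
    ).fetchone()
    conn.close()
    return count


# Scaled down 100x from the example in the text (1,000 watchers instead of 100k).
scaled_down = pending_rows_after(watchers=1_000, actions=10)
print(scaled_down)
```

The count is always watchers multiplied by actions, so every further update to the ticket adds another full set of rows, one per watcher.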

Solution

One way to fix this issue is to force all the batched notification events from the problematic ticket to be marked as "PROCESSED" in the AO_733371_EVENT_RECIPIENT table. This way, the batched notification job will stop trying to process them and will move on to the events from other Jira tickets. The resolution steps are listed below:

  1. Identify the problematic ticket(s). One way to do it is to:
    1. Go to the issue search page
    2. Add "watchers" to the list of columns
    3. Search for issues across the whole Jira instance (no need to add any text to the search, since we are looking for all the Jira issues)
    4. Sort the Jira issues found by the search by "watchers" (desc order), so that you can identify the Jira issue(s) that contain huge numbers of watchers (hundreds of thousands)
    5. Take note of the issue key(s)
  2. Stop the Jira application
  3. (warning) Use your database's native tool to back up the database. Do not skip this step, so that you can revert to this backup if needed
  4. Run the following SQL query and make sure that it returns a high count (expect a few million). Make sure to replace 'ABC-123', 'ABC-456', 'ABC-789' in the SQL query with the actual list of issue key(s) identified in Step 1.

    select count(*) from "AO_733371_EVENT_RECIPIENT" where "EVENT_ID" in (select "EVENT_ID" from "AO_733371_EVENT_PARAMETER"
    where "NAME" = 'object#issue#key#0' AND "VALUE" in ('ABC-123', 'ABC-456', 'ABC-789')) AND "STATUS" = 'NEW' AND "CONSUMER_NAME"='mailEventConsumer';
  5. Once you have confirmed that the query above returned a high count, run the UPDATE query below, which will force all the events from the problematic ticket(s) to be marked as processed, so that they are skipped in the future:

    update "AO_733371_EVENT_RECIPIENT" set "STATUS" = 'PROCESSED'
    where "EVENT_ID" in (select "EVENT_ID" from "AO_733371_EVENT_PARAMETER"
    where "NAME" = 'object#issue#key#0' AND "VALUE" in ('ABC-123', 'ABC-456', 'ABC-789')) AND "STATUS" = 'NEW' AND "CONSUMER_NAME"='mailEventConsumer';
  6. Start the Jira application
  7. Try to trigger some notifications and verify that Jira Batched Notifications are sent.
    (warning) Note that you might have to wait for some time to confirm that notifications are sent, depending on the frequency set for the batched notifications (10 minutes by default)
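
Before touching a production database, the effect of steps 4 and 5 can be rehearsed on throwaway data. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite database rather than Jira's database, the two tables keep only the columns referenced by the queries in this article, and the issue keys and row counts are made up. It shows that the UPDATE touches only the rows of the listed issue key(s) and leaves pending events from other tickets untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: simplified stand-in tables with only the columns the queries use.
conn.executescript("""
CREATE TABLE "AO_733371_EVENT_PARAMETER" ("EVENT_ID" INTEGER, "NAME" TEXT, "VALUE" TEXT);
CREATE TABLE "AO_733371_EVENT_RECIPIENT" ("EVENT_ID" INTEGER, "STATUS" TEXT, "CONSUMER_NAME" TEXT);
""")

# Made-up data: event 1 belongs to the problematic ticket ABC-123,
# event 2 belongs to an unrelated ticket XYZ-1.
conn.executemany(
    'INSERT INTO "AO_733371_EVENT_PARAMETER" VALUES (?, ?, ?)',
    [(1, "object#issue#key#0", "ABC-123"), (2, "object#issue#key#0", "XYZ-1")],
)
conn.executemany(
    'INSERT INTO "AO_733371_EVENT_RECIPIENT" VALUES (?, ?, ?)',
    [(1, "NEW", "mailEventConsumer")] * 500 + [(2, "NEW", "mailEventConsumer")] * 3,
)

issue_keys = ["ABC-123"]  # the key(s) noted in step 1
placeholders = ", ".join("?" for _ in issue_keys)
# Same filter as in steps 4 and 5, shared by the SELECT and the UPDATE.
where = f"""
WHERE "EVENT_ID" IN (SELECT "EVENT_ID" FROM "AO_733371_EVENT_PARAMETER"
                     WHERE "NAME" = 'object#issue#key#0' AND "VALUE" IN ({placeholders}))
  AND "STATUS" = 'NEW' AND "CONSUMER_NAME" = 'mailEventConsumer'
"""

# Step 4: count the rows that would be skipped.
(to_skip,) = conn.execute(
    f'SELECT count(*) FROM "AO_733371_EVENT_RECIPIENT" {where}', issue_keys
).fetchone()

# Step 5: mark exactly those rows as processed.
cursor = conn.execute(
    f"""UPDATE "AO_733371_EVENT_RECIPIENT" SET "STATUS" = 'PROCESSED' {where}""",
    issue_keys,
)
updated = cursor.rowcount
conn.commit()

# Rows from other tickets must still be pending.
(still_new,) = conn.execute(
    """SELECT count(*) FROM "AO_733371_EVENT_RECIPIENT" WHERE "STATUS" = 'NEW'"""
).fetchone()
print(to_skip, updated, still_new)
```

A useful sanity check on the real database is the same one shown here: the number of rows reported by the UPDATE should match the count returned in step 4.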


Last modified on Mar 11, 2022
