Jira Batched Notifications stop being sent from any project after a very large comment is edited in a ticket

Platform Notice: Server and Data Center Only - This article only applies to Atlassian products on the server and data center platforms.

   

Summary

After a very large comment (for example, 300k characters) is edited, Jira Batched Notifications stop being sent to any user from any Jira ticket, or are sent with a delay of several hours.

(warning) If you are using Jira Service Management and the problem impacts Customer Notifications, this KB article does not apply. It only covers Jira Batched Notifications.

Environment

Any Jira Server/Data Center version from 8.0.0.

Diagnosis

  • All types of Jira notifications (issue created, issue updated, and so on) are impacted, for any user and any Jira ticket
  • Jira notifications are sent successfully only when batching is disabled in ⚙ > System > Batching email notifications
  • The problem started after a very large comment was edited in a ticket
  • Restarting the Jira application does not resolve the issue
  • Running the following SQL query returns 2 rows, one with the "S" status and one with the "A" status, showing that the Batched Notification job has been stuck for many days:

    select * from rundetails where job_id in ('com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJobSchedulerImpl');
    • Example of result:

      "id","job_id","start_time","run_duration","run_outcome","info_message"
      55830761.0,"com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJobSchedulerImpl","2022-01-13 17:13:56.923",2.0,"A","Already running"
      55142066.0,"com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJobSchedulerImpl","2021-12-23 16:52:32.947",70685558.0,"S",""
  • Running the following SQL query returns a high count (for example, hundreds of thousands), showing that a large number of batched notification events still need to be processed:

    select count(*) from "AO_733371_EVENT_RECIPIENT" WHERE "STATUS" = 'NEW' AND "CONSUMER_NAME"='mailEventConsumer';
  • Thread dumps collected while the issue is happening show the following long-running thread, which is busy processing a batched notification event and computing the difference between the old and new comment using an Apache library:

    "Caesium-1-1" daemon prio=5 tid=0x00000000000004eb nid=0 runnable 
       java.lang.Thread.State: RUNNABLE
    	at org.apache.commons.jrcs.diff.myers.MyersDiff.buildPath(MyersDiff.java:153)
    	at org.apache.commons.jrcs.diff.myers.MyersDiff.diff(MyersDiff.java:93)
    	at org.apache.commons.jrcs.diff.Diff.diff(Diff.java:197)
    	at com.atlassian.diff.WordLevelDiffer.diffWords(WordLevelDiffer.java:101)
    	at com.atlassian.diff.WordLevelDiffer.diffLine(WordLevelDiffer.java:91)
    	at com.atlassian.diff.DiffViewBean.createWordLevelDiff(DiffViewBean.java:108)
    	at com.atlassian.jira.mail.DiffUtils.diff(DiffUtils.java:19)
    	at com.atlassian.jira.plugins.inform.batching.rendering.utils.DiffRenderer.diffAsHtml(DiffRenderer.java:22)
    	at com.atlassian.jira.plugins.inform.batching.rendering.context.CommentItemFactory.getCommentItem(CommentItemFactory.java:184)
    	at com.atlassian.jira.plugins.inform.batching.rendering.context.CommentItemFactory.lambda$create$1(CommentItemFactory.java:112)
    	at com.atlassian.jira.plugins.inform.batching.rendering.context.CommentItemFactory$$Lambda$8119/189761663.apply(Unknown Source)
    	at java.util.Optional.map(Optional.java:215)
    	at com.atlassian.jira.plugins.inform.batching.rendering.context.CommentItemFactory.create(CommentItemFactory.java:112)
    	at com.atlassian.jira.plugins.inform.batching.rendering.context.CommentSectionContextProvider.lambda$createContext$1(CommentSectionContextProvider.java:53)
    	at com.atlassian.jira.plugins.inform.batching.rendering.context.CommentSectionContextProvider$$Lambda$8116/1480611580.apply(Unknown Source)
    	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
    	at java.util.stream.Streams$StreamBuilderImpl.forEachRemaining(Streams.java:419)
    	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
    	at java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:357)
    	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:483)
    	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    	at com.atlassian.jira.plugins.inform.batching.rendering.context.CommentSectionContextProvider.createContext(CommentSectionContextProvider.java:54)
    	at com.atlassian.jira.plugins.inform.batching.BatcherServiceImpl.lambda$getContext$3(BatcherServiceImpl.java:218)
    	at com.atlassian.jira.plugins.inform.batching.BatcherServiceImpl$$Lambda$8054/2114035069.apply(Unknown Source)
    	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    	at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:541)
    	at com.atlassian.jira.plugins.inform.batching.BatcherServiceImpl.getContext(BatcherServiceImpl.java:230)
    	at com.atlassian.jira.plugins.inform.batching.BatcherServiceImpl.getContext(BatcherServiceImpl.java:218)
    	at com.atlassian.jira.plugins.inform.batching.BatcherServiceImpl.createEmail(BatcherServiceImpl.java:142)
    	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.sendEmail(BatchNotificationJob.java:159)
    	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.processBatches(BatchNotificationJob.java:144)
    	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.processEventBatch(BatchNotificationJob.java:127)
    	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.lambda$notifyUsers$0(BatchNotificationJob.java:100)
    	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob$$Lambda$7684/1141650474.apply(Unknown Source)
    	at com.atlassian.jira.plugins.inform.performance.MeasurementWorkerFactory$1.measure(MeasurementWorkerFactory.java:41)
    	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.notifyUsers(BatchNotificationJob.java:97)
    	at com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob.runJob(BatchNotificationJob.java:82)
    	at com.atlassian.jira.plugins.inform.batching.cron.ConditionalJobRunner.runJob(ConditionalJobRunner.java:33)
    	at com.atlassian.jira.plugins.inform.batching.cron.ConditionalJobRunner.runJob(ConditionalJobRunner.java:33)
    	at com.atlassian.jira.plugins.inform.batching.cron.OncePerClusterJobRunner.runJob(OncePerClusterJobRunner.java:46)
    	at com.atlassian.scheduler.core.JobLauncher.runJob(JobLauncher.java:134)
    	at com.atlassian.scheduler.core.JobLauncher.launchAndBuildResponse(JobLauncher.java:106)
    	at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:90)
    	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.launchJob(CaesiumSchedulerService.java:435)
    	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:430)
    	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:454)
    	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:382)
    	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$$Lambda$3879/1958276731.accept(Unknown Source)
    	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:66)
    	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:60)
    	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:35)
    	at java.lang.Thread.run(Thread.java:748)
    
    
  • The value of jira.text.field.character.limit in Jira > ⚙ > System > General Configuration > Advanced Settings might have been manually set to 0 (instead of the default value 32767), which means that there is no size limit on any Jira issue text field or comment. This value can also be checked in the application.xml file from the support zip:

    <jira.text.field.character.limit>0</jira.text.field.character.limit>
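
The last check can be scripted. The Python sketch below reads the limit from an XML document. Only the element name jira.text.field.character.limit comes from this article. The wrapping <properties> element in the sample and the helper name read_character_limit are hypothetical, and the real layout of application.xml may differ.

```python
import xml.etree.ElementTree as ET

LIMIT_TAG = "jira.text.field.character.limit"
DEFAULT_LIMIT = 32767  # default value mentioned in this article


def read_character_limit(xml_text):
    """Return the configured limit as an int, or None if the element is absent."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the element is found at any depth.
    for element in root.iter(LIMIT_TAG):
        return int((element.text or "").strip())
    return None


# Hypothetical sample; a real application.xml contains many more elements.
sample = (
    "<properties>"
    "<jira.text.field.character.limit>0</jira.text.field.character.limit>"
    "</properties>"
)

limit = read_character_limit(sample)
if limit == 0:
    print("No limit on text fields and comments: very large comments are possible")
elif limit is not None and limit != DEFAULT_LIMIT:
    print(f"Non-default limit: {limit}")
```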

Cause

The job responsible for generating the Jira batched notifications and adding them to the mail queue is stuck, because it is trying to generate the difference between an old comment and a new comment, and this difference is too big to be processed in a timely manner. This is a variation of the bug below, which is also known to make the mail queue service get stuck under some conditions:

JRASERVER-65963

To build the difference between the old and new comment, the Batched Notification job uses a "diff" library that comes from Apache. Normally, if jira.text.field.character.limit is set to its default value of 32767, the job should not get stuck. However, if this value is changed to 0, comments of any size can be added to Jira issues and later edited. In that situation, the diff library might get stuck while computing the difference between 2 comments of very different sizes. Note that the bigger the difference between the original comment text and the updated text, the longer the diff library takes to process the change. For example, comparing 1000 characters vs 300000 characters takes longer than comparing 240000 characters vs 300000 characters, because the difference is smaller in the second comparison.
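
The stack trace in the Diagnosis section ends in MyersDiff.buildPath, which suggests a Myers-style diff. The sketch below is a minimal Python version of the classic greedy Myers algorithm, not the Apache library code that Jira runs. The function name myers_cost and the word lists are made up for illustration. It counts how many diagonals the search visits, a number that in the worst case grows with the square of the edit distance between the two texts.

```python
def myers_cost(old, new):
    """Greedy Myers diff over two sequences.

    Returns (d, visited): d is the length of the shortest edit script
    (insertions plus deletions), visited is the number of diagonals
    examined, used here as a rough measure of work.
    """
    n, m = len(old), len(new)
    furthest = {1: 0}  # furthest x reached on each diagonal k = x - y
    visited = 0
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            visited += 1
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]        # step down: insertion
            else:
                x = furthest[k - 1] + 1    # step right: deletion
            y = x - k
            # Follow the run of matching items at no extra edit cost.
            while x < n and y < m and old[x] == new[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d, visited
    raise AssertionError("unreachable: d = n + m always suffices")


new_comment = [f"w{i}" for i in range(300)]
small_old = new_comment[:10]     # most of the comment was added by the edit
large_old = new_comment[:240]    # only the tail was added by the edit

d_big, work_big = myers_cost(small_old, new_comment)
d_small, work_small = myers_cost(large_old, new_comment)
print("large change:", d_big, "edits,", work_big, "diagonals visited")
print("small change:", d_small, "edits,", work_small, "diagonals visited")
```

The first call models a short comment rewritten into a long one, the second a long comment with a moderate addition. The first visits far more diagonals even though the final comment is identical in both cases.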

Solution

Immediate action

First of all, if jira.text.field.character.limit is not set to the default value 32767 in Jira > ⚙ > System > General Configuration > Advanced Settings, we strongly recommend changing it back to 32767 to prevent users from editing very large Jira comments.

After you do that, the Batched Notification job still needs to be brought back to a stable state. There are two ways to do this (Solution 1 and Solution 2 below), each with its own pros and cons.

Solution 1

This solution consists of marking all the events that occurred on the day the notifications got stuck as processed, and then restarting Jira. This way, the batched notification job stops trying to process them and moves on to the events that occurred from the following day onwards.

(warning) Note that the drawback of this solution is that all the notifications that should have been sent on that specific day will be lost.

  1. Stop the Jira application
  2. (warning) Use your database's native tool to back up the database. Do not skip this step, so that you can revert to this backup if needed
  3. Identify the day when the batched notification job got stuck by running the following query:

    select * from rundetails where job_id in ('com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJobSchedulerImpl');


    1. You should get 2 rows in the output: 1 row with the status "A" (already running) and 1 row with the status "S". The row with the status "S" shows the day when the job got stuck. In the example below, the job got stuck on Dec 23rd 2021:

      "id","job_id","start_time","run_duration","run_outcome","info_message"
      55830761.0,"com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJobSchedulerImpl","2022-01-13 17:13:56.923",2.0,"A","Already running"
      55142066.0,"com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJobSchedulerImpl","2021-12-23 16:52:32.947",70685558.0,"S",""
  4. Once you have identified the day when the job got stuck, run the following SQL query to change the status of all the events from that day to PROCESSED.

    (warning) Make sure to change <DATE_JOB_GOT_STUCK> and <DATE_AFTER_JOB_GOT_STUCK> accordingly:

    update "AO_733371_EVENT_RECIPIENT" set "STATUS" = 'PROCESSED'
    WHERE "STATUS" in ('NEW','PROCESSING') AND "CONSUMER_NAME"='mailEventConsumer'
    AND "CREATED" >= '<DATE_JOB_GOT_STUCK>'
    AND "CREATED" < '<DATE_AFTER_JOB_GOT_STUCK>';
    1. For example, if the job was stuck on Dec 23 2021, then the query will look like the one shown in the example below:

      update "AO_733371_EVENT_RECIPIENT" set "STATUS" = 'PROCESSED'
      WHERE "STATUS" in ('NEW','PROCESSING') AND "CONSUMER_NAME"='mailEventConsumer'
      AND "CREATED" >= '2021-12-23 00:00:00.000'
      AND "CREATED" < '2021-12-24 00:00:00.000';
  5. Start the Jira application
  6. Trigger some notifications and verify that Jira Batched Notifications are sent.
    (warning) Note that you might have to wait some time before you can confirm that notifications are sent, depending on the frequency set for the batched notifications (10 minutes by default)
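
To avoid mistakes when filling in <DATE_JOB_GOT_STUCK> and <DATE_AFTER_JOB_GOT_STUCK> in step 4, the two bounds can be derived from the start_time of the "S" row. The Python sketch below assumes the timestamp format shown in the example output. The helper name stuck_day_window is hypothetical and not part of Jira.

```python
from datetime import datetime, timedelta

# Format of start_time as it appears in the example rundetails output.
RUNDETAILS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def stuck_day_window(start_time):
    """Return (lower, upper) bounds covering the calendar day of start_time."""
    started = datetime.strptime(start_time, RUNDETAILS_FORMAT)
    day_start = started.replace(hour=0, minute=0, second=0, microsecond=0)
    day_end = day_start + timedelta(days=1)
    out_format = "%Y-%m-%d 00:00:00.000"
    return day_start.strftime(out_format), day_end.strftime(out_format)


lower, upper = stuck_day_window("2021-12-23 16:52:32.947")
print(lower)  # 2021-12-23 00:00:00.000
print(upper)  # 2021-12-24 00:00:00.000
```

Because the upper bound is exclusive ("CREATED" < upper), the window covers exactly one day and also behaves correctly across month and year boundaries.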


If this solution does not help, try Solution 2 instead.

Solution 2

This solution consists of marking all the batched notification events as processed and then restarting Jira. This way, the batched notification job stops trying to process them and only processes new events that occur after the Jira restart.

(warning) Note that the drawback of this solution is that all the notifications that should have been sent until now will be lost.

  1. Stop the Jira application
  2. (warning) Use your database's native tool to back up the database. Do not skip this step, so that you can revert to this backup if needed
  3. Run the following SQL query to change the status of all the unprocessed events to "PROCESSED":

    update "AO_733371_EVENT_RECIPIENT" set "STATUS" = 'PROCESSED'
    WHERE "STATUS" in ('NEW','PROCESSING') AND "CONSUMER_NAME"='mailEventConsumer';
  4. Start the Jira application
  5. Trigger some notifications and verify that Jira Batched Notifications are sent.
    (warning) Note that you might have to wait some time before you can confirm that notifications are sent, depending on the frequency set for the batched notifications (10 minutes by default)
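
One cautious way to run a bulk update like the one in step 3 is to count the matching rows first, run the update in a transaction, and commit only if the number of changed rows matches the count. The Python sketch below demonstrates that pattern against an in-memory SQLite table that merely stands in for the real Jira database. The reduced column set, the sample rows, the otherConsumer value and all function names are assumptions made for the demonstration.

```python
import sqlite3

TABLE = '"AO_733371_EVENT_RECIPIENT"'
PENDING_WHERE = '"STATUS" in (?, ?) AND "CONSUMER_NAME" = ?'
PENDING_ARGS = ("NEW", "PROCESSING", "mailEventConsumer")


def build_demo_db():
    """Create an in-memory stand-in table with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f'create table {TABLE} '
        '("ID" integer primary key, "STATUS" text, "CONSUMER_NAME" text)'
    )
    conn.executemany(
        f'insert into {TABLE} ("STATUS", "CONSUMER_NAME") values (?, ?)',
        [
            ("NEW", "mailEventConsumer"),
            ("PROCESSING", "mailEventConsumer"),
            ("PROCESSED", "mailEventConsumer"),
            ("NEW", "otherConsumer"),  # hypothetical consumer, must stay untouched
        ],
    )
    conn.commit()
    return conn


def count_pending(conn):
    """Count the unprocessed events of the mail consumer."""
    query = f"select count(*) from {TABLE} WHERE {PENDING_WHERE}"
    return conn.execute(query, PENDING_ARGS).fetchone()[0]


def mark_pending_as_processed(conn):
    """Update the pending rows; commit only if the row count is as expected."""
    expected = count_pending(conn)
    cur = conn.execute(
        f'update {TABLE} set "STATUS" = ? WHERE {PENDING_WHERE}',
        ("PROCESSED",) + PENDING_ARGS,
    )
    if cur.rowcount != expected:
        conn.rollback()
        raise RuntimeError(f"expected {expected} rows, changed {cur.rowcount}")
    conn.commit()
    return cur.rowcount


demo = build_demo_db()
print("pending before:", count_pending(demo))
print("rows updated:", mark_pending_as_processed(demo))
print("pending after:", count_pending(demo))
```

Against a real Jira database the same idea applies with the database's own client: run the count query from the Diagnosis section, then the update, and compare the two numbers before committing.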


If neither Solution 1 nor Solution 2 helps, or if you have any doubts about how to run the SQL queries, please reach out to Atlassian Support for further assistance.


Last modified on Mar 11, 2022
