Troubleshooting slow/stuck notification issues in Jira/Service Management Server/Data Center

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible


Problem

Since batched notifications were introduced in Jira 8.x, and because Service Desk implements its own notification system, it has become more difficult to investigate why some notifications are sent with a long delay or never sent at all.

To troubleshoot this type of issue efficiently, it is important to first understand all the types of notifications that exist in Jira and Service Desk, how they work, and which services are involved in generating and sending them. This knowledge helps steer the investigation in the right direction and focus on the most likely root causes.

Understanding what types of notifications exist in Jira and Service Desk and how they work

When Jira and Service Desk run together, 3 types of notifications might come into play, depending on how Jira is configured and which projects are involved:

  • the Jira non-batched notifications
  • the Jira batched notifications
  • the Service Desk customer notifications

The Jira notifications:

  • are sent to:
    • any users working on Jira Core and Jira Software tickets
    • users acting as agents in Service Desk tickets
  • are configured through the Notification Schemes, via the page Project Settings > Notifications
  • can be:
    • either batched, if the setting ⚙ > System > Batching email notifications is enabled
    • or non-batched, if the setting ⚙ > System > Batching email notifications is disabled

The Service Desk customer notifications:

  • are sent to users acting as customers in Service Desk tickets
  • are configured through the Customer Notification configuration, via the page Project Settings > Customer Notifications
  • have their own batching mechanism, which is different from the one used by the Jira batched notifications: they are batched over a period of 1 minute, which is hardcoded and cannot be configured

Regardless of the type of notification that is triggered, each notification ends up in the same Mail Queue:

  • this queue can be monitored in the page ⚙ > System > Mail Queue
  • this queue is automatically flushed by the Mail Queue Service every 1 minute by default
  • the Mail Queue Service sends the notification to the recipient via the SMTP mail server configured in ⚙ > System > Outgoing Mail

The diagram below shows how the 3 types of notifications work, including:

  • which services/jobs are involved (in green) 
  • which database tables are involved (in blue)

Preliminary troubleshooting steps

As you can see in the diagram above, various components, including Jira services, database tables, and a mail server, are involved in the entire process, from the moment an event triggers a notification until the notification email is actually sent.

There are mainly 4 areas where things can go wrong:

  • the connection between Jira and the SMTP server
  • the Mail Queue Service
  • the Jira Batched notification job
  • the Service Desk customer notification job

Before conducting a complex investigation that involves digging into the logs, analyzing thread dumps and running database queries, it is important to first ask yourself some preliminary questions. These questions, summed up in the diagram below, will help you focus on the relevant area of the notification workflow:


After going through this diagram, move on to the section of this KB article that the diagram points you to.

Troubleshooting SMTP mail server issues


If the issue lies between the Jira application and the SMTP mail server, the page ⚙ > System > Mail Queue should show emails piling up in the error queue, or in the mail queue with a pink background color, as shown below:

There can be various reasons why the Mail Queue Service is unable to send the notification email via the SMTP server, including, but not limited to:

  • Network connectivity issue between the Jira server and the SMTP Mail server
  • Mail throttling configuration on the SMTP Mail Server side
  • SSL configuration issue on Jira side
  • Anti-virus or Firewall blocking traffic between the Jira server and the SMTP Mail Server

Look for errors in the <JIRA_HOME>/log/atlassian-jira-outgoing-mail.log file and check whether they match the ones listed below. Note that these errors are only relevant if they occur often in the logs and are thrown for most emails. If they are only thrown occasionally, they might be false positives:

com.atlassian.mail.MailException: com.sun.mail.smtp.SMTPSendFailedException: 421 4.4.2 Message submission rate for this client has exceeded the configured limit

java.net.SocketTimeoutException: Read timed out

Too many login attempts, please try again later

com.atlassian.mail.MailException: com.sun.mail.util.MailConnectException: Couldn't connect to host, port: smtp.office365.com, 587; timeout -1;
    nested exception is:
        java.net.ConnectException: Connection timed out (Connection timed out)
        at com.atlassian.mail.server.impl.SMTPMailServerImpl.sendWithMessageId(SMTPMailServerImpl.java:222) [atlassian-mail-5.0.0.jar:?]

com.atlassian.mail.MailException: com.sun.mail.smtp.SMTPSendFailedException: 530 5.7.0 Must issue a STARTTLS command first. 7sm25921297qkx.49 - gsmtp
        at com.atlassian.mail.server.impl.SMTPMailServerImpl.sendWithMessageId(SMTPMailServerImpl.java:225) [atlassian-mail-2.7.18.jar:?]

com.atlassian.mail.MailException: javax.mail.MessagingException: Could not convert socket to TLS;
    nested exception is:
        java.io.IOException: Can't verify identity of server: <SERVER_NAME>

com.atlassian.mail.MailException: javax.mail.MessagingException: Could not convert socket to TLS;
    nested exception is:
        javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Caused by: javax.mail.MessagingException: Could not connect to SMTP host: smtp.gmail.com, port: 587;
    nested exception is:
        javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
        at com.sun.mail.smtp.SMTPTransport.openServer(SMTPTransport.java:2056) [javax.mail-1.5.4.jar:1.5.4]
        at com.sun.mail.smtp.SMTPTransport.protocolConnect(SMTPTransport.java:697) [javax.mail-1.5.4.jar:1.5.4]
        at javax.mail.Service.connect(Service.java:386) [javax.mail-1.5.4.jar:1.5.4]
        at javax.mail.Service.connect(Service.java:245) [javax.mail-1.5.4.jar:1.5.4]
        at javax.mail.Service.connect(Service.java:194) [javax.mail-1.5.4.jar:1.5.4]
        at com.atlassian.mail.server.impl.SMTPMailServerImpl.sendWithMessageId(SMTPMailServerImpl.java:174) [atlassian-mail-2.5.16.jar:?]
        ... 25 more
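To judge how often the errors above occur, a small script can count them in the outgoing mail log. The following is a minimal Python sketch, not an Atlassian tool: the signature substrings are copied from the errors above, and treating every line that starts with a timestamp as one log entry is an assumption about the log layout.

```python
import re
from collections import Counter

# Substrings copied from the SMTP errors listed above; extend as needed.
SIGNATURES = {
    "421 submission rate limit": "Message submission rate for this client has exceeded",
    "read timeout": "java.net.SocketTimeoutException: Read timed out",
    "too many login attempts": "Too many login attempts",
    "connection to host failed": "Couldn't connect to host",
    "530 STARTTLS required": "Must issue a STARTTLS command first",
    "TLS negotiation failed": "Could not convert socket to TLS",
    "TLS/plaintext mismatch": "Unrecognized SSL message, plaintext connection?",
}

# Assumption: each log entry starts with a timestamp like "2020-07-27 04:53:41,256".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def count_smtp_errors(lines):
    """Return (Counter of signature label -> matching lines, number of log entries)."""
    counts = Counter()
    entries = 0
    for line in lines:
        if ENTRY_START.match(line):
            entries += 1
        for label, needle in SIGNATURES.items():
            if needle in line:
                counts[label] += 1
    return counts, entries
```

For example, pass an open file handle for <JIRA_HOME>/log/atlassian-jira-outgoing-mail.log (opened with errors="replace") and compare each count with the number of entries: a signature that appears only a handful of times among thousands of entries is likely one of the occasional false positives mentioned above.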


If you see any of these errors thrown frequently in the logs, then one of the KB articles listed below might apply:

Troubleshooting Mail Queue Service issues

If the problem lies with the Mail Queue Service, you should see at least one of the symptoms listed below, which are an indication that the Mail Queue service is either stuck, or running too slowly, or not scheduled on time:

  • A high number of emails are piling up in the mail queue in ⚙ > System > Mail Queue as shown in the image below:
  • The size of the mail queue is not decreasing automatically, or it's decreasing very slowly
  • The mail queue gets emptied when manually flushing it

There can be various reasons why the mail queue service is not running as expected. The most common root causes are described below.

To track the state of email sending in Jira applications, you can also use the health check for mail error queues. The health check is available with the ATST plugin 1.53.2 and later.

Root cause 1 - Mail Queue Service Configuration

By default, the Mail Queue Service is scheduled to run once per minute. However, this schedule may have been changed (for example, due to an administrator's preference), which directly affects how long notifications take to reach end users.

This can be verified by navigating to ⚙ > System > Advanced > Services and checking the "Schedule" value:

  • The default value is 0 * * * * ? (which means every minute)
  • If you observe a different cron expression than the default value, then it is possible that it was modified to run less frequently than every minute. In this case, try to change the expression to the default one and check if the issue is resolved
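If the cron expression is hard to read, the hypothetical helper below estimates how many minutes separate two runs of the Mail Queue Service. It is a Python sketch that only understands simple Quartz-style expressions whose hour, day and month fields are wildcards (like the default 0 * * * * ?); it returns None for anything else rather than attempting to be a full Quartz parser.

```python
def minutes_between_runs(expr):
    """Estimate minutes between runs for a simple Quartz cron expression.

    Only '<second> <minute> * * * ?'-style expressions are understood (the
    shape of the default '0 * * * * ?'); anything else returns None.
    """
    fields = expr.split()
    if len(fields) not in (6, 7):
        raise ValueError("a Quartz cron expression has 6 or 7 fields")
    second, minute, hour, day_of_month, month, day_of_week = fields[:6]
    if not second.isdigit() or hour != "*" or month != "*":
        return None
    if day_of_month not in ("*", "?") or day_of_week not in ("*", "?"):
        return None
    if minute == "*":
        return 1  # fires once every minute
    if "/" in minute:
        start, step = minute.split("/", 1)
        if start in ("*", "0") and step.isdigit() and int(step) > 0:
            return int(step)  # e.g. '0/5' or '*/5': every 5 minutes
        return None
    if minute.isdigit():
        return 60  # a single minute value: once per hour
    return None
```

A result greater than 1 means a notification can wait up to that many minutes in the queue before the service even runs.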

Root cause 2 - The Jira mail handlers are using most of the Caesium threads

We have also seen issues where Incoming Mail Handlers take a long time to scan the mailboxes connected to Jira, because these mailboxes contain a very large number of messages. As a result, the Caesium threads are not released as often as they should be, which delays the execution of the Mail Queue Service.

To verify whether the Mail Queue Service is not executed because of the execution time of the mail handlers, please refer to the KB article below:

Jira notifications piling up in the mail queue due to mail handlers using too many Caesium threads
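When reviewing thread dumps for this root cause, it can help to count how many Caesium threads are busy in a given piece of code. The Python sketch below assumes jstack-style dumps where each thread block starts with its quoted name (for example "Caesium-1-3"). The marker strings are placeholders you choose, such as a package name seen in the mail handler frames; they are not Atlassian-defined values.

```python
import re
from collections import Counter

# In jstack-style dumps, each thread block starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"([^"]+)"')


def split_threads(dump):
    """Yield (thread_name, stack_text) pairs from a thread dump string."""
    name, body = None, []
    for line in dump.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            if name is not None:
                yield name, "\n".join(body)
            name, body = header.group(1), []
        elif name is not None:
            body.append(line)
    if name is not None:
        yield name, "\n".join(body)


def busy_caesium_threads(dump, markers):
    """Return (number of Caesium threads, Counter of marker -> matching threads)."""
    total = 0
    matches = Counter()
    for name, stack in split_threads(dump):
        if not name.startswith("Caesium-"):
            continue
        total += 1
        for marker in markers:
            if marker in stack:
                matches[marker] += 1
    return total, matches
```

If most or all Caesium threads match the same marker across several dumps taken a few seconds apart, those threads are probably not being released to the Mail Queue Service.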

Root cause 3 - Outgoing Mail Server configured with an infinite timeout value

When the Outgoing Mail Server (SMTP mail server) is configured with a connection timeout of 0, or with no connection timeout at all, the Mail Queue Service waits indefinitely to get a connection from the SMTP server. If the connection becomes unstable while the Mail Queue Service is in the middle of sending a notification, it might get stuck waiting forever. As a consequence, the Mail Queue Service is no longer scheduled every minute, and the mail queue piles up indefinitely.

To verify if the SMTP server timeout value is preventing the mail queue service from running, please refer to the KB article below:

Jira notifications piling up in the mail queue due to the SMTP server infinite timeout setting
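To check from the Jira server whether the SMTP server answers promptly, a probe with an explicit, finite timeout can be used. This Python sketch relies on the standard smtplib module, which connects and reads the server greeting when the SMTP object is created. It uses plain SMTP (no STARTTLS or implicit TLS), so it only checks connectivity and responsiveness, not the TLS configuration, and its timeout is the probe's own, unrelated to the timeout configured for the Outgoing Mail Server in Jira.

```python
import smtplib
import time


def probe_smtp(host, port, timeout=10.0):
    """Open an SMTP session with a finite timeout and report how it went.

    Returns (ok, seconds_elapsed, detail). smtplib.SMTP connects and waits for
    the server's 220 greeting inside its constructor, so a server that accepts
    the TCP connection but never answers makes the probe fail after `timeout`
    seconds instead of hanging forever.
    """
    start = time.monotonic()
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            code, message = smtp.noop()  # one round trip after the greeting
            return True, time.monotonic() - start, f"NOOP -> {code} {message!r}"
    except OSError as exc:  # includes smtplib.SMTPException, timeouts, refusals
        return False, time.monotonic() - start, repr(exc)
```

Call it, for example, as probe_smtp("smtp.example.com", 25) with the host and port configured in ⚙ > System > Outgoing Mail (smtp.example.com is a placeholder). A probe that consistently ends at exactly the timeout suggests a server or network path that accepts connections but does not answer, which is the situation in which an infinite timeout leaves the Mail Queue Service stuck.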

Root Cause 4 - Jira server hostname DNS resolution issue

Every time the Mail Queue Service attempts to send an email, it performs a reverse DNS lookup for the Jira application server hostname. If the hostname cannot be resolved, the Jira application has to wait for a timeout, which can be a long period of time (20-40 seconds).

If you make the observations listed below, then this root cause might be relevant:

  • the mail queue is automatically emptied very slowly
  • when manually flushing the queue, the mail queue is still emptied slowly
  • it takes a long time for each email to be sent (for example, 20-40 seconds), as shown in the <JIRA_HOME>/log/atlassian-jira-outgoing-mail.log file after DEBUG logging has been enabled for Outgoing Mail:

    grep 'was sent with Message-Id' atlassian-jira-outgoing-mail.log
    
    2020-07-27 04:53:41,256+0200 DEBUG [] Sending mailitem To='email1@test.com' Subject='Updated: (ABC-123)' From='null' FromName='Test User (Jira)' Cc='null' Bcc='null' ReplyTo='null' InReplyTo='null' MimeType='text/html' Encoding='UTF-8' Multipart='javax.mail.internet.MimeMultipart@24d56930' MessageId='null' ExcludeSubjectPrefix=false' anonymous    Mail Queue Service Message was sent with Message-Id <JIRA.428775.1594212067000.32599.1595818401102@Atlassian.JIRA>
    2020-07-27 04:54:01,456+0200 DEBUG [] Sending mailitem To='email2@test.com' Subject='Updated: (ABC-123)' From='null' FromName='Test User (Jira)' Cc='null' Bcc='null' ReplyTo='null' InReplyTo='null' MimeType='text/html' Encoding='UTF-8' Multipart='javax.mail.internet.MimeMultipart@3b62f1d5' MessageId='null' ExcludeSubjectPrefix=false' anonymous    Mail Queue Service Message was sent with Message-Id <JIRA.428775.1594212067000.32600.1595818441274@Atlassian.JIRA>
    2020-07-27 04:54:21,639+0200 DEBUG [] Sending mailitem To='email3@test.com' Subject='Updated: (ABC-123)' From='null' FromName='Test User (Jira)' Cc='null' Bcc='null' ReplyTo='null' InReplyTo='null' MimeType='text/html' Encoding='UTF-8' Multipart='javax.mail.internet.MimeMultipart@19310596' MessageId='null' ExcludeSubjectPrefix=false' anonymous    Mail Queue Service Message was sent with Message-Id <JIRA.428775.1594212067000.32601.1595818481472@Atlassian.JIRA>
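To quantify the per-email delay from DEBUG lines like the ones above, the following Python sketch computes the time between consecutive "was sent with Message-Id" entries. It assumes each line begins with a timestamp in the format shown above. The gaps also include idle time when the queue is empty, so they are only meaningful while emails are piling up.

```python
import re
from datetime import datetime

# Matches the timestamp at the start of each outgoing mail log line,
# e.g. "2020-07-27 04:53:41,256+0200" (the UTC offset is ignored).
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def send_gaps(lines, marker="was sent with Message-Id"):
    """Return the seconds elapsed between consecutive successfully sent emails."""
    stamps = []
    for line in lines:
        match = STAMP.match(line)
        if match and marker in line:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]
```

Applied to the three lines above, it returns gaps of about 20.2 and 20.18 seconds, which matches the 20-40 second range described for this root cause.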

In this case, one of the 2 knowledge base articles below might apply:

Jira notifications piling up in the mail queue due to IPv6 issues on JVM
Jira notifications piling up in the mail queue due to a server hostname resolution issue
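A rough way to check hostname resolution on the Jira server is to time a forward and a reverse lookup of the machine's own name. The Python sketch below uses the operating system resolver through the standard socket module. It is only an approximation: the JVM has its own address caching and IPv4/IPv6 preferences, so a fast result here does not fully rule out this root cause.

```python
import socket
import time


def time_local_name_lookups():
    """Time the forward and reverse lookups of this host's own name.

    Returns a dict mapping 'forward'/'reverse' to (seconds, result or error text).
    """
    results = {}
    hostname = socket.gethostname()

    start = time.monotonic()
    try:
        address = socket.gethostbyname(hostname)  # forward lookup: name -> IPv4
        results["forward"] = (time.monotonic() - start, address)
    except OSError as exc:
        results["forward"] = (time.monotonic() - start, repr(exc))
        return results  # no address to reverse-resolve

    start = time.monotonic()
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(address)  # reverse lookup
        results["reverse"] = (time.monotonic() - start, name)
    except OSError as exc:
        results["reverse"] = (time.monotonic() - start, repr(exc))
    return results
```

A lookup that takes several seconds, or that fails only after a long wait, points in the same direction as the slow per-email send times described above.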

Root Cause 5 - Group Filter subscriptions sent to big list of users, or high number of private filter subscriptions

The Mail Queue Service might frequently get stuck, as it is busy queueing and sending a large number of filter subscription emails.

To verify if you are impacted by this situation, please check the following knowledge base article:

Jira notifications sent with a long delay due to group filter subscriptions or high number of private filter subscriptions

Root Cause 6 - Some 3rd party add-ons are using most of the Caesium threads

The Mail Queue Service relies on Caesium threads to be scheduled every 1 min (by default). If any of the add-ons listed below is installed in your Jira instance, then there is a chance that this add-on might be constantly using all the Caesium threads, preventing the Mail Queue Service (and any other Jira service) from running on schedule:

To verify if you are impacted by this scenario, please check the following knowledge base article:

Jira incoming mail and outgoing mail functionalities are not running due to an add-on taking all the Caesium threads

Root Cause 7 - Network slowness between Jira and the SMTP Mail server, or slowness on the SMTP Mail server side

If there is any network slowness, or slowness on the SMTP mail server side, the size of the Mail Queue might increase quickly, as emails will be sent very slowly.

To know if this root cause is relevant, check if you can observe both symptoms below:

  • Collect thread dumps, and check whether you can see a long-running thread showing that the Mail Queue is taking time to send an email:

    "Sending mailitem PostprocessingMailQueueItem{delegate=To='someuser@test.com' Subject='ABC-123 Some Issue' From='null' FromName='Chalupa, Petr (Jira)' Cc='null' Bcc='null' ReplyTo='null' InReplyTo='null' MimeType='text/html' Encoding='UTF-8' Multipart='javax.mail.internet.MimeMultipart@5baf200d' MessageId='null' ExcludeSubjectPrefix=false'}" daemon prio=5 tid=0x0000000000000465 nid=0 runnable 
       java.lang.Thread.State: RUNNABLE
    	at java.net.SocketInputStream.socketRead0(Native Method)
    	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    	at java.net.SocketInputStream.read(SocketInputStream.java:171)
    	at java.net.SocketInputStream.read(SocketInputStream.java:141)
    	at com.sun.mail.util.TraceInputStream.read(TraceInputStream.java:102)
    	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    	at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    	- locked <0x00000000465e0014> (a java.io.BufferedInputStream)
    	at com.sun.mail.util.LineInputStream.readLine(LineInputStream.java:100)
    	at com.sun.mail.smtp.SMTPTransport.readServerResponse(SMTPTransport.java:2456)
    
    ...
    
    	at com.atlassian.jira.mail.JiraMailQueue$1.apply(JiraMailQueue.java:51)
    	at com.atlassian.jira.mail.JiraMailQueue$1.apply(JiraMailQueue.java:48)
  • Enable DEBUG for Outgoing Mail in ⚙ > System > Logging and profiling, and check how long emails take to be sent. In the example below, it takes more than 1 min for an email to be sent (there is a gap of more than 1 min after the line mentioning "Sending message"):

    2023-02-06 15:27:43,416+0200 DEBUG [] Sending mailitem To='someuser@test.com' Subject='ABC-123 Some Issue' From='null' FromName='Some User' Cc='null' Bcc='null' ReplyTo='null' InReplyTo='null' MimeType='text/plain' Encoding='UTF-8' Multipart='javax.mail.internet.MimeMultipart@2c754c73' MessageId='null' ExcludeSubjectPrefix=true' anonymous    Mail Queue Service [c.atlassian.mail.outgoing] Getting transport for protocol [smtp]
    2023-02-06 15:27:43,416+0200 DEBUG [] Sending mailitem To='someuser@test.com' Subject='ABC-123 Some Issue' From='null' FromName='Some User' Cc='null' Bcc='null' ReplyTo='null' InReplyTo='null' MimeType='text/plain' Encoding='UTF-8' Multipart='javax.mail.internet.MimeMultipart@2c754c73' MessageId='null' ExcludeSubjectPrefix=true' anonymous    Mail Queue Service [c.atlassian.mail.outgoing] Got transport: [smtp://jira@test.com]. Connecting
    2023-02-06 15:27:43,429+0200 DEBUG [] Sending mailitem To='someuser@test.com' Subject='ABC-123 Some Issue' From='null' FromName='Some User' Cc='null' Bcc='null' ReplyTo='null' InReplyTo='null' MimeType='text/plain' Encoding='UTF-8' Multipart='javax.mail.internet.MimeMultipart@2c754c73' MessageId='null' ExcludeSubjectPrefix=true' anonymous    Mail Queue Service [c.atlassian.mail.outgoing] Sending message
    2023-02-06 15:28:51,463+0200 DEBUG [] Sending mailitem To='someuser@test.com' Subject='ABC-123 Some Issue' From='null' FromName='Some User' Cc='null' Bcc='null' ReplyTo='null' InReplyTo='null' MimeType='text/plain' Encoding='UTF-8' Multipart='javax.mail.internet.MimeMultipart@2c754c73' MessageId='null' ExcludeSubjectPrefix=true' anonymous    Mail Queue Service [c.atlassian.mail.outgoing] Message was sent with Message-Id <JIRA.2999030.1675684169000.182.1675690063415@Atlassian.JIRA>
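To spot slow sends automatically, the Python sketch below pairs each "Sending message" DEBUG line with the next "Message was sent with Message-Id" line and reports pairs slower than a threshold. It assumes emails are sent one after another by the Mail Queue Service of the node whose log is analyzed, so logs from different nodes should not be mixed; the 5-second default threshold is an arbitrary choice.

```python
import re
from datetime import datetime

# Timestamp at the start of each outgoing mail log line (UTC offset ignored).
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
FMT = "%Y-%m-%d %H:%M:%S,%f"


def slow_sends(lines, threshold_seconds=5.0):
    """Return (sent_at, seconds) for sends slower than threshold_seconds.

    Each 'Sending message' line is paired with the next 'was sent with
    Message-Id' line; other DEBUG lines (transport, connecting) are ignored.
    """
    pending = None
    slow = []
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue
        when = datetime.strptime(match.group(1), FMT)
        if line.rstrip().endswith("Sending message"):
            pending = when
        elif "was sent with Message-Id" in line and pending is not None:
            elapsed = (when - pending).total_seconds()
            if elapsed > threshold_seconds:
                slow.append((when, elapsed))
            pending = None
    return slow
```

Applied to the four lines above, it reports a single send of about 68 seconds.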

If both symptoms listed above are observed, then this root cause is probably relevant.

Root Cause 8 - High Jira activity triggering huge amount of emails that the SMTP server or the network cannot handle

Ideally, in a very healthy network environment, the Mail Queue should be able to send each email within 200 msec, which means 5 emails per second, or 300 emails per minute. If the Jira application has heavy ticket activity from thousands of users, more than 300 emails might be triggered per minute. If the network speed is not ideal (emails take more than 200 msec to be delivered to the SMTP server) and the number of emails is too high, emails are expected to pile up in the Mail Queue.

Unfortunately, the resolution is not simple, as it consists of:

  • either improving the network stability/speed
  • or reducing the amount of issue activity in Jira
  • or enabling the Jira Notification Batching, as it will reduce the amount of emails triggered from any issue (only available in Jira 8.0.0 and higher versions)
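The arithmetic above can be written as a small formula: when emails are sent one at a time, the queue can drain 60 / seconds_per_email emails per minute, and any higher arrival rate makes it grow. The Python function below is an illustrative sketch of that calculation, assuming a single sending thread and a constant time per email.

```python
def queue_growth_per_minute(emails_triggered_per_minute, seconds_per_email):
    """Net change in mail queue size per minute for one sending thread.

    Capacity is 60 / seconds_per_email emails per minute (0.2 s -> 300/min).
    A positive result means the queue grows; zero or negative means it drains.
    """
    if seconds_per_email <= 0:
        raise ValueError("seconds_per_email must be positive")
    capacity_per_minute = 60.0 / seconds_per_email
    return emails_triggered_per_minute - capacity_per_minute
```

For example, 400 emails per minute at 200 msec per email gives a growth of 100 emails per minute, while the same load at 500 msec per email gives 280 emails per minute.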

Root Cause 9 - A very long text field was updated in a Jira ticket

If a text field of huge size (hundreds of thousands of characters) was edited in a Jira issue, the Mail Queue Service might get stuck while building a Jira notification showing what was changed in this field, due to a known Jira bug. To verify whether you are impacted by this bug and to check the potential workarounds, please refer to the following public bug ticket:

JRASERVER-65963

Root Cause 10 - Mail Queue Service does not recover from a database connectivity issue

If Jira experiences a temporary database connectivity issue (for example, during a DB server maintenance or re-start), some scheduled jobs such as the Mail Queue Service might never recover from this, and the scheduler will never try to execute them.

In this case, you might be impacted by the following public bug:

JRASERVER-62072

To verify if this bug is the reason why the Mail Queue Service is not running, please refer to the KB article below:

Jira/Service Management notifications are piling up in the mail queue due to the bug JRASERVER-62072

Root Cause 11 - Too many instances of the Bamboo Service

If you are running Jira on a version lower than 8.2.2, then you might be impacted by the bug below:
JRASERVER-66593

Due to this bug, after every Jira re-start, a new duplicate entry for the job com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob will be added to the clusteredjob table. Since all scheduled jobs share the same resources (4 Caesium threads), if this job clogs the clusteredjob table, then other jobs such as the Mail Queue Service may never have a chance to run.

Running the following SQL query will help validate this root cause:

SELECT COUNT(*)
FROM clusteredjob
WHERE job_runner_key = 'com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob';

If this query returns a high number of duplicate rows, then this root cause might be relevant.

Troubleshooting Jira Notification issues (Batched and Non-Batched)


If you have verified that no Jira notifications are sent, whether the batched notifications setting (in the page ⚙ > System > Batching Notifications) is enabled or not, then the entire Jira notification module is dysfunctional.

Only 1 root cause has been identified so far.

Root Cause - The Insight add-on is disabled due to an incorrect upgrade path

If the Jira notifications (batched and non-batched) stopped working after upgrading Jira to 8.16.0 or higher (or Service Management 4.16.0 or higher), you might be impacted by the issue described in the KB article below:

Jira Mail Notifications fail if Insight plugin is disabled due to incorrect upgrade path

Troubleshooting Jira Batched Notification mail issues

If you have verified that Jira notifications are properly sent after disabling batched notifications in the page ⚙ > System > Batching Notifications, then chances are that the jobs responsible for the Jira batched notifications are failing to execute, or are running slowly.

There can be various reasons why the Jira batched notifications job is not running as expected. The most common root causes are listed below.

Root Cause 1 - Expected behavior

If you are observing a constant delay with the Jira notifications (for example: 30 min, 10 min, etc.), then this delay is expected if batched notifications are enabled in the page ⚙ > System > Batching email notifications. The sending frequency of batched notifications can be configured on this page, and can be as frequent as every 2 minutes.

Please note that the delay observed with the reception of the notifications might be slightly longer than the frequency that was set. For example, if the frequency is set to 10 minutes, notifications will be sent about 12-15 minutes after the event happened. This is expected, due to the way batched notifications were implemented in the Jira code.

Root Cause 2 - MSSQL Database configuration issue in Jira

If you observe that Jira Batched Notifications are not sent at all and if you are using an MSSQL database, you might be impacted by the issue described in the KB article below:

Batched notifications are not working when using MS SQL database

Root Cause 3 - Oracle Database driver incompatibility with Jira

If you observe that Jira Batched Notifications are not sent at all and if you are using an Oracle database, you might be impacted by the issue described in the KB article below:

Batch notifications are not working when using Oracle database

Root Cause 4 - Performance issue impacting the batched notification jobs when Jira is connected to an MSSQL or Oracle Database

If you observe that Batched notifications are intermittently sent with a long delay (for example 40 minutes, or several hours), you might be impacted by the performance bug below:

JRASERVER-71350

Root Cause 5 - Batched notification job does not recover from a database connectivity issue (Jira Server and Data Center - Applies to both single node and multi-node environments)

When using Jira Server or Data Center, on a single node or on multiple nodes:

  • If Jira experiences a temporary database connectivity issue (for example, during a DB server maintenance or re-start), some scheduled jobs such as the Batched Notification job might never recover from this, and the scheduler will never try to execute them.
  • In this case, you might be impacted by the bug listed below:

JRASERVER-62072

Root Cause 6 - Batched notification job does not recover from a database connectivity issue (Jira Data Center multi-node only - Does not apply to Jira Server or Data Center single node)

When using Jira Data Center on multiple nodes:

  • whenever a node from the cluster starts a job (for example, the batched notification job), a lock is set in the database in the table clusterlockstatus, used to prevent any other node from running the same job

  • when the node completes its job, this node is supposed to release the lock from the clusterlockstatus table, so that other nodes can run the job in the future

  • if the node fails to release the lock (for example, due to a database connection problem), the node does not try again to unlock it, and the job remains locked until the next re-start of all the Jira nodes

If you are using Jira Data Center on a version lower than 8.3.0, then you might be impacted by the bug below:

JRASERVER-66597
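To look for a lock that was never released, the rows of the clusterlockstatus table can be checked for locks that are still held and have not been updated for a long time. The Python sketch below works on rows already fetched from the database. The column layout it expects (lock_name, locked_by_node, update_time as epoch milliseconds, with locked_by_node empty for a free lock) is an assumption to verify against your own schema, and the 60-minute default age is an arbitrary threshold.

```python
def stale_locks(rows, now_ms, max_age_minutes=60):
    """Return (lock_name, node, age_minutes) for held locks older than the limit.

    Each row is (lock_name, locked_by_node, update_time_ms). Assumption: a free
    lock has locked_by_node set to None and update_time is in epoch milliseconds.
    """
    limit_ms = max_age_minutes * 60_000
    stale = []
    for lock_name, node, update_time_ms in rows:
        if node is None:
            continue  # the lock is currently free
        age_ms = now_ms - update_time_ms
        if age_ms > limit_ms:
            stale.append((lock_name, node, age_ms / 60_000))
    return stale
```

Pass int(time.time() * 1000) as now_ms. Since scheduled jobs such as the batched notification job are expected to run regularly, a lock held by one node far longer than the job's usual interval is worth investigating.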

Root Cause 7 - Batched notification job gets stuck after a high number of watchers were accidentally added to a Jira issue

After accidentally adding a very high number of users to the watcher list of a ticket (for example, hundreds of thousands), the Jira Batched Notifications might stop being sent to users (or they might be sent with hours of delay).

To verify if you are impacted by this scenario, please check the following knowledge base article:

Jira Batched Notifications stop being sent from any project after adding a big number of watchers to a ticket

Root Cause 8 - Batched notification job gets stuck after a big size comment was edited in a ticket

In some conditions, if a comment of very large size (for example, more than 300k characters) is edited in a ticket, the Jira Batched Notifications might stop being sent to users (or they might be sent with hours of delay).

To verify if you are impacted by this scenario, please check the following knowledge base article:

Jira Batched Notifications stop being sent from any project after a big size comment was edited in a ticket

Root Cause 9 - The Batched notification job keeps being executed on the same node (Jira Data Center multi-node only - Does not apply to Jira Server or Data Center single node)

Due to the bug https://jira.atlassian.com/browse/JRASERVER-75733, under some unknown conditions, the Batched Notification job might be constantly executed by the same Jira node, instead of being evenly/randomly executed by all the nodes in the cluster (regardless of whether the user activity is evenly distributed across nodes by the Load Balancer).

In such case, the following will happen:

  • all the Batched Notification emails will end up on the Mail Queue of the same node
  • in case of a busy Jira instance, the emails will be sent with a long delay, since all emails are being sent by 1 single node instead of by multiple nodes

To check if this root cause might be relevant:

  • Log into each node's Mail Queue page ⚙ > System > Mail Queue by using each node's IP address and bypassing the Load Balancer
    • if the Mail Queue is piling up on only 1 node, then it's an indication that this root cause might be relevant
  • Check the Jira Outgoing Mail Logs from the home folder of each node
    • If you see logs like the ones shown below recorded on only 1 node, then it's another indication that this root cause might be relevant:

      Log snippet
      2023-11-10 09:34:44,626-0500 INFO [] Caesium-1-3 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for scheduled events to notify and using queryTimeout of null seconds.
      2023-11-10 09:34:44,718-0500 INFO [] Caesium-1-3 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for flushed/mention events to notify and using queryTimeout of null seconds.
      2023-11-10 09:34:44,725-0500 INFO [] Caesium-1-3 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for flushed/mention events to notify and using queryTimeout of null seconds.
      2023-11-10 09:35:44,627-0500 INFO [] Caesium-1-4 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for scheduled events to notify and using queryTimeout of null seconds.
      2023-11-10 09:35:44,647-0500 INFO [] Caesium-1-4 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for flushed/mention events to notify and using queryTimeout of null seconds.
      2023-11-10 09:36:44,625-0500 INFO [] Caesium-1-4 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for scheduled events to notify and using queryTimeout of null seconds.
      2023-11-10 09:37:44,648-0500 INFO [] Caesium-1-2 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for scheduled events to notify and using queryTimeout of null seconds.
      2023-11-10 09:38:44,627-0500 INFO [] Caesium-1-3 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for scheduled events to notify and using queryTimeout of null seconds.
      2023-11-10 09:38:44,650-0500 INFO [] Caesium-1-3 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for flushed/mention events to notify and using queryTimeout of null seconds.
      2023-11-10 09:38:44,697-0500 INFO [] Caesium-1-3 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for flushed/mention events to notify and using queryTimeout of null seconds.
      2023-11-10 09:38:44,701-0500 INFO [] Caesium-1-3 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for flushed/mention events to notify and using queryTimeout of null seconds.
      2023-11-10 09:39:44,628-0500 INFO [] Caesium-1-3 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for scheduled events to notify and using queryTimeout of null seconds.
      2023-11-10 09:39:44,643-0500 INFO [] Caesium-1-3 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for flushed/mention events to notify and using queryTimeout of null seconds.
      2023-11-10 09:39:44,649-0500 INFO [] Caesium-1-3 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for flushed/mention events to notify and using queryTimeout of null seconds.
      2023-11-10 09:39:44,654-0500 INFO [] Caesium-1-3 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for flushed/mention events to notify and using queryTimeout of null seconds.
      2023-11-10 09:40:44,631-0500 INFO [] Caesium-1-3 ServiceRunner     [c.a.m.o.c.a.j.p.i.batching.cron.MemorySafeEventRetriever] Searching for scheduled events to notify and using queryTimeout of null seconds.
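With the outgoing mail logs collected from each node, the comparison described above can be scripted. This Python sketch counts the MemorySafeEventRetriever "Searching for scheduled events" lines per node, matching on the text shown in the log snippet above; the node names are whatever labels you use when loading each node's log.

```python
from collections import Counter

# Text taken from the log snippet above.
MARKER = "MemorySafeEventRetriever] Searching for scheduled events"


def batching_runs_per_node(logs_by_node):
    """Count 'Searching for scheduled events' lines in each node's log text."""
    counts = Counter()
    for node, text in logs_by_node.items():
        counts[node] = sum(1 for line in text.splitlines() if MARKER in line)
    return counts


def runs_on_single_node(counts):
    """True when every recorded batching run happened on the same node."""
    active = [node for node, runs in counts.items() if runs > 0]
    return len(active) == 1
```

If runs_on_single_node returns True over a period with normal activity on all nodes, this matches the pattern described for JRASERVER-75733.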


Troubleshooting Service Desk Customer notification mail issues

If you verified that Jira notifications are sent as expected, but only Service Desk Customer Notifications are not sent (or sent with a long delay), then chances are that the job responsible for the customer notification is failing to be executed, or is facing some slowness.

There can be various reasons why the customer notification job is not running as expected. The most common root causes are listed below.

Root Cause 1 - Customer notifications are delayed due to complex SLA configurations

The Service Management SLAs, Customer Notifications and Automations all share the same threads (SdOffThreadEventJobRunner:thread). Because of that, if a Service Desk project contains SLAs configured with a long list of goals using complex JQL queries, you might experience the following symptoms:

  • delays and inconsistencies with the SLA displayed in the tickets
  • delays with the triggering of Automations
  • delays with the Customer Notifications

This root cause is detailed in the KB article below:

Service Management customer notifications are sent with a long delay due to complex SLA configurations

Root Cause 2 - Customer notification job does not recover from a database connectivity issue (Jira Server and Data Center - applies to both single-node and multi-node environments)

When using Jira Server, or Jira Data Center on one or more nodes:

  • If Jira experiences a temporary database connectivity issue (for example, during database server maintenance or a restart), some scheduled jobs such as the Customer Notification job might never recover, and the scheduler stops trying to execute them.
  • In this case, you might be impacted by the bug listed below:

JRASERVER-62072
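To check whether the scheduler has stopped executing the customer notification job after a database outage, one option is to compare the last recorded start time of that job with the current time. The sketch below is read-only and written for PostgreSQL (an assumption; other databases need different date arithmetic). It reuses the job ID sd.custom.notification.batch.send from the support data query in this article, and it only returns a row if rundetails still holds at least one run of that job.

```sql
-- Read-only: how long ago the customer notification job last started.
-- PostgreSQL syntax (assumption); no rows means no run is recorded for this job.
SELECT job_id,
       MAX(start_time) AS last_start,
       CURRENT_TIMESTAMP - MAX(start_time) AS time_since_last_start
FROM rundetails
WHERE job_id = 'sd.custom.notification.batch.send'
GROUP BY job_id;
```

A time_since_last_start much larger than the usual interval between runs, while customer notifications keep piling up, is consistent with this root cause.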

Root Cause 3 - Customer notification job does not recover from a database connectivity issue (Jira Data Center multi-node only - does not apply to Jira Server or single-node Data Center)

When using Jira Data Center on multiple nodes:

  • whenever a node from the cluster starts a job (for example, the batched notification job), it sets a lock in the clusterlockstatus database table to prevent any other node from running the same job

  • when the node completes the job, it is supposed to release the lock from the clusterlockstatus table, so that other nodes can run the job in the future

  • if the node fails to release the lock (for example, due to a database connection problem), it does not retry, and the job remains locked until the next restart of all the Jira nodes

If you are using Jira Data Center on a version lower than 8.3.0, then you might be impacted by the bug below:

JRASERVER-66597
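To see which cluster locks are currently held and by which node, the read-only sketch below queries the clusterlockstatus table. It is written for PostgreSQL, and it assumes the columns id, lock_name, locked_by_node and update_time, with update_time holding epoch milliseconds; verify both assumptions on your instance. The exact lock names used by the notification jobs are not documented in this article, so the query lists every held lock rather than filtering on one.

```sql
-- Read-only: cluster locks currently held by a node, oldest first.
-- PostgreSQL syntax (assumption); update_time is assumed to be epoch milliseconds.
SELECT id,
       lock_name,
       locked_by_node,
       to_timestamp(update_time / 1000) AS last_updated
FROM clusterlockstatus
WHERE locked_by_node IS NOT NULL
ORDER BY update_time;
```

A lock that has been held for a long time by a node that is no longer running the corresponding job is consistent with this bug.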

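Root Cause 4 - Customer notification job gets stuck when a comment contains a huge number of links

If a comment includes a huge number of links (for example, 100k), the job responsible for the Customer notifications might get completely stuck. In such a case: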

  • restarting the Jira application won't fix the issue
  • the following long-running thread might be found in the thread dumps:

    "Caesium-2-1" #20668 daemon prio=5 tid=0x0000000004770000 nid=0x5b42 runnable [0x00007f740d93f000]
       java.lang.Thread.State: RUNNABLE
    	at java.lang.String.indexOf(String.java:1769)
    	at java.lang.String.indexOf(String.java:1718)
    	at org.apache.commons.lang.StringUtils.replaceEach(StringUtils.java:4075)
    	at org.apache.commons.lang.StringUtils.replaceEach(StringUtils.java:3868)
    	at com.atlassian.servicedesk.internal.feature.customer.request.IssueUrlConverterImpl.replaceIssueUrlsWithPortalRequestUrls(IssueUrlConverterImpl.java:69)
    	at com.atlassian.servicedesk.internal.feature.customer.request.CustomerTextRendererImpl.updateCustomerTextIntertal(CustomerTextRendererImpl.java:159)
    	at com.atlassian.servicedesk.internal.feature.customer.request.CustomerTextRendererImpl.updateEmailTextForCustomer(CustomerTextRendererImpl.java:154)
    	at com.atlassian.servicedesk.internal.notifications.render.StylingBodyFinaliserImpl.buildMultiPartHtmlEmailBody(StylingBodyFinaliserImpl.java:79)
    	at com.atlassian.servicedesk.internal.notifications.render.StylingBodyFinaliserImpl.buildMessageBodyForRecipient(StylingBodyFinaliserImpl.java:72)
    	at com.atlassian.servicedesk.internal.notifications.render.StylingBodyFinaliserImpl.lambda$buildHtmlBody$0(StylingBodyFinaliserImpl.java:55)
    	at com.atlassian.servicedesk.internal.notifications.render.StylingBodyFinaliserImpl$$Lambda$2582/1308621309.apply(Unknown Source)
    	at io.atlassian.fugue.Either$RightProjection.map(Either.java:872)
    	at io.atlassian.fugue.Either.map(Either.java:217)
    	at com.atlassian.servicedesk.internal.notifications.render.StylingBodyFinaliserImpl.buildHtmlBody(StylingBodyFinaliserImpl.java:55)

If you are using a Service Management version lower than 4.5.0, then you might be impacted by the bug listed below:

JSDSERVER-6516
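To look for comments that may trigger this behavior, the rough, read-only sketch below is written for PostgreSQL (other databases may need different string functions). It assumes comments are stored in the jiraaction table with actiontype = 'comment', and it uses the number of occurrences of the string http in each comment body as an approximation of the number of links; it does not parse the wiki markup. Since it scans every comment, consider running it off-peak or against a replica.

```sql
-- Rough, read-only search for comments containing an unusually large number of links.
-- PostgreSQL syntax (assumption); "http" occurrences are used as a proxy for the link count.
SELECT id,
       issueid,
       created,
       (LENGTH(actionbody) - LENGTH(REPLACE(actionbody, 'http', ''))) / LENGTH('http') AS approx_link_count
FROM jiraaction
WHERE actiontype = 'comment'
  AND actionbody IS NOT NULL
ORDER BY approx_link_count DESC
LIMIT 20;
```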

Root Cause 5 - Customer notification job gets stuck when a ticket is shared with thousands of participants

If a Service Management request is shared with a high number of participants, the customer notification job might get stuck and stop sending any customer notifications. In such a case, restarting the Jira application won't fix the issue.

This behavior is caused by a Service Management bug tracked in the link below:
JSDSERVER-7346
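To find requests with an unusually high number of participants, the read-only sketch below counts participant values per issue. It assumes the participants field keeps its default name, Request participants, and that each participant is stored as one row of the customfieldvalue table; both are assumptions to verify on your instance. The LIMIT clause is PostgreSQL/MySQL syntax.

```sql
-- Read-only: issues with the highest number of request participants.
-- Assumes the default field name 'Request participants' and one customfieldvalue row per participant.
SELECT cfv.issue AS issue_id,
       COUNT(*) AS participant_count
FROM customfieldvalue cfv
JOIN customfield cf ON cf.id = cfv.customfield
WHERE cf.cfname = 'Request participants'
GROUP BY cfv.issue
ORDER BY participant_count DESC
LIMIT 20;
```

The issue_id values can be matched against the id column of the jiraissue table to identify the affected requests.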

Root Cause 6 - Customer notification job gets stuck processing tokens from the cwd_user_attributes table

Jira Service Management (JSM) introduced a login-free portal in version 5.3.0. To implement this feature, a token validation process was added in that version. This process can negatively impact the performance of the Customer Notification functionality and delay notifications by hours.

This scenario is caused by a Service Management bug tracked in the link below:
JSDSERVER-12279

If you are observing long delays with customer notifications (hours/days) after upgrading JSM to 5.3.0 or any higher version, then this root cause might be relevant.
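Because this root cause involves tokens stored in the cwd_user_attributes table, a quick read-only way to see whether that table has grown unusually large, and which attribute names dominate it, is to group its rows by attribute name. This article does not name the attributes used for the login-free portal tokens, so the sketch below does not filter on any specific name.

```sql
-- Read-only: row count per attribute name in cwd_user_attributes.
-- No specific token attribute name is assumed; look for unusually large groups.
SELECT attribute_name,
       COUNT(*) AS row_count
FROM cwd_user_attributes
GROUP BY attribute_name
ORDER BY row_count DESC;
```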

Providing data to Atlassian Support

If you were not able to identify what is causing the notifications to be stuck or delayed, please reach out to Atlassian Support via this link.

To help the Atlassian support team investigate the issue faster, you can follow the steps below and attach all the collected data to the support ticket raised with Atlassian:

  1. Enable debug logging for some additional packages by following the steps below:
    1. Go to the page ⚙ > System > Logging and Profiling
    2. Click Enable next to outgoing mail logging
    3. Then, underneath that, click Enable Debugging
    4. From the same page, click Configure logging level for another package
    5. Use com.atlassian.jira.service as the package name, and select "DEBUG" for the "Logging Level"
    6. Repeat the previous step once for each of the 2 packages com.atlassian.servicedesk.plugins.notifications and com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJob
  2. Wait for about 30 minutes so that enough logs are collected
  3. Generate a support zip, making sure to tick the Thread dumps option when you generate it:
    1. To include the thread dumps when creating the support zip, go to ⚙ > System > Troubleshooting and support tools > Create Support zip > Customize Zip, tick the Thread dumps option, and click the Save button
    2. If you are using Jira Data Center, generate a support zip from each Jira node
  4. Collect the following screenshots:
    1. The page ⚙ > System > Outgoing Mail after clicking on the Edit button
    2. The page ⚙ > System > Mail Queue
    3. The page ⚙ > System > Batching email notifications
  5. Run the following SQL Query against the Jira Database:

    select * from rundetails where job_id in (
    'com.atlassian.jira.service.JiraService:10000',
    'com.atlassian.jira.plugins.inform.batching.cron.BatchNotificationJobSchedulerImpl',
    'sd.custom.notification.batch.send');
  6. Attach to the support ticket:
    1. The screenshots
    2. The support zip
    3. The result from the SQL query




Description

This page covers the steps needed to resolve the following issue: notification emails for all events are slow to reach users, sometimes taking hours to reach the intended recipient. No warnings or errors appear in the mail queue, and emails leave the queue gradually but slowly.

Product: Jira



Last modified on Jan 16, 2024
