Outgoing MailQueueService and other services are not running in Jira
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
The main symptom customers notice is that the outgoing mail function does not work; however, when the mail queue is flushed manually, the emails are sent without any problems.
Often, other Jira services, including those from third-party plugins, are also not running or are running with unusually long delays that are inconsistent with their scheduled times.
No errors appear in the logs.
Diagnosis
Diagnostic Steps
We need to look into three places to diagnose this problem:
Thread dumps: Thread dumps will show that the Jira scheduler threads are busy all the time. Jira uses Caesium as its scheduler, and Jira has only four Caesium threads that execute Jira services and jobs. Looking at what those threads are doing in the thread dumps can help us identify the issue. The dumps will show something like the following:
"Caesium-1-1" #189 daemon prio=5 os_prio=0 tid=0x0000000039af1000 nid=0x19b0 runnable [0x000000006103e000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.SocketDispatcher.read0(Native Method) at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:226) at org.ofbiz.core.entity.jdbc.SQLProcessor.executeUpdate(SQLProcessor.java:562) at org.ofbiz.core.entity.GenericDAO.singleInsert(GenericDAO.java:198) at org.ofbiz.core.entity.GenericDAO.insert(GenericDAO.java:171) at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.createValue(DefaultOfBizDelegator.java:296) at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.createValue(WrappingOfBizDelegator.java:172) at com.atlassian.jira.entity.EntityEngineImpl.createValue(EntityEngineImpl.java:29) at com.atlassian.jira.scheduler.OfBizRunDetailsDao.addRunDetails(OfBizRunDetailsDao.java:112) at com.atlassian.scheduler.core.AbstractSchedulerService.addRunDetails(AbstractSchedulerService.java:151) at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:98) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.launchJob(CaesiumSchedulerService.java:443) at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:438) at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:59) at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:34) at java.lang.Thread.run(Thread.java:748)
As can be seen, the Caesium thread is not doing much work: it is just hitting the database to record run details and get the next job to be executed.
All four Caesium threads will show similar stack traces, performing database checks looking for the next job.
In normal cases, Caesium threads should either be idle or busy executing Jira service/job-related code.
SQL logs: We need to enable Jira SQL logging and check what kind of queries the Caesium threads are executing. The logs will show that the Caesium jobs are running a lot of queries, both selects and updates, on the clusteredjob table. The queries are very generic, mainly serving to let Caesium figure out which job to run next. The logs look like the following:
$ grep "Caesium" atlassian-jira-sql.log* |grep Caesium-1-2 |grep -iv Connection|grep c3458b07-7b07-49d8-b2a9-efed1509d27f atlassian-jira-sql.log:2018-11-04 21:21:48,422 Caesium-1-2 ServiceRunner 0ms "SELECT ID, JOB_ID, JOB_RUNNER_KEY, SCHED_TYPE, INTERVAL_MILLIS, FIRST_RUN, CRON_EXPRESSION, TIME_ZONE, NEXT_RUN, VERSION, PARAMETERS FROM clusteredjob WHERE JOB_ID='c3458b07-7b07-49d8-b2a9-efed1509d27f'" atlassian-jira-sql.log:2018-11-04 21:21:48,438 Caesium-1-2 ServiceRunner 16ms "UPDATE clusteredjob SET VERSION = '659053' , NEXT_RUN = '1541362968422' WHERE JOB_ID='c3458b07-7b07-49d8-b2a9-efed1509d27f' AND VERSION='659052'" atlassian-jira-sql.log:2018-11-04 21:21:48,438 Caesium-1-2 ServiceRunner 0ms "SELECT DISTINCT ID FROM rundetails WHERE JOB_ID='c3458b07-7b07-49o8-b2a9-efed87659d27f'" atlassian-jira-sql.log:2018-11-04 21:21:48,469 Caesium-1-2 ServiceRunner 31ms "INSERT INTO rundetails (ID, JOB_ID, START_TIME, RUN_DURATION, RUN_OUTCOME, INFO_MESSAGE) VALUES ('391194760', 'c3458b07-7b07-49d8-b2a9-efed1509d27f', '2018-11-04 21:21:48.422', '16', 'S', '')" .... $ grep "Caesium" atlassian-jira-sql.log* |grep Caesium-1-2 |grep -iv Connection|grep 47d35268-25c4-4655-bb8d-f321af5876cf atlassian-jira-sql.log.1:2018-11-04 21:10:31,014 Caesium-1-2 ServiceRunner 0ms "SELECT ID, JOB_ID, JOB_RUNNER_KEY, SCHED_TYPE, INTERVAL_MILLIS, FIRST_RUN, CRON_EXPRESSION, TIME_ZONE, NEXT_RUN, VERSION, PARAMETERS FROM clusteredjob WHERE JOB_ID='47d35268-25c4-4655-bb8d-f321af5876cf'" atlassian-jira-sql.log.1:2018-11-04 21:10:31,061 Caesium-1-2 ServiceRunner 47ms "UPDATE clusteredjob SET VERSION = '1012145' , NEXT_RUN = '1541362451014' WHERE JOB_ID='47d35268-25c4-4655-bb8d-f321aol876cf' AND VERSION='1012144'" atlassian-jira-sql.log.1:2018-11-04 21:10:31,061 Caesium-1-2 ServiceRunner 0ms "SELECT DISTINCT ID FROM rundetails WHERE JOB_ID='47d35268-25c4-4655-bb8d-f321af5876cf'" atlassian-jira-sql.log.1:2018-11-04 21:10:31,124 Caesium-1-2 ServiceRunner 63ms "INSERT INTO rundetails (ID, JOB_ID, START_TIME, RUN_DURATION, RUN_OUTCOME, INFO_MESSAGE) VALUES ('391187558', '47d35268-25c4-4655-bb8d-f321af5876cf', '2018-11-04 21:10:31.014', '47', 'S', '')" .... $ grep "Caesium" atlassian-jira-sql.log* |grep Caesium-1-2 |grep -iv Connection|grep 24efa140-eb2d-456e-ab3a-56b6834821e5|head atlassian-jira-sql.log:2018-11-04 21:22:27,672 Caesium-1-2 ServiceRunner 0ms "SELECT ID, JOB_ID, JOB_RUNNER_KEY, SCHED_TYPE, INTERVAL_MILLIS, FIRST_RUN, CRON_EXPRESSION, TIME_ZONE, NEXT_RUN, VERSION, PARAMETERS FROM clusteredjob WHERE JOB_ID='24efa140-eb2d-456e-ab3a-56b6834821e5'" atlassian-jira-sql.log:2018-11-04 21:22:27,688 Caesium-1-2 ServiceRunner 16ms "UPDATE clusteredjob SET VERSION = '538802' , NEXT_RUN = '1541363007521' WHERE JOB_ID='24efa140-eb2d-456e-ab3a-56b6866821e5' AND VERSION='538801'" atlassian-jira-sql.log:2018-11-04 21:22:27,688 Caesium-1-2 ServiceRunner 0ms "SELECT DISTINCT ID FROM rundetails WHERE JOB_ID='24rty140-eb2d-456e-ab3a-56b6834821e5'" atlassian-jira-sql.log:2018-11-04 21:22:27,719 Caesium-1-2 ServiceRunner 31ms "INSERT INTO rundetails (ID, JOB_ID, START_TIME, RUN_DURATION, RUN_OUTCOME, INFO_MESSAGE) VALUES ('391195053', '24efa140-eb2d-456e-ab3a-56b6839821e5', '2018-11-04 21:22:27.672', '16', 'S', '')"
$ egrep "UPDATE.*JOB_ID=" atlassian-jira-sql.log* |awk -F"JOB_ID=" '{print $2}'|cut -d" " -f1 |sort -u |wc -l 482 $ egrep "UPDATE.*JOB_ID=" atlassian-jira-sql.log* |awk -F"JOB_ID=" '{print $2}'|cut -d" " -f1 |sort -u |head '5f1c18a1-8055-4b2a-ac03-9ccfeadce715' ... '02fd00d7-75d2-4f4b-bc38-acf9676fd1c9' $ egrep "UPDATE.*JOB_ID=" atlassian-jira-sql.log* |awk -F"JOB_ID=" '{print $2}'|cut -d" " -f1 |sort -u |tail 'febb5da5-63ce-4a4a-b7cf-b95a2f660471' 'ff3d1c63-03cc-4172-8830-9ac34869dd8b' ... 'fff81787-e893-4dd9-92c1-04473466ecec' 'PluginUpdateCheckJob-job' $
At this point, it is clear that there is a group of jobs starving the other jobs and services, as the Caesium threads are always busy with those jobs.
If the number of jobs is big enough, it will also put a huge load on the Jira database: thousands of selects and updates will hit the clusteredjob table.
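To get a rough sense of this load, you can count how many job runs Caesium has recorded recently in the rundetails table (the same table the INSERT statements above write to). This is a minimal sketch assuming PostgreSQL-style date arithmetic; adjust the interval syntax for your database (the stack trace above, for instance, is from an Oracle-backed instance):
-- Count how many scheduler job runs were recorded in the last hour
-- (PostgreSQL syntax; adapt the date arithmetic for Oracle, SQL Server, etc.)
SELECT COUNT(*)
FROM rundetails
WHERE START_TIME > NOW() - INTERVAL '1 hour';
Counts in the thousands per hour are consistent with the runaway scheduling described in this article.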
The clusteredjob table: We need to check what kind of jobs are being executed by Caesium and where they are coming from. To do that, we query the table and see what we get:
select * from clusteredjob;
The table should contain records that look like this:
id    job_id                               job_runner_key                                                    sched_type interval_millis first_run     cron_expression time_zone next_run      version parameters
----- ------------------------------------ ----------------------------------------------------------------- ---------- --------------- ------------- --------------- --------- ------------- ------- ----------
10607 5f1c18a1-8055-4b2a-ac03-9ccfeadce715 com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob  I          60000           1540807318676 NULL            NULL      1544091537860 43159   NULL
11709 6daa6545-f172-4536-88fc-1d755ac98f56 com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob  I          60000           1543582964397 NULL            NULL      1544091537935 7169    NULL
11811 9078d8a2-dbab-4dd9-9caa-578762415eca com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob  I          60000           1543845066251 NULL            NULL      1544091537992 2808    NULL
11112 c382c1b0-57cf-4b62-a5a0-5704b82aebbe com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob  I          60000           1542106455820 NULL            NULL      1544091538007 27807   NULL
11606 c650b072-231a-4cca-b253-d07992037b78 com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob  I          60000           1543572442972 NULL            NULL      1544091538012 7188    NULL
11911 c97ec842-5ebf-4819-8b83-8a11e2741b8e com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob  I          60000           1543912925241 NULL            NULL      1544091538019 1679    NULL
12010 dd0f60da-d075-4f86-b8d6-225da617360f com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob  I          60000           1543913901768 NULL            NULL      1544091538086 1664    NULL
10411 e23888dd-2ca6-48f0-8927-24991c2c0600 com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob  I          60000           1540454936120 NULL            NULL      1544091538117 49030   NULL
Those records correspond to the Bamboo PlanStatusUpdateJob, which is part of Jira's built-in integration functionality with Bamboo.
This problem happens even if Jira is not integrated with Bamboo.
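To confirm which job runner is flooding the table without reading the full output, you can aggregate the rows by job_runner_key. This is a simple sketch using only the columns shown above:
-- Count clustered jobs per runner key; a runaway runner will dominate the list
SELECT job_runner_key, COUNT(*) AS job_count
FROM clusteredjob
GROUP BY job_runner_key
ORDER BY job_count DESC;
If a single runner key such as PlanStatusUpdateJob accounts for hundreds of rows while most keys have only a few, you are hitting the scenario described in this article.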
Cause
There is a known bug that could be the root cause of this problem; please check the bug report JRASERVER-66593 - Duplicated "com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob" job created during each JIRA restart. It has been fixed in Jira 8.2.2.
The bug causes duplicate jobs to pile up in the Jira clusteredjob table.
Every time Jira is restarted, a new job is inserted into the table. The jobs have unique job IDs, so even putting a unique constraint on that field will not prevent this issue.
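Since one duplicate is inserted per restart, counting the affected rows gives a rough idea of how many restarts' worth of duplicates have accumulated. A simple check:
-- Roughly one row accumulates per Jira restart while the bug is present
SELECT COUNT(*)
FROM clusteredjob
WHERE job_runner_key = 'com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob';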
Workaround
For the time being, the only way to avoid this problem is to delete the duplicate jobs from the clusteredjob table before starting up Jira.
To do this, follow the steps below:
- Stop Jira.
- Execute the query below to remove the duplicate jobs from the clusteredjob table:
Always back up your data before performing any modifications to the database. If possible, test any alter, insert, update, or delete SQL commands on a staging server first.
delete from clusteredjob where job_runner_key = 'com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob';
- Start Jira.
Once Jira is started, it will insert a single new record, so there will be only one such job in the table.
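To verify the cleanup, you can re-check the table after Jira has started; only a single row for that runner key should remain:
-- After a clean restart, this should return exactly one row
SELECT id, job_id, job_runner_key, next_run
FROM clusteredjob
WHERE job_runner_key = 'com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob';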