Scheduled jobs may stall and fail to process if one job becomes stuck
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
Scheduled jobs that normally run frequently (such as the Flush Index Queue task) will fail to run. Thread dumps will reveal that the majority of scheduler_Worker threads are simply in a waiting state:
Thread[scheduler_Worker-10,5,main]
java.lang.Object.wait(Native Method)
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:543)
However, one of the scheduler_Worker threads will be active and in a stuck state. In your thread dumps, look for stack trace entries containing job; these help you identify threads that are stuck while running a job (thread dump shortened for brevity):
Thread[scheduler_Worker-5,5,main]
...
com.sun.proxy.$Proxy105.jobWasExecuted(Unknown Source)
...
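The manual search described above can be sketched in code. The following is a minimal illustration, assuming the thread dump uses the same layout as the excerpts in this article (a `Thread[name,priority,group]` header followed by stack frames). The class and method names are hypothetical and are not part of Confluence or Quartz.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Confluence or Quartz.
class StuckWorkerFinder {

    // Returns the names of scheduler_Worker threads whose stack trace
    // contains a frame mentioning "job" (case-insensitive).
    static List<String> findBusyWorkers(String threadDump) {
        List<String> busy = new ArrayList<>();
        String current = null;   // worker thread currently being scanned, if any
        boolean inJob = false;   // true once a "job" frame was seen for it

        for (String raw : threadDump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("Thread[")) {
                // A new thread header closes the previous thread's section.
                if (current != null && inJob) {
                    busy.add(current);
                }
                // Header layout assumed: Thread[scheduler_Worker-5,5,main]
                int comma = line.indexOf(',');
                String name = comma > 0 ? line.substring("Thread[".length(), comma) : "";
                current = name.startsWith("scheduler_Worker") ? name : null;
                inJob = false;
            } else if (current != null && line.toLowerCase().contains("job")) {
                inJob = true;
            }
        }
        // Do not forget the last thread in the dump.
        if (current != null && inJob) {
            busy.add(current);
        }
        return busy;
    }
}
```

Calling `StuckWorkerFinder.findBusyWorkers(dumpText)` on the two excerpts above would report only scheduler_Worker-5, because the idle thread's frames contain no job entry. A thread reported this way is only a candidate; compare several dumps taken over time to confirm it is really stuck.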
Another possibility is a database error while Quartz is running, which appears in the logs as:
2015-09-20 08:54:48,960 ERROR [scheduler_Worker-8] [org.quartz.core.ErrorLogger] schedulerError Unable to notify JobListener(s) of Job to be executed: (Job will NOT be executed!). trigger= DEFAULT.IndexQueueFlusher job= DEFAULT.IndexQueueFlusher
org.quartz.SchedulerException: JobListener 'ScheduledJobListener' threw exception: Could not open Hibernate Session for transaction; nested exception is net.sf.hibernate.exception.JDBCConnectionException: Cannot open connection [See nested exception: org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is net.sf.hibernate.exception.JDBCConnectionException: Cannot open connection]
at org.quartz.core.QuartzScheduler.notifyJobListenersToBeExecuted(QuartzScheduler.java:1951)
at org.quartz.core.JobRunShell.notifyListenersBeginning(JobRunShell.java:364)
at org.quartz.core.JobRunShell.run(JobRunShell.java:190)
at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool.lambda$runInThread$46(ConfluenceQuartzThreadPool.java:19)
at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$$Lambda$94/1342121544.run(Unknown Source)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
Cause
Confluence uses Quartz, a third-party library, for scheduling tasks.
- Individual jobs are submitted to one of 10 worker threads.
- When a job starts running, it takes a lock on an object that all the other threads are waiting on.
- Only one thread can be running at any given time; the others wait until the lock is free.
- Once the job completes, the running thread releases the lock.
- The remaining threads check every 500ms whether they are able to start.
If a problem occurs during job execution that does not necessarily cause an error (for example, a database timeout), the running thread will stall and never release its lock, and the other threads will not be able to start their jobs.
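The locking behavior described above can be modelled with a small, self-contained sketch. This is not Quartz or Confluence code: the class name, method name, and parameters are assumptions made for illustration, and a `ReentrantLock` stands in for the shared object. One thread takes the lock and stalls, while the other threads poll for it and never get to run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative model only: this is NOT Quartz or Confluence code.
class StalledJobDemo {

    // Returns how many waiting workers managed to run a job while one
    // worker was stalled holding the shared lock.
    static int jobsRunWhileStalled(int waitingWorkers, long pollMillis, long observeMillis) {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch lockTaken = new CountDownLatch(1);
        CountDownLatch unstick = new CountDownLatch(1);
        AtomicInteger completed = new AtomicInteger();

        Thread stalled = new Thread(() -> {
            lock.lock();
            try {
                lockTaken.countDown();
                unstick.await(); // stands in for a job blocked on, e.g., a database call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        }, "scheduler_Worker-1");
        stalled.setDaemon(true);
        stalled.start();

        try {
            lockTaken.await(); // make sure the stalled job owns the lock first
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(observeMillis);
            Thread[] workers = new Thread[waitingWorkers];
            for (int i = 0; i < waitingWorkers; i++) {
                workers[i] = new Thread(() -> {
                    try {
                        // Poll for the lock until the observation window ends.
                        while (System.nanoTime() < deadline) {
                            if (lock.tryLock(pollMillis, TimeUnit.MILLISECONDS)) {
                                try {
                                    completed.incrementAndGet(); // the "job" would run here
                                } finally {
                                    lock.unlock();
                                }
                                return;
                            }
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }, "scheduler_Worker-" + (i + 2));
                workers[i].start();
            }
            for (Thread worker : workers) {
                worker.join();
            }
            int result = completed.get();
            unstick.countDown(); // end the demo by releasing the stalled job
            stalled.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
    }

    public static void main(String[] args) {
        // Three waiting workers polling every 500ms, observed for two seconds.
        System.out.println(jobsRunWhileStalled(3, 500, 2000)); // prints 0
    }
}
```

In this model no waiting worker ever completes a job during the observation window, because the stalled thread holds the lock for the whole time. This mirrors the symptom in the thread dumps: most workers are idle and a single worker is stuck inside a job.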
See CONF-40064 for more information about indexing jobs that stopped because of intermittent connection issues to the database.
The Scheduled Jobs Screen
The Scheduled Jobs screen will not accurately reflect the "stuck" status of the job (due to the way jobs and their run times are handled). We have raised CONF-38691 to improve this behavior.
Resolution
The underlying cause of the failure must be addressed. Thread dumps will provide an idea of the root cause (for example, a database query stuck because a connection timed out, or a corrupt document). You can contact Atlassian Support for assistance in determining the root cause.
There is also a bug report for improving the behavior of the Scheduled Jobs screen. See CONF-38691 for more information.