Scheduled jobs may stall and fail to process if one job becomes stuck
Platform Notice: Server and Data Center Only. This article only applies to Atlassian products on the server and data center platforms.
Scheduled jobs that normally run frequently (such as the Flush Index Queue task) fail to run. Thread dumps reveal that the majority of scheduler_Worker threads are simply in a waiting state:
Thread[scheduler_Worker-10,5,main]
    java.lang.Object.wait(Native Method)
    org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:543)
One of the scheduler_Worker threads will be active, however, and in a stuck state. In your thread dumps, look for stack trace entries containing job; these help you identify threads that are stuck while running a job (thread dump shortened for brevity):
Thread[scheduler_Worker-5,5,main] ... com.sun.proxy.$Proxy105.jobWasExecuted(Unknown Source) ...
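The search described above can be automated once the dump has been split into one text entry per thread. The following is a minimal sketch under that assumption; StuckWorkerFinder and findWorkersRunningJobs are hypothetical names used for illustration and are not part of Confluence or Quartz.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Confluence or Quartz. */
final class StuckWorkerFinder {

    // Matches worker thread names such as "scheduler_Worker-5".
    private static final Pattern WORKER_NAME = Pattern.compile("scheduler_Worker-\\d+");

    /**
     * Returns the names of scheduler_Worker threads whose stack text mentions "job".
     * Each element of threadEntries is assumed to hold the full text of one thread.
     */
    public static List<String> findWorkersRunningJobs(List<String> threadEntries) {
        List<String> names = new ArrayList<>();
        for (String entry : threadEntries) {
            Matcher m = WORKER_NAME.matcher(entry);
            if (!m.find()) {
                continue; // not a Quartz worker thread
            }
            // Case-insensitive search, so frames such as jobWasExecuted also match.
            if (entry.toLowerCase(Locale.ROOT).contains("job")) {
                names.add(m.group());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The two shortened excerpts shown in this article.
        List<String> dump = Arrays.asList(
            "Thread[scheduler_Worker-10,5,main] java.lang.Object.wait(Native Method) "
                + "org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:543)",
            "Thread[scheduler_Worker-5,5,main] ... "
                + "com.sun.proxy.$Proxy105.jobWasExecuted(Unknown Source) ...");
        System.out.println(findWorkersRunningJobs(dump)); // prints [scheduler_Worker-5]
    }
}
```

A match only shows that the thread was inside a job when the dump was taken. A worker that appears with the same job frames in several dumps taken some time apart is the likelier candidate for being stuck.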
Another possibility is that a database error occurs while Quartz is running, producing a log entry such as:
2015-09-20 08:54:48,960 ERROR [scheduler_Worker-8] [org.quartz.core.ErrorLogger] schedulerError Unable to notify JobListener(s) of Job to be executed: (Job will NOT be executed!). trigger= DEFAULT.IndexQueueFlusher job= DEFAULT.IndexQueueFlusher
org.quartz.SchedulerException: JobListener 'ScheduledJobListener' threw exception: Could not open Hibernate Session for transaction; nested exception is net.sf.hibernate.exception.JDBCConnectionException: Cannot open connection [See nested exception: org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is net.sf.hibernate.exception.JDBCConnectionException: Cannot open connection]
    at org.quartz.core.QuartzScheduler.notifyJobListenersToBeExecuted(QuartzScheduler.java:1951)
    at org.quartz.core.JobRunShell.notifyListenersBeginning(JobRunShell.java:364)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:190)
    at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool.lambda$runInThread$46(ConfluenceQuartzThreadPool.java:19)
    at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$$Lambda$94/1342121544.run(Unknown Source)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
Confluence uses Quartz, a third-party library, to schedule tasks.
- Individual jobs are submitted to one of 10 worker threads.
- When a job starts running, its thread takes a lock on an object that all the other threads also need.
- Only one thread can run at any given time; the others wait until the lock is free.
- Once the job completes, the running thread releases the lock.
- The remaining threads check every 500 ms whether they are able to start.
If a problem occurs during job execution that does not necessarily cause an error (for example, a database timeout), the running thread will stall and never release its lock, and the other threads will not be able to start their jobs.
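The locking behavior described above can be modelled in a few lines of Java. This is a simplified sketch of the description in this article, not Quartz's actual implementation: SchedulerStallModel, runJob, and simulate are hypothetical names, the pool is reduced to four threads, and the stuck job is simulated with a latch that stands in for something like a hung database call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified model of the behavior described in this article; not Quartz's real code. */
final class SchedulerStallModel {
    private final Object lock = new Object();
    private boolean jobRunning = false; // guarded by lock
    private final AtomicInteger completedJobs = new AtomicInteger();

    /** Waits for the shared lock (re-checking every 500 ms), runs the job, then releases it. */
    void runJob(Runnable job) throws InterruptedException {
        synchronized (lock) {
            while (jobRunning) {
                lock.wait(500);
            }
            jobRunning = true;
        }
        try {
            job.run(); // if this never returns, the lock is never released
            completedJobs.incrementAndGet();
        } finally {
            synchronized (lock) {
                jobRunning = false;
                lock.notifyAll();
            }
        }
    }

    /**
     * Runs one job that stays stuck for stallMillis, plus three quick jobs.
     * Returns {jobs completed while stuck, jobs completed after the stuck job is released}.
     */
    public static int[] simulate(long stallMillis) {
        SchedulerStallModel model = new SchedulerStallModel();
        CountDownLatch stuckJobStarted = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            Thread stuck = worker(model, "model_Worker-1", () -> {
                stuckJobStarted.countDown();
                awaitQuietly(release); // stands in for a hung call inside the job
            });
            stuck.start();
            stuckJobStarted.await(); // the stuck job now holds the lock

            Thread[] others = new Thread[3];
            for (int i = 0; i < others.length; i++) {
                others[i] = worker(model, "model_Worker-" + (i + 2), () -> { });
                others[i].start();
            }

            Thread.sleep(stallMillis);
            int whileStuck = model.completedJobs.get();

            release.countDown(); // un-stick the first job
            stuck.join();
            for (Thread t : others) {
                t.join();
            }
            return new int[] {whileStuck, model.completedJobs.get()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("simulation interrupted", e);
        }
    }

    private static Thread worker(SchedulerStallModel model, String name, Runnable job) {
        Thread t = new Thread(() -> {
            try {
                model.runJob(job);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(true);
        return t;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] result = simulate(1200);
        System.out.println("Completed while first job was stuck: " + result[0]); // 0
        System.out.println("Completed after it was released: " + result[1]);     // 4
    }
}
```

In this model the three quick jobs make no progress while the first job holds the lock, however long the stall lasts, and all of them complete as soon as it is released. That mirrors the symptom in the thread dumps: many waiting workers and a single active one.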
The Scheduled Jobs Screen
The Scheduled Jobs screen will not accurately reflect the "stuck" status of the job (due to the way jobs and their run times are handled). We have raised CONF-38691 to improve this behavior.
The underlying cause of the failure must be addressed. Thread dumps will give an idea of the root cause (for example, a database query stuck because a connection timed out, or a corrupt document). You can contact Atlassian Support for assistance in determining the root cause.