Scheduled jobs may stall and fail to process if one job becomes stuck


Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

Scheduled jobs that normally run frequently (such as the Flush Index Queue task) will fail to run. Thread dumps will show that most of the scheduler_Worker threads are simply in a waiting state:

Thread[scheduler_Worker-10,5,main]
	java.lang.Object.wait(Native Method)
	org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:543)
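A rough way to tell idle workers from busy ones, given their stack traces, is sketched below in Java. It assumes, based only on the excerpt above, that an idle worker is waiting in java.lang.Object called directly from SimpleThreadPool$WorkerThread.run. The class WorkerThreadClassifier is hypothetical, and the exact frames may differ between JDK and Quartz versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WorkerThreadClassifier {

    // Heuristic based on the dump above: an idle worker is waiting in
    // java.lang.Object (wait, or wait0 on newer JDKs) called directly from
    // SimpleThreadPool$WorkerThread.run.
    static boolean isIdle(StackTraceElement[] stack) {
        int i = 0;
        while (i < stack.length && stack[i].getClassName().equals("java.lang.Object")) {
            i++; // skip the Object.wait frame(s)
        }
        return i > 0
                && i < stack.length
                && stack[i].getClassName().equals("org.quartz.simpl.SimpleThreadPool$WorkerThread")
                && stack[i].getMethodName().equals("run");
    }

    // Labels every scheduler_Worker thread as "idle" or "busy"; other threads are ignored.
    static Map<String, String> classify(Map<String, StackTraceElement[]> stacksByName) {
        Map<String, String> labels = new LinkedHashMap<>();
        for (Map.Entry<String, StackTraceElement[]> e : stacksByName.entrySet()) {
            if (e.getKey().startsWith("scheduler_Worker")) {
                labels.put(e.getKey(), isIdle(e.getValue()) ? "idle" : "busy");
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Only threads in this JVM are visible, so a standalone run prints {}.
        Map<String, StackTraceElement[]> live = new LinkedHashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            live.put(e.getKey().getName(), e.getValue());
        }
        System.out.println(classify(live));
    }
}
```

If only one worker comes back as "busy" while every other one is "idle", that matches the pattern this article describes.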

However, one of the scheduler_Worker threads will be active and stuck. In your thread dumps, look for stack trace entries containing "job"; this will help you identify threads that are stuck while running a job (thread dump shortened for brevity):

Thread[scheduler_Worker-5,5,main]
	...
	com.sun.proxy.$Proxy105.jobWasExecuted(Unknown Source)
	...
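The search for "job" frames described above can be automated with a small text scan. This Java sketch assumes the dump format of the excerpts in this article (a Thread[name,priority,group] header followed by indented frames); other dump formats, such as jstack's, use a different header line, so the pattern would need adapting. The class StuckJobFinder is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StuckJobFinder {

    // Header format taken from the excerpts above: Thread[name,priority,group]
    private static final Pattern HEADER = Pattern.compile("^Thread\\[([^,\\]]+),.*\\]$");

    // For each scheduler_Worker thread, collects the frames that mention "job" (case-insensitive).
    static Map<String, List<String>> findJobFrames(String dump) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        String current = null;
        for (String rawLine : dump.split("\\R")) {
            String line = rawLine.strip();
            Matcher m = HEADER.matcher(line);
            if (m.matches()) {
                current = m.group(1); // start of a new thread's stack
                continue;
            }
            if (current != null && current.startsWith("scheduler_Worker")
                    && line.toLowerCase(Locale.ROOT).contains("job")) {
                result.computeIfAbsent(current, k -> new ArrayList<>()).add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
                "Thread[scheduler_Worker-10,5,main]",
                "\tjava.lang.Object.wait(Native Method)",
                "\torg.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:543)",
                "Thread[scheduler_Worker-5,5,main]",
                "\tcom.sun.proxy.$Proxy105.jobWasExecuted(Unknown Source)");
        System.out.println(findJobFrames(dump));
        // prints {scheduler_Worker-5=[com.sun.proxy.$Proxy105.jobWasExecuted(Unknown Source)]}
    }
}
```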

Another possibility is that a database error occurs while Quartz is running a job:

2015-09-20 08:54:48,960 ERROR [scheduler_Worker-8] [org.quartz.core.ErrorLogger] schedulerError Unable to notify JobListener(s) of Job to be executed: (Job will NOT be executed!). trigger= DEFAULT.IndexQueueFlusher job= DEFAULT.IndexQueueFlusher
org.quartz.SchedulerException: JobListener 'ScheduledJobListener' threw exception: Could not open Hibernate Session for transaction; nested exception is net.sf.hibernate.exception.JDBCConnectionException: Cannot open connection [See nested exception: org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is net.sf.hibernate.exception.JDBCConnectionException: Cannot open connection]
	at org.quartz.core.QuartzScheduler.notifyJobListenersToBeExecuted(QuartzScheduler.java:1951)
	at org.quartz.core.JobRunShell.notifyListenersBeginning(JobRunShell.java:364)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:190)
	at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool.lambda$runInThread$46(ConfluenceQuartzThreadPool.java:19)
	at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$$Lambda$94/1342121544.run(Unknown Source)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
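To check how often jobs are being skipped this way, you could scan the application log (for example, atlassian-confluence.log) for the ErrorLogger message shown above. This is a minimal Java sketch assuming each such entry sits on one line in the layout of that single example; the class SkippedJobScanner and its command-line usage are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkippedJobScanner {

    // Matches the ErrorLogger line shown above; the layout is inferred from that one example.
    private static final Pattern SKIPPED = Pattern.compile(
            "^(\\S+ \\S+) ERROR \\[([^\\]]+)\\].*Job will NOT be executed.*"
            + "trigger=\\s*(\\S+)\\s+job=\\s*(\\S+)");

    static final class SkippedJob {
        final String timestamp;
        final String thread;
        final String trigger;
        final String job;

        SkippedJob(String timestamp, String thread, String trigger, String job) {
            this.timestamp = timestamp;
            this.thread = thread;
            this.trigger = trigger;
            this.job = job;
        }

        @Override
        public String toString() {
            return timestamp + " [" + thread + "] skipped " + job + " (trigger " + trigger + ")";
        }
    }

    // Returns one entry per log line reporting that a job will not be executed.
    static List<SkippedJob> scan(List<String> logLines) {
        List<SkippedJob> found = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = SKIPPED.matcher(line);
            if (m.find()) {
                found.add(new SkippedJob(m.group(1), m.group(2), m.group(3), m.group(4)));
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java SkippedJobScanner /path/to/atlassian-confluence.log
        for (SkippedJob s : scan(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(s);
        }
    }
}
```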

Cause

Confluence uses Quartz, a third-party library, to schedule tasks:

  1. Individual jobs are submitted to one of 10 worker threads.
  2. When a job starts running, it takes a lock on an object that all the other threads also need.
    1. Only one thread can run a job at any given time; the other threads wait until the lock is free.
  3. Once the job completes, the running thread releases the lock.
    1. The remaining threads check every 500 ms whether they are able to start.

If a problem occurs during job execution that does not necessarily raise an error (for example, a database timeout), the running thread stalls and never releases its lock, so the other threads cannot start their jobs.
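The stall itself can be reproduced in the same simplified lock model: one job takes the lock and then blocks with no timeout and no error, and the remaining workers keep polling without ever starting. The class StalledLockDemo and its parameters are hypothetical; the "stuck" job simply waits on a latch that is never released.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class StalledLockDemo {

    // Starts one job that takes the lock and never finishes, plus `others` jobs
    // that poll for the lock. Returns how many others started within windowMillis.
    static int othersStartedWhileStuck(int others, long pollMillis, long windowMillis)
            throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch stuckHasLock = new CountDownLatch(1);
        CountDownLatch neverSignalled = new CountDownLatch(1);
        AtomicInteger started = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(others + 1);

        pool.submit(() -> {
            lock.lock();
            try {
                stuckHasLock.countDown();
                neverSignalled.await();   // no timeout and no error: the job just hangs
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        stuckHasLock.await();             // make sure the stuck job holds the lock first

        for (int i = 0; i < others; i++) {
            pool.submit(() -> {
                try {
                    while (!lock.tryLock()) {
                        Thread.sleep(pollMillis);
                    }
                    try {
                        started.incrementAndGet();
                    } finally {
                        lock.unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        Thread.sleep(windowMillis);
        int result = started.get();
        pool.shutdownNow();               // interrupt all threads so the demo can exit
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // 1 stuck job + 9 waiting workers, polling every 500 ms for 3 seconds; prints 0
        System.out.println("Jobs started while one job was stuck: "
                + othersStartedWhileStuck(9, 500, 3000));
    }
}
```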

See CONF-40064 for more information about indexing jobs that stopped because of intermittent connection issues to the database.

The Scheduled Jobs Screen

The Scheduled Jobs screen will not accurately reflect the "stuck" status of the job (due to the way jobs and their run times are handled). We have raised CONF-38691 to improve this behavior.

Resolution

The underlying cause of the failure must be addressed. Thread dumps will give an idea of the root cause (for example, a database query stuck because a connection timed out, or a corrupt document). You can contact Atlassian Support for help determining the root cause.
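A common way to confirm a stuck worker is to compare several thread dumps taken a few seconds apart: a worker whose non-idle stack never changes is a likely suspect. This Java sketch shows that comparison using the standard ThreadMXBean API. It only sees threads in its own JVM, so as written it illustrates the technique rather than serving as a tool for a running Confluence instance; the class RepeatedDumpComparer, the idle heuristic, and the 5-second interval are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RepeatedDumpComparer {

    // Captures the current stack of every thread in this JVM whose name starts with prefix.
    static Map<String, List<StackTraceElement>> snapshot(String prefix) {
        Map<String, List<StackTraceElement>> stacks = new TreeMap<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info.getThreadName().startsWith(prefix)) {
                stacks.put(info.getThreadName(), Arrays.asList(info.getStackTrace()));
            }
        }
        return stacks;
    }

    // Assumed idle pattern: waiting in java.lang.Object, called from WorkerThread.run.
    static boolean looksIdle(List<StackTraceElement> stack) {
        int i = 0;
        while (i < stack.size() && stack.get(i).getClassName().equals("java.lang.Object")) {
            i++;
        }
        return i > 0 && i < stack.size()
                && stack.get(i).getClassName().equals("org.quartz.simpl.SimpleThreadPool$WorkerThread")
                && stack.get(i).getMethodName().equals("run");
    }

    // Names of threads whose stack is identical in every snapshot and not idle.
    static List<String> unchanged(List<Map<String, List<StackTraceElement>>> snapshots) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, List<StackTraceElement>> e : snapshots.get(0).entrySet()) {
            boolean same = true;
            for (Map<String, List<StackTraceElement>> s : snapshots) {
                if (!e.getValue().equals(s.get(e.getKey()))) {
                    same = false;
                    break;
                }
            }
            if (same && !looksIdle(e.getValue())) {
                suspects.add(e.getKey());
            }
        }
        Collections.sort(suspects);
        return suspects;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Map<String, List<StackTraceElement>>> snapshots = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            snapshots.add(snapshot("scheduler_Worker"));
            if (i < 2) {
                Thread.sleep(5_000); // interval between samples (an assumption)
            }
        }
        System.out.println("Possibly stuck: " + unchanged(snapshots));
    }
}
```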

There is also a bug report for improving the behavior of the Scheduled Jobs screen; see CONF-38691 for more information.

Product: Confluence
Platform: Server
Last modified on Sep 2, 2021
