JIRA Services stop working due to a database network failure

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15, 2024. If you are running a Server product, visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

All JIRA services (e.g. the Mail Queue, mail handlers, and directory sync services) stop working shortly after a database connectivity problem occurs, and they stay broken even after the database connection is restored and JIRA otherwise appears operational.

Thread dumps show long-running stack traces such as the following, with Caesium threads stuck reading from database connections:


"Caesium-1-2" #130 daemon prio=5 os_prio=0 tid=0x0000000003536800 nid=0xc54 runnable [0x00007f90c2b8e000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.io.DataInputStream.readFully(DataInputStream.java:195)
	at java.io.DataInputStream.readFully(DataInputStream.java:169)
	at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:850)
	at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(SharedSocket.java:731)
	- locked <0x0000000720eb03a0> (a java.util.concurrent.ConcurrentHashMap)
	at net.sourceforge.jtds.jdbc.ResponseStream.getPacket(ResponseStream.java:477)
	at net.sourceforge.jtds.jdbc.ResponseStream.read(ResponseStream.java:114)
	at net.sourceforge.jtds.jdbc.ResponseStream.peek(ResponseStream.java:99)
	at net.sourceforge.jtds.jdbc.TdsCore.wait(TdsCore.java:4127)
	at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1086)
	- locked <0x0000000720eb2a00> (a net.sourceforge.jtds.jdbc.TdsCore)
	at net.sourceforge.jtds.jdbc.TdsCore.microsoftPrepare(TdsCore.java:1219)
	at net.sourceforge.jtds.jdbc.JtdsConnection.prepareSQL(JtdsConnection.java:708)
	- locked <0x0000000720eaff38> (a net.sourceforge.jtds.jdbc.JtdsConnection)
	at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java:1028)
	- locked <0x0000000720eaff38> (a net.sourceforge.jtds.jdbc.JtdsConnection)
	at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
	at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
	at org.ofbiz.core.entity.jdbc.SQLProcessor.executeQuery(SQLProcessor.java:633)
	at org.ofbiz.core.entity.GenericDAO.createEntityListIterator(GenericDAO.java:967)
	at org.ofbiz.core.entity.GenericDAO.selectListIteratorByCondition(GenericDAO.java:883)
	at org.ofbiz.core.entity.GenericHelperDAO.findListIteratorByCondition(GenericHelperDAO.java:194)
	at org.ofbiz.core.entity.GenericDelegator.findListIteratorByCondition(GenericDelegator.java:1237)
	at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.findListIteratorByCondition(DefaultOfBizDelegator.java:398)
	at com.atlassian.jira.ofbiz.WrappingOfBizDelegator.findListIteratorByCondition(WrappingOfBizDelegator.java:278)
	at com.atlassian.jira.entity.SelectQueryImpl$ExecutionContextImpl.forEach(SelectQueryImpl.java:227)
	at com.atlassian.jira.entity.SelectQueryImpl$ExecutionContextImpl.consumeWith(SelectQueryImpl.java:214)
	at com.atlassian.jira.entity.SelectQueryImpl$ExecutionContextImpl.singleValue(SelectQueryImpl.java:191)
	at com.atlassian.jira.scheduler.OfBizClusteredJobDao.find(OfBizClusteredJobDao.java:88)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJob(CaesiumSchedulerService.java:417)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeClusteredJobWithRecoveryGuard(CaesiumSchedulerService.java:462)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService.executeQueuedJob(CaesiumSchedulerService.java:390)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$1.consume(CaesiumSchedulerService.java:285)
	at com.atlassian.scheduler.caesium.impl.CaesiumSchedulerService$1.consume(CaesiumSchedulerService.java:282)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeJob(SchedulerQueueWorker.java:65)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.executeNextJob(SchedulerQueueWorker.java:59)
	at com.atlassian.scheduler.caesium.impl.SchedulerQueueWorker.run(SchedulerQueueWorker.java:34)
	at java.lang.Thread.run(Thread.java:745)

Diagnosis

Environment

    • Observed with a jTDS JDBC connection to a Microsoft SQL Server database (could potentially occur in other environments)

Cause

The database threads used by JIRA services cannot recover from the connection loss caused by the network issue; they remain blocked on socket reads from connections that are already dead.

Workaround

Restart JIRA to refresh the services after a database network error has occurred.

Resolution for MS SQL

  • If your JIRA Core version is older than 7.2.0, refer to JRASERVER-62072 and upgrade first to pick up the fix for a known issue with similar symptoms.
  • If you're seeing this in JIRA 7.2.0 or later, apply a socket timeout parameter to the database connection configuration. This forces the long-running processes to hit a socketTimeout instead of hanging indefinitely, allowing the services to resume operation on a new, functional database connection. A quick verification sketch follows the steps below.

1. Edit dbconfig.xml and find the database <url> tag, for example:

<url>jdbc:jtds:sqlserver://SQL:1433/jira</url>

2. Append the socketTimeout= setting to the database URL as shown below. For instance, the setting below makes a stuck database process time out after 10 minutes; the timeout destroys the dead connection so that the service threads can obtain a new database connection and resume operation:

<url>jdbc:jtds:sqlserver://SQL:1433/jira;socketTimeout=600000</url>


3. Restart JIRA on each node for the changes to take effect.
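
To sanity-check the setting outside JIRA, a minimal Java sketch along these lines can confirm that a connection built from the same URL fails with an exception instead of hanging once the timeout elapses. The host, database name, and credentials are placeholders, and the jTDS driver jar is assumed to be on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class JtdsSocketTimeoutCheck {
    public static void main(String[] args) throws Exception {
        // Older jTDS releases may need explicit driver registration
        Class.forName("net.sourceforge.jtds.jdbc.Driver");
        // Same URL shape as dbconfig.xml; host/db/credentials are placeholders
        String url = "jdbc:jtds:sqlserver://SQL:1433/jira;socketTimeout=600000";
        try (Connection conn = DriverManager.getConnection(url, "jirauser", "secret");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1")) {
            rs.next();
            System.out.println("Connection OK: " + rs.getInt(1));
        } catch (SQLException e) {
            // With socketTimeout set, a blocked read surfaces here instead of hanging forever
            System.err.println("Connection failed or timed out: " + e.getMessage());
        }
    }
}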

Resolution for PostgreSQL

The following steps are taken from Connection problems to PostgreSQL result in stuck threads in Jira.

To solve this problem:

  1. Upgrade the JDBC driver for PostgreSQL to 42.2.18 or later. This driver better handles the properties you’ll add in the next step.

  2. Edit the dbconfig.xml file, and add the following properties into <jdbc-resource>:

    <connection-properties>tcpKeepAlive=true;socketTimeout=240</connection-properties>


    • tcpKeepAlive: enables TCP keepalives, which check whether the connection is still alive.

    • socketTimeout: terminates the connection after the specified time (in seconds). The 240 seconds above is a conservative 4 minutes; if you tend to run SQL queries that take longer, increase this value. A verification sketch follows these steps.

  3. Restart JIRA on each node for the changes to take effect.
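
As with the SQL Server case, a small Java sketch can confirm the behavior outside JIRA. The tcpKeepAlive and socketTimeout connection properties below are the same ones added to dbconfig.xml; the host, database, and credentials are placeholders, and the PostgreSQL JDBC driver (42.2.15 or later) is assumed to be on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class PgSocketTimeoutCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("user", "jirauser");      // placeholder credentials
        props.setProperty("password", "secret");
        props.setProperty("tcpKeepAlive", "true");  // probe the connection with TCP keepalives
        props.setProperty("socketTimeout", "240");  // seconds; abort blocked reads
        String url = "jdbc:postgresql://dbhost:5432/jira"; // placeholder host/database
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1")) {
            rs.next();
            System.out.println("Connection OK: " + rs.getInt(1));
        } catch (SQLException e) {
            // A network stall now surfaces as an exception after roughly 240 seconds
            System.err.println("Connection failed or timed out: " + e.getMessage());
        }
    }
}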

PostgreSQL JDBC driver version

The socketTimeout connection property was not enforced properly due to a bug in the driver. Version 42.2.15 (2020-08-14) includes the fix, listed as "Make sure socketTimeout is enforced PR 1831, 210b27a6" in the PostgreSQL JDBC Driver changelog.

Use version 42.2.15 or later so that the socketTimeout connection property works as expected.


Resolution for Oracle

To solve this problem:

1. Add the following system property to the setenv.sh file on each node (try the change in a test environment first). See Setting properties and options on startup for more information.

-Doracle.jdbc.ReadTimeout=60000

2. Restart JIRA on each node for the changes to take effect.

Jira will recover after 1 minute, in line with the ReadTimeout of 60000 configured above (the timeout is in milliseconds).
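
To confirm the behavior outside JIRA, the sketch below sets the same property programmatically before opening a connection; the Oracle thin driver honors it as a network read timeout. The host, service name, and credentials are placeholders, and the Oracle JDBC driver is assumed to be on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class OracleReadTimeoutCheck {
    public static void main(String[] args) {
        // Programmatic equivalent of -Doracle.jdbc.ReadTimeout=60000 (milliseconds)
        System.setProperty("oracle.jdbc.ReadTimeout", "60000");
        String url = "jdbc:oracle:thin:@//dbhost:1521/jira"; // placeholder host/service
        try (Connection conn = DriverManager.getConnection(url, "jirauser", "secret");
             Statement st = conn.createStatement()) {
            st.executeQuery("SELECT 1 FROM DUAL");
            System.out.println("Query completed within the read timeout");
        } catch (SQLException e) {
            // A stalled network read now fails after about 60 seconds instead of hanging
            System.err.println("Read timed out or failed: " + e.getMessage());
        }
    }
}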
