Unlimited LDAP read timeout can cause Cluster Locks health check to fail if there are communication issues

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

When the LDAP read timeout is high and JIRA cannot communicate properly with your LDAP server, the Cluster Locks health check can fail. By default, this check fails when a process has held a cluster lock for more than 300 seconds. This happens when JIRA can connect to your LDAP server but cannot read information from it, so the directory synchronisation keeps holding its lock while it waits for a response. The health check then reports a message such as:

 

Node 'node1' has been holding cluster lock, 'com.atlassian.crowd.embedded.api.Directory:10100', for 503 seconds.
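If you collect these messages from several nodes, a small parser can help spot which locks are over the threshold. The following is a minimal sketch, assuming the message always has the format shown above. The class and method names are hypothetical, and the 300-second value is the default threshold described in this article.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClusterLockMessage {
    // Default threshold described in this article (seconds).
    static final long THRESHOLD_SECONDS = 300;

    // Assumption: the health check message always follows the format shown above.
    private static final Pattern HELD = Pattern.compile(
        "Node '([^']+)' has been holding cluster lock, '([^']+)', for (\\d+) seconds\\.");

    /** Returns how long the lock has been held, or empty if the line does not match. */
    static OptionalLong heldSeconds(String line) {
        Matcher m = HELD.matcher(line);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(3))) : OptionalLong.empty();
    }

    /** True if the lock name points at a user directory, as in the example message. */
    static boolean isDirectoryLock(String line) {
        Matcher m = HELD.matcher(line);
        return m.find() && m.group(2).startsWith("com.atlassian.crowd.embedded.api.Directory:");
    }

    public static void main(String[] args) {
        String msg = "Node 'node1' has been holding cluster lock, "
            + "'com.atlassian.crowd.embedded.api.Directory:10100', for 503 seconds.";
        long seconds = heldSeconds(msg).orElse(-1);
        System.out.println("directory lock: " + isDirectoryLock(msg)
            + ", held: " + seconds + "s"
            + ", over threshold: " + (seconds > THRESHOLD_SECONDS));
    }
}
```

A lock name that starts with com.atlassian.crowd.embedded.api.Directory is a hint that a user directory operation is holding the lock, which is the case this article covers.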

 

Diagnosis

Environment

  • A JIRA instance that is connected to an LDAP/AD user directory whose read timeout is 0 (infinite) or higher than 300 seconds.

Diagnostic Steps

  • The problem occurs only when JIRA has trouble reading from the LDAP server. In that case, the atlassian-jira.log file shows a synchronisation starting but never completing:

    2017-01-31 11:34:08,377 atlassian-scheduler-quartz1.clustered_Worker-1 INFO ServiceRunner     [atlassian.crowd.directory.DbCachingRemoteDirectory] INCREMENTAL synchronisation for directory [ 10400 ] starting
    2017-01-31 11:34:08,377 atlassian-scheduler-quartz1.clustered_Worker-1 INFO ServiceRunner     [atlassian.crowd.directory.DbCachingRemoteDirectory] Attempting INCREMENTAL synchronisation for directory [ 10400 ]
  • If you review a series of thread dumps, the same long-running thread appears in every dump, waiting in com.sun.jndi.ldap.Connection.readReply on a com.sun.jndi.ldap.LdapRequest object. This suggests that JIRA is waiting for a response from the LDAP server and is stuck at this stage:

    "atlassian-scheduler-quartz1.clustered_Worker-1" #140 prio=5 tid=0x00007f6659c33000 nid=0x4e98 in Object.wait() [0x00007f6575984000]
       java.lang.Thread.State: WAITING (on object monitor)
    	at java.lang.Object.wait(Native Method)
    	at java.lang.Object.wait(Object.java:502)
    	at com.sun.jndi.ldap.Connection.readReply(Connection.java:467)
    	- locked <0x00000005e0912700> (a com.sun.jndi.ldap.LdapRequest)
    	at com.sun.jndi.ldap.LdapClient.getSearchReply(LdapClient.java:640)
    	at com.sun.jndi.ldap.LdapClient.search(LdapClient.java:563)
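The thread dump comparison above can be automated. The sketch below is one possible approach, not an Atlassian tool: it assumes each dump is available as a string, that every thread block starts with the quoted thread name, and that blocks are separated by blank lines. The class and method names are hypothetical. It reports the threads that have the readReply frame in every dump supplied.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class LdapStuckThreads {
    // Stack frame seen in the thread dump above while waiting for an LDAP reply.
    static final String LDAP_READ_FRAME = "com.sun.jndi.ldap.Connection.readReply";

    /** Names of the threads in one dump whose stack contains the LDAP read frame. */
    static Set<String> threadsWaitingOnLdap(String dump) {
        Set<String> names = new LinkedHashSet<>();
        String current = null;
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // Header line of a thread block: the name is the first quoted string.
                int end = line.indexOf('"', 1);
                current = end > 0 ? line.substring(1, end) : null;
            } else if (line.isEmpty()) {
                current = null; // a blank line ends the current thread block
            } else if (current != null && line.contains(LDAP_READ_FRAME)) {
                names.add(current);
            }
        }
        return names;
    }

    /** Threads that are waiting on LDAP in every one of the given dumps. */
    static Set<String> stuckAcrossDumps(List<String> dumps) {
        Set<String> stuck = null;
        for (String dump : dumps) {
            Set<String> here = threadsWaitingOnLdap(dump);
            if (stuck == null) {
                stuck = here;
            } else {
                stuck.retainAll(here);
            }
        }
        return stuck == null ? new LinkedHashSet<>() : stuck;
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
            "\"atlassian-scheduler-quartz1.clustered_Worker-1\" #140 prio=5",
            "   java.lang.Thread.State: WAITING (on object monitor)",
            "\tat java.lang.Object.wait(Native Method)",
            "\tat com.sun.jndi.ldap.Connection.readReply(Connection.java:467)");
        System.out.println(stuckAcrossDumps(Arrays.asList(dump, dump)));
    }
}
```

Matching by thread name is a heuristic. Pooled threads are reused, so a name that appears in every dump should still be checked by hand to confirm that it is the same request each time.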

Workaround

The following workaround helps as a short-term solution:

  • Manually start the LDAP directory sync again.
  • Change the LDAP read and connection timeouts to finite values (that is, not 0), and keep the read timeout below 300 seconds, so that the process is terminated with a read timeout exception if there are any communication issues.
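To reproduce the effect of finite timeouts outside JIRA, for example in a standalone connectivity test, you can set the corresponding JNDI LDAP provider properties yourself. The sketch below is not JIRA's own code. The properties com.sun.jndi.ldap.connect.timeout and com.sun.jndi.ldap.read.timeout are standard JNDI settings that take milliseconds. The class name, the method name and the server URL are hypothetical, and rejecting read timeouts of 300 seconds or more is a conservative choice based on the default health check threshold.

```java
import java.util.Hashtable;
import javax.naming.Context;

class FiniteLdapTimeouts {
    // Default Cluster Locks health check threshold described in this article (seconds).
    static final long LOCK_THRESHOLD_SECONDS = 300;

    /** Builds a JNDI environment with finite connect and read timeouts. */
    static Hashtable<String, String> ldapEnvironment(String url, long connectSeconds, long readSeconds) {
        if (connectSeconds <= 0 || readSeconds <= 0) {
            // 0 means "wait forever", which is the situation this article warns about.
            throw new IllegalArgumentException("timeouts must be greater than 0");
        }
        if (readSeconds >= LOCK_THRESHOLD_SECONDS) {
            throw new IllegalArgumentException("read timeout should be below "
                + LOCK_THRESHOLD_SECONDS + " seconds");
        }
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        // Both JNDI properties expect a string holding a value in milliseconds.
        env.put("com.sun.jndi.ldap.connect.timeout", String.valueOf(connectSeconds * 1000));
        env.put("com.sun.jndi.ldap.read.timeout", String.valueOf(readSeconds * 1000));
        return env;
    }

    public static void main(String[] args) {
        // Hypothetical server URL; replace it with your own LDAP/AD host.
        Hashtable<String, String> env = ldapEnvironment("ldap://ldap.example.com:389", 10, 120);
        System.out.println(env.get("com.sun.jndi.ldap.read.timeout"));
        // Passing env to new javax.naming.directory.InitialDirContext(env) would open a
        // connection whose reads fail with a NamingException once the read timeout expires.
    }
}
```

With these values, a read that stalls fails after 120 seconds instead of blocking indefinitely, which is the behaviour the workaround aims for.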

Resolution

The underlying communication issue will likely need to be investigated on the network side to determine why JIRA cannot communicate reliably with LDAP/AD. That investigation is outside the scope of this article.

 

Last modified on Nov 2, 2018
