Unlimited LDAP read timeout can cause Cluster Locks health check to fail if there are communication issues
Platform Notice: Server and Data Center Only - This article only applies to Atlassian products on the server and data center platforms.
When the LDAP read timeout is unlimited or very high and JIRA cannot communicate properly with your LDAP server, the Cluster Locks health check can fail. By default, this check fails when a process has held a cluster lock for more than 300 seconds. That happens when JIRA can connect to your LDAP server but cannot read information from it, because the directory synchronisation keeps holding its lock while it waits for a response. The health check then reports a message such as:
Node 'node1' has been holding cluster lock, 'com.atlassian.crowd.embedded.api.Directory:10100', for 503 seconds.
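When several of these messages need triaging, a small script can pull out the node, the lock name, and the duration. The following is a minimal Python sketch, not an Atlassian tool. The function name parse_lock_message is hypothetical, the pattern is inferred only from the sample message above, and reading the number after the colon as the directory ID is an assumption.

```python
import re

# Pattern inferred from the single sample message in this article (assumption).
LOCK_MESSAGE = re.compile(
    r"Node '(?P<node>[^']+)' has been holding cluster lock, "
    r"'(?P<lock>[^']+)', for (?P<seconds>\d+) seconds"
)

# Default threshold stated in this article.
DEFAULT_THRESHOLD_SECONDS = 300


def parse_lock_message(message, threshold=DEFAULT_THRESHOLD_SECONDS):
    """Extract node, lock name and duration from a health check message.

    Returns None when the message does not match the expected shape.
    """
    match = LOCK_MESSAGE.search(message)
    if match is None:
        return None
    lock = match.group("lock")
    seconds = int(match.group("seconds"))
    # Assumption: a numeric suffix after the last colon is the directory ID.
    suffix = lock.rpartition(":")[2] if ":" in lock else ""
    return {
        "node": match.group("node"),
        "lock": lock,
        "seconds": seconds,
        "over_threshold": seconds > threshold,
        "directory_id": suffix if suffix.isdigit() else None,
    }
```

A lock name that starts with com.atlassian.crowd.embedded.api.Directory points towards a user directory operation, which is why the rest of this article looks at the LDAP synchronisation.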
- Affected environment: a JIRA instance that is configured with LDAP/AD and has a read timeout of 0 (infinite) or higher than 300 seconds.
This only happens when there is an issue reading from the LDAP server. In that case, the atlassian-jira.log file shows a synchronisation starting but never completing:
2017-01-31 11:34:08,377 atlassian-scheduler-quartz1.clustered_Worker-1 INFO ServiceRunner [atlassian.crowd.directory.DbCachingRemoteDirectory] INCREMENTAL synchronisation for directory [ 10400 ] starting
2017-01-31 11:34:08,377 atlassian-scheduler-quartz1.clustered_Worker-1 INFO ServiceRunner [atlassian.crowd.directory.DbCachingRemoteDirectory] Attempting INCREMENTAL synchronisation for directory [ 10400 ]
If you review thread dumps, you will see the same thread waiting on a com.sun.jndi.ldap.LdapRequest in every dump. This suggests that JIRA is waiting for a response from the LDAP server and is stuck at this stage:
"atlassian-scheduler-quartz1.clustered_Worker-1" #140 prio=5 tid=0x00007f6659c33000 nid=0x4e98 in Object.wait() [0x00007f6575984000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at com.sun.jndi.ldap.Connection.readReply(Connection.java:467)
    - locked <0x00000005e0912700> (a com.sun.jndi.ldap.LdapRequest)
    at com.sun.jndi.ldap.LdapClient.getSearchReply(LdapClient.java:640)
    at com.sun.jndi.ldap.LdapClient.search(LdapClient.java:563)
The following workaround will help as a short-term solution:
- Manually start the LDAP directory sync again
- Change the LDAP read and connection timeouts to finite values (i.e. not 0, and preferably below the 300-second health check threshold) so that a stalled process is terminated with a read timeout exception if there are any communication issues.
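The condition described in this article can be expressed as a simple check on the configured value. This is a minimal Python sketch under stated assumptions: the function name read_timeout_risk is hypothetical, the timeout is taken to be in seconds as in the figures quoted above, and the 300-second threshold is the default mentioned in this article. It does not read any JIRA configuration; the value must be supplied by hand.

```python
# Default Cluster Locks health check threshold stated in this article.
HEALTH_CHECK_THRESHOLD_SECONDS = 300


def read_timeout_risk(read_timeout_seconds, threshold=HEALTH_CHECK_THRESHOLD_SECONDS):
    """Classify an LDAP read timeout against the health check threshold.

    Returns "infinite" for 0, "exceeds-threshold" for values above the
    threshold, and "ok" otherwise.
    """
    if read_timeout_seconds < 0:
        raise ValueError("timeout must be zero or a positive number of seconds")
    if read_timeout_seconds == 0:
        # 0 means the read never times out, so a stalled sync holds its lock.
        return "infinite"
    if read_timeout_seconds > threshold:
        # A stalled read can outlast the health check before it is aborted.
        return "exceeds-threshold"
    return "ok"
```

A result other than "ok" means that a stalled LDAP read could keep the lock held for longer than the health check allows.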
The underlying communication issue will likely need to be investigated on the network side to determine why JIRA cannot communicate reliably with LDAP/AD. That investigation is outside the scope of this guide.