Crowd freezes when multiple applications make API calls at the same time

Symptoms

  • Crowd server thread dumps show more than one application calling Crowd's API at the same time. These calls congest the database and prevent efficient access to Crowd's LDAP-DB cache. You can confirm this by finding API calls that are in the "WAIT for DB" state.
  • Crowd 2.1 or later is in use

Cause

Depending on which API calls the applications make, a single call can return all of the cached data (for example, findAllGroupRelationships). For large LDAP directories cached in the database, this can cause Crowd to "freeze" for several minutes.

Resolution

Crowd will always use all the resources provided to it. You can increase the memory, database connections, and CPU assigned to the server, but this is not a lasting solution because Crowd may reach the resource limit again.

With this in mind, the correct approach is to make sure that no more than one application makes heavy API requests to Crowd at the same time. Also ensure that no two LDAP directories use the same polling interval.

Because all the applications use the Crowd integration client, which uses Ehcache, you can set a different cache timeout for each application in its crowd-ehcache.xml file.

1. The LDAP-DB cache polling interval
Suggestion: offset each directory's interval from the next by 7 minutes.
Example:
Directory-1: 60 minutes
Directory-2: 67 minutes
Directory-3: 74 minutes

2. The application cache intervals defined in the <App>/WEB-INF/classes/crowd-ehcache.xml file
Suggestion: offset each application's interval from the next by 7 minutes.
Examples:
Application-1:

  • timeToIdleSeconds="3600"
  • timeToLiveSeconds="3600"

Application-2:

  • timeToIdleSeconds="4020"
  • timeToLiveSeconds="4020"

Application-3:

  • timeToIdleSeconds="4440"
  • timeToLiveSeconds="4440"

Within each application, every cache in the crowd-ehcache.xml file must have exactly the same timeToIdleSeconds and timeToLiveSeconds values.
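
One way to check this rule is to parse the file and compare the two attributes across all cache definitions. The sketch below assumes the standard Ehcache layout of <cache> and <defaultCache> elements; the sample XML and its cache names are hypothetical and only illustrate the shape of the file, so point the check at your own crowd-ehcache.xml contents.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real crowd-ehcache.xml files define their own cache names.
SAMPLE = """<ehcache>
  <defaultCache maxElementsInMemory="10000" eternal="false"
                timeToIdleSeconds="4020" timeToLiveSeconds="4020"/>
  <cache name="example-user-cache" maxElementsInMemory="10000" eternal="false"
         timeToIdleSeconds="4020" timeToLiveSeconds="4020"/>
  <cache name="example-group-cache" maxElementsInMemory="10000" eternal="false"
         timeToIdleSeconds="3600" timeToLiveSeconds="3600"/>
</ehcache>"""


def cache_timeouts(xml_text):
    """Map each cache name to its (timeToIdleSeconds, timeToLiveSeconds) pair."""
    root = ET.fromstring(xml_text)
    timeouts = {}
    for element in root.iter():
        if element.tag in ("cache", "defaultCache"):
            name = element.get("name", "defaultCache")
            timeouts[name] = (
                element.get("timeToIdleSeconds"),
                element.get("timeToLiveSeconds"),
            )
    return timeouts


def is_consistent(xml_text):
    """True when every cache defines both attributes and all caches share one pair."""
    pairs = set(cache_timeouts(xml_text).values())
    if len(pairs) != 1:
        return False
    return None not in next(iter(pairs))


# The sample fails because example-group-cache uses 3600 instead of 4020.
print(is_consistent(SAMPLE))
print(cache_timeouts(SAMPLE))
```
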

This KB suggests seven minutes. However, you can use any offset that makes the least common multiple of the intervals large, so that the caches expire at the same moment less often.
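
To compare candidate offsets, you can compute the least common multiple of the intervals. The Python sketch below uses a simplified model that assumes all timers start at the same moment and fire exactly on schedule, which real deployments will not do precisely; the function name coincidence_period is hypothetical.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def coincidence_period(intervals_minutes):
    """Minutes until all intervals expire together again, under the simplified model."""
    return reduce(lcm, intervals_minutes)


print(coincidence_period([60, 60, 60]))  # 60: identical intervals collide every hour
print(coincidence_period([60, 70, 80]))  # 1680: a 10-minute offset shares factors
print(coincidence_period([60, 67, 74]))  # 148740: the 7-minute offset from this KB
```

Under this model, the 60, 67, and 74 minute intervals all expire together only once every 148740 minutes, or roughly 103 days, compared with every hour when the intervals are identical.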

In later versions of Confluence and JIRA, you can change the application's cache interval under Confluence/JIRA Admin >> User Directories >> Crowd Server, in the Synchronisation Interval (minutes) field.

Last modified on Feb 26, 2016
