Synchronization with Crowd directory fails intermittently - SAXParseException
Platform Notice: Server and Data Center Only - This article only applies to Atlassian products on the server and data center platforms.
The full synchronization between a Crowd directory and other Atlassian applications fails after the same amount of time on every attempt, e.g. a Jira synchronization always fails after 30 seconds.
Incremental synchronization works most of the time, but it can fail intermittently depending on the number of records being synced.
- Atlassian Crowd (Server or Data Center)
- Any Atlassian application syncing to Crowd behind a Load Balancer
The error message may vary depending on the application trying to sync with Crowd.
For example, when Jira fails to sync with Crowd, a parsing exception caused by a premature end of the XML file is written to the application logs:
2020-06-12 00:56:01,389 Caesium-1-3 INFO ServiceRunner [c.a.crowd.directory.DbCachingRemoteDirectory] failed synchronisation complete for directory [ 10000 ] in [ 48878ms ]
2020-06-12 00:56:01,461 Caesium-1-3 ERROR ServiceRunner [c.atlassian.scheduler.JobRunnerResponse] Unable to synchronise directory
com.atlassian.crowd.exception.OperationFailedException: javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 3443; columnNumber: 16; Premature end of file.]
The load balancer in front of the application that is trying to sync with Crowd has a gateway timeout that is shorter than the time the full sync needs to complete. When the timeout expires, the load balancer cuts the connection before the whole response has been delivered, so the application receives truncated XML and reports a premature end of file.
The issue is usually triggered by a full synchronization because syncing all the records takes more time. The failure always happens after the same amount of time: a few seconds after the gateway timeout set at the load balancer.
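To check whether failures really do occur after a near-constant duration, the failed sync times can be pulled out of the application log and compared. The following is a minimal sketch in Python, assuming the log lines follow the format shown in the example above; the function names and the 5-second tolerance are illustrative assumptions, not part of any Atlassian tool.

```python
import re

# Matches the directory ID and duration in lines such as:
# "... failed synchronisation complete for directory [ 10000 ] in [ 48878ms ]"
# (pattern derived from the sample log line in this article)
FAILED_SYNC = re.compile(
    r"failed synchronisation complete for directory "
    r"\[\s*(\d+)\s*\] in \[\s*(\d+)ms\s*\]"
)


def failed_sync_durations(log_lines):
    """Return (directory_id, seconds) for every failed sync found in the lines."""
    results = []
    for line in log_lines:
        for match in FAILED_SYNC.finditer(line):
            directory_id = int(match.group(1))
            seconds = int(match.group(2)) / 1000.0
            results.append((directory_id, seconds))
    return results


def looks_like_gateway_timeout(durations_s, tolerance_s=5.0):
    """True if at least two failures occurred and all durations lie within
    tolerance_s seconds of each other (tolerance is an assumed default)."""
    if len(durations_s) < 2:
        return False
    return max(durations_s) - min(durations_s) <= tolerance_s


if __name__ == "__main__":
    sample = [
        "2020-06-12 00:56:01,389 Caesium-1-3 INFO ServiceRunner "
        "[c.a.crowd.directory.DbCachingRemoteDirectory] failed synchronisation "
        "complete for directory [ 10000 ] in [ 48878ms ]",
    ]
    for directory_id, seconds in failed_sync_durations(sample):
        print(f"directory {directory_id}: failed after {seconds:.1f}s")
```

If the durations cluster tightly and sit just above the gateway timeout configured on the load balancer, that supports the cause described here. Widely scattered durations suggest looking elsewhere.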
To overcome this issue you can either:
- Increase the gateway timeout in the load balancer settings so that it exceeds the time a full sync takes; or
- Bypass the load balancer when connecting the application to Crowd