An invalid XML character (Unicode: 0xb) was found in the element content of the document error while synching Jira with Crowd

Summary

While synchronizing LDAP, Crowd, and Jira, the synchronization starts failing after some time, and no further updates can be completed.

Environment

Jira Service Management Data Center or Server.

Diagnosis

The main indicator of this issue is the following error in the logs, which points to a special character in the XML generated during the AD synchronization.

2023-07-07 10:48:14,186-0400 Caesium-1-4 ERROR ServiceRunner     [c.a.crowd.directory.DbCachingDirectoryPoller] Error occurred while refreshing the cache for directory [ 10000 ].
com.atlassian.crowd.exception.OperationFailedException: javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 391; columnNumber: 11740; An invalid XML character (Unicode: 0xb) was found in the element content of the document.]
	at com.atlassian.crowd.integration.rest.service.RestExecutor$MethodExecutor.andReceive(RestExecutor.java:381)
	at com.atlassian.crowd.integration.rest.service.RestCrowdClient.searchGroups(RestCrowdClient.java:556)
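
The two facts that matter in such an entry are the offending code point and the client call that failed. The following minimal Python sketch pulls both out of a log excerpt. The function name `summarize_sync_error` and the two regular expressions are illustrative assumptions inferred from the sample entry above; they are not part of any Atlassian tool.

```python
import re

# Patterns inferred from the sample log entry (an assumption, not an official format).
CODE_POINT = re.compile(r"An invalid XML character \(Unicode: (0x[0-9a-fA-F]+)\)")
CLIENT_CALL = re.compile(r"RestCrowdClient\.(\w+)\(")


def summarize_sync_error(log_text):
    """Return (code_point, client_method); either part is None if not found."""
    code_match = CODE_POINT.search(log_text)
    call_match = CLIENT_CALL.search(log_text)
    code_point = int(code_match.group(1), 16) if code_match else None
    method = call_match.group(1) if call_match else None
    return code_point, method


sample = (
    "[org.xml.sax.SAXParseException; lineNumber: 391; columnNumber: 11740; "
    "An invalid XML character (Unicode: 0xb) was found in the element content "
    "of the document.]\n"
    "\tat com.atlassian.crowd.integration.rest.service.RestCrowdClient"
    ".searchGroups(RestCrowdClient.java:556)\n"
)

code_point, method = summarize_sync_error(sample)
print(f"code point: {code_point} (hex {code_point:#x}), failing call: {method}")
```

For the sample entry this reports code point 11 (0xb) and the call searchGroups, which are the two values used in the rest of this article.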

Cause

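In the example above, the stack trace (searchGroups) shows that the failure happened while searching for groups. One of the groups contains a vertical tab (VT, also called line tabulation, Unicode 0xb), which is not a valid XML character. It could be in either the "Group name" or the "Group description".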
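
To see why a vertical tab is rejected while an ordinary tab or line break is not, it can help to check characters against the `Char` production of the XML 1.0 specification. The sketch below is only an illustration of that rule; the function name `is_valid_xml10_char` is hypothetical and is not something Jira or Crowd exposes.

```python
def is_valid_xml10_char(ch):
    """Return True if the single character ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


for name, ch in [("tab", "\t"), ("line feed", "\n"), ("vertical tab", "\x0b")]:
    print(f"{name} (0x{ord(ch):x}): valid = {is_valid_xml10_char(ch)}")
```

Tab (0x9) and line feed (0xa) pass the check, while the vertical tab (0xb) does not, which matches the parser error in the log.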

Solution

  • First, identify the source of the error by looking at the stack trace, which tells whether the synchronization is failing while:
    • Searching for groups or searching for users
      • In the example above, the stack trace shows that it failed while searching for groups: searchGroups(RestCrowdClient.java:556)
      • With this information, we can check the entries on the Crowd side (database). The query below lists all the groups:

        select * from cwd_group;
      • With the query result exported to a CSV file, we need to identify the special character we are dealing with. In this example it is a vertical tab (0xb, which is 11 in decimal and 013 in octal). To search the CSV file, the character must be written in a notation (decimal, hexadecimal, or octal) that a tool such as 'grep' (on Linux/Unix) or 'Notepad++' understands; to convert it you can use this tool here.

      • LINUX/UNIX approach
        • In regular-expression notation the character is written as \x{B}. GNU grep only understands this escape in Perl-compatible mode (the "-P" option), so run the following against the CSV file to find which line (group or groups) contains the vertical tab:

          grep -nP '\x{B}' my_csvfile.csv
        • With "-n", grep prints the line number along with each entry containing the character.
      • WINDOWS approach
        • The same search works on Windows; the only difference is that another tool is needed to work with the CSV file. The easiest option is Notepad++, a free text editor whose search can be used like the grep above. Here are the steps:
          1. Install Notepad++ and open the CSV file
          2. Press Ctrl-F ( Search -> Find )
          3. Put \x{B} in the search box
          4. Select 'Regular expression' as the search mode
          5. Run the search; it will return the lines and the entries containing the character.
    • Now that we know which groups (for example) contain the invalid character, we can edit each group to remove it and then retry the synchronization; it should finish without errors.
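
As a cross-platform alternative to grep and Notepad++, the exported CSV file can also be scanned with a short script. The following is a minimal Python sketch, not an Atlassian-provided tool: the function name `find_invalid_xml_chars` and the sample column names and rows are made up for illustration, and it reports any character outside the XML 1.0 `Char` range rather than only 0xb.

```python
import csv
import io
import re

# Matches any character that is NOT allowed by the XML 1.0 Char production.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(csv_file):
    """Yield (row_number, column_name, code_point) for each invalid character.

    Row 1 is the header, so the first data record is row 2. Rows are CSV
    records, which may differ from physical line numbers if a field
    contains embedded line breaks.
    """
    reader = csv.reader(csv_file)
    header = next(reader, [])
    for row_number, row in enumerate(reader, start=2):
        for index, value in enumerate(row):
            # Fall back to a positional name if the row is longer than the header.
            column = header[index] if index < len(header) else f"column {index + 1}"
            for match in INVALID_XML_CHARS.finditer(value):
                yield row_number, column, ord(match.group())


# Hypothetical export with a vertical tab hidden in one description.
sample = io.StringIO(
    "id,group_name,description\n"
    "1,jira-users,Default group\n"
    "2,helpdesk,First line\x0bsecond line\n"
)
for row_number, column, code_point in find_invalid_xml_chars(sample):
    print(f"row {row_number}, column {column!r}: U+{code_point:04X}")
```

To scan a real export, the same function can be given a file opened with `open("my_csvfile.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption and depends on how the file was exported.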


Last modified on Jul 11, 2023
