Importing XML backup or anonymizing data fails due to invalid characters in attribute values in Jira
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Applies to Jira 6.4.x and above, and all versions of Jira Service Management.
Jira 3.1 and above should not suffer from this problem unless you are migrating to PostgreSQL from another database such as MySQL. Otherwise, invalid characters are automatically stripped from the imported data.
Summary
When importing an XML database backup file containing control characters, the Setup Wizard fails to import the backup and throws an error. For example:
Failed to import data: Error in action: com.atlassian.jira.action.admin.DataImport@1179clc.
result: error Exception occurred: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xc)
was found in the value of attribute "description".
Environment
Applies to Jira 6.4.x and above, and all versions of Jira Service Management.
Diagnosis
When restoring data from an XML backup or anonymizing Jira application data, the task may fail and report "An invalid XML character (Unicode: ...)".
The reported character code and where it was found may vary:
- was found in the CDATA section
- was found in the comment.
- etc
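To find out which characters are affected before attempting a fix, the extracted backup file can be scanned directly. The following is a minimal Python sketch, not an Atlassian tool: the function names are hypothetical, and it assumes the backup is UTF-8 encoded XML 1.0. It reports the line, column, and code point of each character that falls outside the XML 1.0 character range.

```python
def is_valid_xml_char(cp):
    """Return True if the code point is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_chars(text, limit=20):
    """Return up to `limit` tuples of (line, column, code point) for invalid characters."""
    hits = []
    # Split on "\n" only: str.splitlines() would also split on form feed (0xc)
    # and other control characters, hiding exactly what we are looking for.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if not is_valid_xml_char(ord(ch)):
                hits.append((line_no, col, ord(ch)))
                if len(hits) >= limit:
                    return hits
    return hits


def scan_file(path, limit=20):
    """Scan a file on disk, assuming it is UTF-8 encoded."""
    # surrogateescape maps undecodable bytes to surrogate code points,
    # which is_valid_xml_char also rejects, so they show up in the report.
    with open(path, encoding="utf-8", errors="surrogateescape", newline="") as handle:
        return find_invalid_chars(handle.read(), limit)
```

Calling `scan_file("entities.xml")` reads the whole file into memory, so a very large backup may need a streaming variant. The sketch only looks at literal characters; it does not detect numeric character references such as `&#12;`, and the columns it reports may not match those in the parser's error message.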
Cause
In older versions of Jira, it was possible to cut and paste text containing control characters into Jira issue fields. This causes problems because the backup format is XML, which does not support most control characters: in XML 1.0, the only characters allowed below 0x20 are tab, line feed, and carriage return.
This problem can also be caused by the following bug:
- JRACLOUD-65145
Solution
Remove the control characters from the Jira backup file with Atlassian’s XML cleaner utility:
- Extract the ZIP archive containing the entities.xml and activeobjects.xml database backup files.
- Download atlassian-xml-cleaner-0.1.jar to the same location as the extracted backup file.
- Open a command prompt and navigate to the location of the backup file.
- Run the XML cleaner utility as follows:
  java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml
  This will create a copy of entities.xml as entities-clean.xml with the invalid characters replaced by the � replacement character.
- Copy entities-clean.xml into another directory and rename it back to entities.xml.
- Copy the previously extracted activeobjects.xml file into the same directory.
- If the error occurred because of the 0xffff or 0xfffe control characters, fix entities.xml by running one of the following Perl commands:
  To fix errors related to the 0xffff character, run:
  perl -i -pe 's/\xef\xbf\xbf//g' entities.xml
  To fix errors related to the 0xfffe character, run:
  perl -i -pe 's/\xef\xbf\xbe//g' entities.xml
- Create a new ZIP archive containing the new entities.xml file and the activeobjects.xml file.
- Make sure that the new ZIP archive does not contain any subdirectories and that the files inside are named exactly entities.xml and activeobjects.xml.
- Import the new ZIP file.
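On a host without Perl or a ZIP tool, the last few steps can be approximated in Python. The sketch below is an illustration under stated assumptions rather than a supported procedure: the function names are hypothetical, the backup is assumed to be UTF-8 encoded, and the files are assumed to fit in memory. It removes the same byte sequences as the Perl commands above and writes both files at the root of a new archive.

```python
import zipfile
from pathlib import Path

# UTF-8 encodings of U+FFFF and U+FFFE, the same bytes the Perl commands remove.
NONCHARACTER_SEQUENCES = (b"\xef\xbf\xbf", b"\xef\xbf\xbe")


def strip_noncharacters(data: bytes) -> bytes:
    """Delete the UTF-8 byte sequences for U+FFFF and U+FFFE."""
    for sequence in NONCHARACTER_SEQUENCES:
        data = data.replace(sequence, b"")
    return data


def build_backup_zip(entities: Path, activeobjects: Path, target: Path) -> None:
    """Write both files at the archive root under the exact required names."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname drops any directory part, so no subdirectories end up in the ZIP.
        archive.write(entities, arcname="entities.xml")
        archive.write(activeobjects, arcname="activeobjects.xml")
```

For example, `cleaned = strip_noncharacters(Path("entities.xml").read_bytes())` followed by `Path("entities.xml").write_bytes(cleaned)` mirrors running both Perl commands in sequence. Unlike `perl -i`, this reads the entire file at once, so keep a copy of the original before overwriting it.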
If the import fails because Jira is unable to find the entities.xml file inside the new archive, see Unable To Find JIRA Backup (entities.xml) Inside Of Zip File Error.
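Before importing, the layout of the new archive can be checked locally. This is a minimal Python sketch with a hypothetical function name; it assumes, based on the steps above, that both entities.xml and activeobjects.xml must sit at the root of the ZIP. It does not reproduce whatever checks Jira itself performs.

```python
import zipfile

REQUIRED_FILES = ("activeobjects.xml", "entities.xml")


def check_backup_layout(zip_source):
    """Return a list of layout problems; an empty list means none were found.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = archive.namelist()

    problems = []
    for required in REQUIRED_FILES:
        if required in names:
            continue
        # A file with the right name inside a folder is the usual mistake.
        nested = [name for name in names if name.endswith("/" + required)]
        if nested:
            problems.append(
                f"{required} is nested at {nested[0]}; it must be at the archive root"
            )
        else:
            problems.append(f"{required} is missing from the archive")
    return problems
```

If `check_backup_layout("backup.zip")` returns any entries, rebuild the archive from inside the directory that holds the two files, so that no parent folder is included.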