Importing XML backup or anonymizing data fails due to invalid characters in attribute values in Jira


Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Jira 3.1 and later automatically strips invalid characters from imported data, so this problem should not occur unless you are migrating to PostgreSQL from another database such as MySQL.

Summary

When importing an XML database backup file containing control characters, the Setup Wizard fails to import the backup and throws an error. For example:

Failed to import data: Error in action: com.atlassian.jira.action.admin.DataImport@1179clc. 
result: error Exception occurred: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xc)
was found in the value of attribute "description".


Environment

Applies to Jira 6.4.x and above, and all versions of Jira Service Management.


Diagnosis

When restoring data from an XML backup or anonymizing Jira application data, the task may fail with the error "An invalid XML character (Unicode: ...)".

The reported character code and the location where it was found may vary, for example:

  • was found in the value of attribute "description".
  • was found in the CDATA section.
  • was found in the comment.
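To see which characters are involved and where they sit before cleaning the file, the following minimal Python sketch scans an extracted entities.xml and reports the line, column, and code point of every character that XML 1.0 does not allow. It assumes the file is UTF-8 encoded and checks literal characters only; numeric character references such as &#12; are also illegal in XML 1.0 but are not detected here. The function names are hypothetical and not part of any Atlassian tool.

```python
import re
import sys

# XML 1.0 "Char" production: tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD and
# U+10000-U+10FFFF. This pattern matches any single character outside it.
INVALID_XML_CHAR = re.compile(
    r"[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_chars(lines):
    """Yield (line number, column, code point) for each character XML 1.0 forbids."""
    for line_no, line in enumerate(lines, start=1):
        for match in INVALID_XML_CHAR.finditer(line):
            yield line_no, match.start() + 1, ord(match.group())


def scan_file(path):
    """Scan a UTF-8 file such as entities.xml and return all findings as a list."""
    # Iterating over the file reads one line at a time, so a large backup
    # is never loaded into memory in full.
    with open(path, encoding="utf-8") as f:
        return list(find_invalid_chars(f))


if __name__ == "__main__" and len(sys.argv) > 1:
    for line_no, column, code_point in scan_file(sys.argv[1]):
        print(f"line {line_no}, column {column}: U+{code_point:04X}")
```

Saved under a placeholder name such as find_invalid_xml_chars.py and run as `python find_invalid_xml_chars.py entities.xml`, it prints one line of the form `line N, column M: U+000C` per offending character, which also tells you whether step 7 of the solution (0xffff or 0xfffe) applies.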


Cause

In older versions of Jira, it was possible to cut and paste text containing control characters into Jira issue fields. This causes problems because the backup format is XML, which does not support most control characters.
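The failure comes from the XML format itself rather than from Jira's importer. As an illustration only, the sketch below uses Python's standard-library parser (xml.etree.ElementTree, which is built on expat) as a stand-in for the SAX parser named in the error; the error text differs, but both apply the XML 1.0 rule that rejects a form feed (U+000C) even inside an attribute value. The `parses` helper and the `Issue` element are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if Python's expat-based parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# U+000C (form feed) is the character from the example error in the Summary.
with_form_feed = '<Issue description="first line\x0csecond line"/>'
with_tab = '<Issue description="first line\tsecond line"/>'

print(parses(with_form_feed))  # False: not a legal XML 1.0 character
print(parses(with_tab))        # True: tab is one of the few allowed control characters
```

Tab, line feed, and carriage return are the only control characters below U+0020 that XML 1.0 accepts, which is why text pasted with other control characters can end up in the database but not survive a round trip through an XML backup.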

This problem can also be caused by the following bug:

JRACLOUD-65145


Solution

Remove the control characters from the Jira backup file with Atlassian’s XML cleaner utility:

  1. Extract the ZIP archive containing the entities.xml and activeobjects.xml database backup files.
  2. Download atlassian-xml-cleaner-0.1.jar to the same location as the extracted backup files.
  3. Open a command prompt and navigate to the location of the backup file.
  4. Run the XML cleaner utility as follows:

    java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml

    This creates a copy of entities.xml named entities-clean.xml, in which the invalid characters are replaced with the � replacement character (U+FFFD).

  5. Copy entities-clean.xml into another directory and rename it back to entities.xml.
  6. Copy the previously extracted activeobjects.xml file into the same directory.
  7. If the error was caused by the 0xffff or 0xfffe character, also fix entities.xml by running the corresponding Perl command:
    • To fix errors related to the 0xffff character, run:

      perl -i -pe 's/\xef\xbf\xbf//g' entities.xml
    • To fix errors related to the 0xfffe character, run:

      perl -i -pe 's/\xef\xbf\xbe//g' entities.xml
  8. Create a new ZIP archive containing the new entities.xml file and the activeobjects.xml file.
  9. Make sure that the new ZIP archive does not contain any subdirectories and that the files inside are named exactly entities.xml and activeobjects.xml.
  10. Import the new ZIP file.
    If the import fails because Jira is unable to find the entities.xml file inside the new archive, see Unable To Find JIRA Backup (entities.xml) Inside Of Zip File Error.
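If the XML cleaner utility is not available, the cleaning and repackaging steps above can be approximated with Python's standard library. This is a minimal sketch under stated assumptions, not Atlassian's utility: `clean_xml` replaces every character outside the XML 1.0 character range, which covers control characters such as U+000C as well as U+FFFE and U+FFFF, so it stands in for both step 4 and step 7. The Perl commands in step 7 do the same job at the byte level, because EF BF BF and EF BF BE are the UTF-8 encodings of U+FFFF and U+FFFE. `build_backup_zip` writes both files at the root of a new archive and checks the layout required by step 9. The function names and file paths are hypothetical; the sketch assumes UTF-8 input and does not handle numeric character references such as &#12;.

```python
import re
import zipfile

# XML 1.0 allows only tab, LF, CR and the ranges below; everything else
# (for example U+000C, U+FFFE and U+FFFF) is matched by this pattern.
INVALID_XML_CHAR = re.compile(
    r"[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_xml(src, dst, replacement="\ufffd"):
    """Copy src to dst, replacing characters that XML 1.0 does not allow.

    Returns the number of characters replaced. The default replacement
    mirrors step 4; replacement="" deletes the characters instead, as the
    Perl commands in step 7 do.
    """
    replaced = 0
    # newline="" passes line endings (including CR) through unchanged.
    with open(src, encoding="utf-8", newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            cleaned, count = INVALID_XML_CHAR.subn(replacement, line)
            replaced += count
            fout.write(cleaned)
    return replaced


def build_backup_zip(zip_path, entities_path, activeobjects_path):
    """Create a ZIP with entities.xml and activeobjects.xml at its root."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname sets the exact entry names and keeps them out of subdirectories.
        zf.write(entities_path, arcname="entities.xml")
        zf.write(activeobjects_path, arcname="activeobjects.xml")
    # Re-open the archive and confirm the layout required by step 9.
    with zipfile.ZipFile(zip_path) as zf:
        names = sorted(zf.namelist())
    if names != ["activeobjects.xml", "entities.xml"]:
        raise ValueError(f"unexpected archive contents: {names}")
```

For example, `clean_xml("entities.xml", "entities-clean.xml")` followed by `build_backup_zip("backup-clean.zip", "entities-clean.xml", "activeobjects.xml")` produces an archive whose entries are named exactly entities.xml and activeobjects.xml, so the separate rename in step 5 is not needed; the archive name is a placeholder.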
Last modified on Dec 21, 2022
