Removing invalid characters from XML backups

JIRA 3.1 and above should not suffer from this problem, because invalid characters are automatically stripped from imported data. The exception is a migration to PostgreSQL from another database such as MySQL.

In older versions of JIRA it was possible to cut and paste text containing control characters into JIRA issue fields. This causes problems because JIRA's backup format is XML, and XML does not allow most control characters to be stored. When XML containing control characters is imported into JIRA, the import fails with an error reporting an invalid XML character.

To fix this, remove the control characters from the JIRA backup file as follows:

  1. Download atlassian-xml-cleaner-0.1.jar
  2. Open a command prompt and change to the directory that contains the backup. If the backup is a ZIP file, extract it first. In this example, we will use entities.xml.
  3. Run the cleaner with the command below:

    $ java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml

    This will create a copy of entities.xml as entities-clean.xml with the invalid characters removed. 

  4. Copy the entities-clean.xml file into another directory, rename it to entities.xml, and create a new ZIP containing this cleaned entities.xml file and the original activeobjects.xml file.
  5. Import the new ZIP file, ensuring that it contains both XML files.
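
For cases where the cleaner jar is not an option, the steps above can be roughly approximated in Python. This is only a sketch under stated assumptions, not the implementation of atlassian-xml-cleaner: the function names `strip_invalid_xml_chars` and `clean_backup` are hypothetical, the backup is assumed to be UTF-8, and the whole of entities.xml is read into memory, which may not suit very large backups. The character class is the complement of the `Char` production in the XML 1.0 specification, so it also covers 0xfffe and 0xffff.

```python
import re
import zipfile

# Code points that XML 1.0 does not allow: control characters other than
# tab, line feed and carriage return, surrogates, U+FFFE and U+FFFF.
INVALID_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_invalid_xml_chars(text):
    """Return text with every character that XML 1.0 forbids removed."""
    return INVALID_XML_CHARS.sub("", text)


def clean_backup(src_zip, dst_zip, xml_name="entities.xml"):
    """Write a copy of the backup ZIP in which xml_name has been cleaned.

    Every other member (for example activeobjects.xml) is copied unchanged.
    Assumption: xml_name is encoded as UTF-8.
    """
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == xml_name:
                # surrogatepass lets encoded surrogates decode so they can
                # be stripped along with the other invalid characters.
                text = data.decode("utf-8", errors="surrogatepass")
                data = strip_invalid_xml_chars(text).encode("utf-8")
            dst.writestr(name, data)
```

A call such as `clean_backup("backup.zip", "backup-clean.zip")` (both file names are placeholders) would produce a new ZIP that still contains both XML files, ready to import.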

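If the error specifically names 0xffff as the affected character, use this perl command, which edits the file in place and removes the character's UTF-8 byte sequence:

    perl -i -pe 's/\xef\xbf\xbf//g' entities.xml

If the error names 0xfffe instead, use this perl command:

    perl -i -pe 's/\xef\xbf\xbe//g' entities.xml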
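
Before choosing a fix, it can help to check which invalid characters a backup actually contains. The following Python sketch counts them per code point, formatted the way the import error reports them (for example 0xffff). The function name `count_invalid_xml_chars` is hypothetical and the file is assumed to be UTF-8.

```python
import re
from collections import Counter

# Control characters other than tab, line feed and carriage return,
# plus U+FFFE and U+FFFF, none of which XML 1.0 allows.
INVALID = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def count_invalid_xml_chars(path, encoding="utf-8"):
    """Return a dict mapping each invalid code point found to its count."""
    counts = Counter()
    # newline="" keeps line endings untouched; the file is read line by
    # line so that large backups do not have to fit in memory.
    with open(path, encoding=encoding, newline="") as fh:
        for line in fh:
            for ch in INVALID.findall(line):
                counts[f"0x{ord(ch):04x}"] += 1
    return dict(counts)
```

An empty result for entities.xml suggests the file is free of the characters covered here. A result containing only 0xffff or 0xfffe points to the matching perl command.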


11 Archived comments

  1. Jeff Turner

    This XML cleaner doesn't strip <ffff> characters. After running it, you may still get the error:

    An invalid XML character (Unicode: 0xffff) was found in the CDATA section.

    On my system I was able to fix this with:

    perl -i -pe 's/\xef\xbf\xbf//g' entities.xml

    Note perl doesn't allow \x{FFFF} in regexps so you have to break it into bytes as above.

    20 Dec 2011
    1. Ryan Brown

      You may also run into the following error:

      An invalid XML character (Unicode: 0xfffe) was found in the CDATA section.

      Altering the above script to "perl -i -pe 's/\xef\xbf\xbe//g' entities-clean.xml" and running it again will remove these characters.

      16 Jul 2015
  2. Laszlo Kremer

    I also found that the XML cleaner leaves invalid characters in the XML backup. Since it is no longer being developed, this won't be fixed. Another method for cleaning:

    • create the export.zip
    • create a test JIRA environment
    • import the export.zip into the test JIRA
    • log into the test JIRA and create an export.zip again; it will be clean, and you can now run the anonymizer

    18 Mar 2013
  3. eMundo GmbH Support Team

    You can use Notepad++ to clean them out (if the file is not too big) by searching for \uffff with Search and Replace and replacing it with nothing.

    04 Oct 2013
  4. Michael March

    "JIRA 3.1 and above should not suffer from this problem. Invalid characters are automatically stripped from imported data"


    That's not necessarily true. We're coming from a MySQL-backed instance that's moving to PostgreSQL, and we had to pipe the XML through this filter.

    23 Jun 2014
    1. Jason Smith

      Same going from Oracle to PostgreSQL.

      10 Mar 2015
      1. Jason Smith

        And Oracle to MySQL. :/

        11 Jun 2015
  5. Adam

    The cleaner does not work for us. We also removed the characters with Notepad++ and still get the same error. We've opened a ticket with support and will report back once it's resolved.

    13 May 2015
    1. Ben Paul

      Hi Adam, did you get a resolution on this? I've also had no success stripping the invalid character with either perl or Notepad++. I'd appreciate any update on what you've done to resolve this.

      28 May 2015
      1. Adam

        Support ended up sorting this out for us: they took a copy of our DB and removed all of the invalid characters.

        29 May 2015
        1. William Crighton [CCC]

          jesus. I am so NOT looking forward to when you can no longer comment on these pages. I swear (yea, too much) that I've found more fixes here than anywhere else ( for the edgy things encountered by the hallowed few )

          26 Jul 2015