'Incorrect string value' error thrown when restoring XML backup in Confluence


Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

When attempting to restore an XML backup in Confluence, the process stops and an error is thrown.

One of the following errors appears in atlassian-confluence.log:

logExceptions Incorrect string value: '\xF0\x9F\x98\x80</...' for column 'BODY' at row 1

Or:

Caused by: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x8D\xBA  ...' for column 'BODY' at row 1
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:998)

Or:

An invalid XML character (Unicode: 0xffff) was found in the CDATA section

Cause

The XML backup contains characters that cannot be restored. Examples are a 4-byte UTF-8 character, such as the emoji in the errors above, and U+FFFF, which is not a legal character in XML 1.0. The 4-byte case is caused by a MySQL limitation: its utf8 character set stores at most 3 bytes per character, so such characters are rejected. There is an improvement request for Confluence to handle 4-byte UTF-8 characters gracefully.

A bug has also been raised for Confluence regarding the 0xFFFF error.

Workaround

Remove the invalid characters from the XML backup:

  1. Download atlassian-xml-cleaner-0.1.jar
  2. Open a command prompt and go to the directory that contains the XML backup file. If the backup is a ZIP file, extract it first. This example uses entities.xml.
  3. Run the cleaner as shown:

    $ java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml

    (info) The invalid characters can also exist in the plugin data, which is stored in the activeObjectsBackupRestoreProvider.pdata file. Clean this file with the cleaner as well. The file is included in the ZIP file, but may not be visible until the ZIP file is extracted.

  4. This will create a copy of entities.xml as entities-clean.xml with the invalid characters removed. 

  5. Copy entities-clean.xml into another directory, rename it to entities.xml, and create a new ZIP file containing it.

 Note: The new ZIP file must also contain the attachments folder from the original backup. If this folder is missing, attachments will appear as broken links or broken images after the restore.



  6. Import the new ZIP file.
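Steps 2 to 5, including the plugin data file mentioned in the info note, can be scripted roughly as follows. This is a sketch under stated assumptions: a POSIX shell on Linux, the cleaner jar in the current directory, and an already extracted backup. The names clean_file, clean_backup_dir and CLEANER, the backup directory and Confluence-backup-clean.zip are hypothetical and not part of the cleaner itself.

```shell
#!/bin/sh
# Sketch: run the XML cleaner over entities.xml and every
# activeObjectsBackupRestoreProvider.pdata file in an extracted backup directory,
# replacing each file with its cleaned copy. The original ZIP is left untouched.
# CLEANER is a hypothetical override; by default it assumes the cleaner jar
# sits in the current directory. It is expanded unquoted on purpose so that
# the command and its arguments are split into separate words.
CLEANER=${CLEANER:-"java -jar atlassian-xml-cleaner-0.1.jar"}

clean_file() {
  if $CLEANER "$1" > "$1.clean"; then
    mv "$1.clean" "$1"
  else
    rm -f "$1.clean"   # do not leave a partial or empty copy behind
    return 1
  fi
}

clean_backup_dir() {
  clean_file "$1/entities.xml" || return 1
  # The .pdata file's location inside the backup is not assumed; search for it.
  find "$1" -name 'activeObjectsBackupRestoreProvider.pdata' | while read -r f; do
    clean_file "$f" || exit 1
  done
}

# Usage (hypothetical names):
#   mkdir backup && (cd backup && unzip ../Confluence-backup.zip)
#   clean_backup_dir backup
#   (cd backup && zip -r ../Confluence-backup-clean.zip .)
```

Zipping the whole extracted directory, rather than only entities.xml, keeps the attachments folder in the new ZIP file. Checking the cleaner's exit status also avoids replacing entities.xml with an empty file if the java command fails.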

On a Linux server, the following commands perform the same steps:

# extract entities.xml from the backup into the current directory:
unzip <path>/Confluence-backup.zip entities.xml

# fix the entities file, saving its output to a new file (entities-clean.xml):
java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml

# rename the original entities file
mv entities.xml entities-original.xml

# rename the fixed entities file to the expected name
mv entities-clean.xml entities.xml

# update the zip file in place with the new entities.xml file (keep a copy of the original ZIP if you may need it)
zip -u <path>/Confluence-backup.zip entities.xml

(info) If the error specifically mentions 0xffff as the affected character, use this Perl command to fix the file:

perl -i -pe 's/\xef\xbf\xbf//g' entities.xml

If the error mentions 0xfffe instead, use this Perl command:

perl -i -pe 's/\xef\xbf\xbe//g' entities.xml

If you are running Windows and the Perl commands above don't work, use this PowerShell script instead:

$yourfile = "PATH_TO_THE_XML\entities.xml"
$outputfile = "PATH_TO_SAVE_NEW_XML\entities_clean.xml"
get-content -path $yourfile | out-file $outputfile -encoding utf8

Description: When attempting to restore an XML backup in Confluence, the process stops and an error is thrown
Product: Confluence
Platform: Server
Last modified on Nov 21, 2023
