'Incorrect string value' error thrown when restoring XML backup in Confluence
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the server and data center platforms.
When attempting to restore an XML backup in Confluence, the process stops and an error is thrown.
The following exceptions appear in the log:
Incorrect string value: '\xF0\x9F\x98\x80</...' for column 'BODY' at row 1
Caused by: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x8D\xBA ...' for column 'BODY' at row 1
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:998)
An invalid XML character (Unicode: 0xffff) was found in the CDATA section
The XML backup contains a character that is not compatible with the database being used, for example a 4-byte UTF-8 character (such as an emoji) like the ones above. These characters are rejected because of a bug in MySQL. However, there is an improvement request for Confluence to handle 4-byte UTF-8 characters gracefully:
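To see which characters in a piece of text fall into the 4-byte category, the following minimal sketch may help. It relies on the fact that every code point above U+FFFF encodes to exactly 4 bytes in UTF-8. The function name `find_four_byte_chars` and the sample string are hypothetical and only serve as illustration; the two byte sequences are the ones from the error messages above.

```python
def find_four_byte_chars(text):
    """Return (index, character) pairs for characters that need 4 bytes in UTF-8."""
    # Code points above U+FFFF lie outside the Basic Multilingual Plane
    # and always encode to exactly 4 bytes in UTF-8.
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


# Hypothetical sample text built from the byte sequences in the error messages.
sample = b"Cheers \xf0\x9f\x8d\xba and \xf0\x9f\x98\x80".decode("utf-8")

for index, ch in find_four_byte_chars(sample):
    # Report position, code point, and encoded length of each offending character.
    print(index, f"U+{ord(ch):04X}", len(ch.encode("utf-8")), "bytes")
```

Running the same check line by line over an extracted entities.xml would show where the offending characters sit before any cleaning is attempted.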
There is also a bug raised for Confluence regarding the 0xFFFF error:
Remove the invalid characters from the XML backup:
- Download atlassian-xml-cleaner-0.1.jar
- Open a command prompt and locate the XML or ZIP backup file on your computer. If the XML file is inside a ZIP file, extract it first. In this example, we will use entities.xml.
- Run the cleaner as shown:
$ java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml
Invalid characters can sometimes also exist in the plugin data, which is located in the activeObjectsBackupRestoreProvider.pdata file. That file needs to be cleaned with the cleaner as well (it is included in the ZIP file, but is sometimes not visible until the ZIP file has been extracted).
Running the cleaner creates entities-clean.xml, a copy of the input file with the invalid characters removed.
- Copy the entities-clean.xml file into another directory, rename it back to entities.xml, and create a new ZIP with the entities file.
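To make "invalid characters" more concrete: XML 1.0 only permits the code points #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF, which excludes 0xFFFF and 0xFFFE. The sketch below drops everything outside that set and can optionally drop 4-byte characters as well. It only illustrates the idea and is not the implementation of atlassian-xml-cleaner, whose exact rules are not described here; the names `clean_text`, `clean_file` and `strip_four_byte` are hypothetical.

```python
import re

# Matches any character NOT permitted by the XML 1.0 "Char" production.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# Matches characters above U+FFFF, i.e. those needing 4 bytes in UTF-8.
FOUR_BYTE_CHARS = re.compile(r"[\U00010000-\U0010FFFF]")


def clean_text(text, strip_four_byte=False):
    """Remove characters that XML 1.0 forbids; optionally remove 4-byte ones too."""
    cleaned = INVALID_XML_CHARS.sub("", text)
    if strip_four_byte:
        cleaned = FOUR_BYTE_CHARS.sub("", cleaned)
    return cleaned


def clean_file(src_path, dst_path, strip_four_byte=False):
    """Stream src_path line by line into dst_path, cleaning each line."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(clean_text(line, strip_four_byte))
```

For example, `clean_file("entities.xml", "entities-clean.xml", strip_four_byte=True)` would write a cleaned copy next to the original, leaving the original untouched so the result can be compared before re-zipping.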
If you are on a Linux server, the commands below will do the trick:
# this will recreate entities.xml in the current directory:
unzip <path>/Confluence-backup.zip entities.xml
# fix the entities file, saving its output to a new file (entities-clean.xml):
java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml
# rename the original entities file
mv entities.xml entities-original.xml
# rename the fixed entities file to the expected name
mv entities-clean.xml entities.xml
# update the zip file with the new entities.xml file
zip -u <path>/Confluence-backup.zip entities.xml
- Extract only a specific file from a zipped archive to a given directory
- How to update one file in a zip archive
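Where the `unzip` and `zip` commands are not available, the same swap can be sketched with Python's standard `zipfile` module. Because `zipfile` cannot replace a member in place, this sketch writes a new archive instead of updating the original one. The function name `replace_zip_member` and the output file name in the usage note are hypothetical.

```python
import zipfile


def replace_zip_member(src_zip, dst_zip, member_name, replacement_path):
    """Copy src_zip to dst_zip, substituting one member with a file from disk."""
    with zipfile.ZipFile(src_zip) as src, \
         zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == member_name:
                continue  # skip the original; the cleaned copy is added below
            # Copies the member unchanged. Note: each member is read fully
            # into memory, which may be a concern for very large attachments.
            dst.writestr(info, src.read(info.filename))
        # Add the cleaned file under the name the restore expects.
        dst.write(replacement_path, arcname=member_name)
```

A call such as `replace_zip_member("Confluence-backup.zip", "Confluence-backup-clean.zip", "entities.xml", "entities-clean.xml")` would leave the original backup untouched, which makes it easy to fall back if the restore still fails.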
If you are seeing an error specifically with 0xffff as the affected character, use this Perl command to fix the file:
perl -i -pe 's/\xef\xbf\xbf//g' entities.xml
If you are seeing the error with 0xfffe instead, use the Perl command below:
perl -i -pe 's/\xef\xbf\xbe//g' entities.xml
If you are running Windows and the Perl commands above don't work, here's a PowerShell script to fix the problem:
$yourfile = "PATH_TO_THE_XML\entities.xml"
$outputfile = "PATH_TO_SAVE_NEW_XML\entities_clean.xml"
get-content -path $yourfile | out-file $outputfile -encoding utf8
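As a platform-independent alternative, the two Perl substitutions above can be sketched in Python. The byte sequences `EF BF BF` and `EF BF BE` are the UTF-8 encodings of 0xffff and 0xfffe, exactly as in the Perl commands. Unlike `perl -i`, this sketch writes to a separate output file instead of editing in place; the function names `strip_noncharacters` and `strip_file` are hypothetical.

```python
# UTF-8 encodings of the two characters targeted by the Perl commands.
NONCHARACTERS = (b"\xef\xbf\xbf", b"\xef\xbf\xbe")  # U+FFFF, U+FFFE


def strip_noncharacters(data: bytes) -> bytes:
    """Remove the encoded U+FFFF and U+FFFE sequences from a chunk of bytes."""
    for sequence in NONCHARACTERS:
        data = data.replace(sequence, b"")
    return data


def strip_file(src_path, dst_path):
    """Copy src_path to dst_path line by line with both sequences removed."""
    # Binary mode avoids any re-encoding; neither sequence contains a newline
    # byte, so a match can never be split across two lines.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(strip_noncharacters(line))
```

Calling `strip_file("entities.xml", "entities-clean.xml")` leaves all other bytes, including 4-byte characters, exactly as they were.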