'Incorrect string value' error thrown when restoring XML backup in Confluence
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
When attempting to restore an XML backup in Confluence, the process stops and an error is thrown.
The following appears in atlassian-confluence.log:
logExceptions Incorrect string value: '\xF0\x9F\x98\x80</...' for column 'BODY' at row 1
Or:
Caused by: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x8D\xBA ...' for column 'BODY' at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:998)
Or:
An invalid XML character (Unicode: 0xffff) was found in the CDATA section
Cause
The XML backup contains a character that the target database cannot store, for example a 4-byte UTF-8 character like the ones in the log excerpts above. These characters are rejected because of a bug in MySQL. There is also an improvement request for Confluence to handle 4-byte UTF-8 characters gracefully.
A bug has also been raised for Confluence regarding the 0xFFFF error.
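To check whether a backup is affected before attempting a restore, a small scan for the characters described above can help. The following is a minimal Python sketch, not part of any Atlassian tooling. The function names find_problem_characters and scan_file are hypothetical, and the sketch assumes the file is UTF-8 encoded. It reports code points above U+FFFF (which need four bytes in UTF-8) and the noncharacters U+FFFE and U+FFFF.

```python
import io

# Code points this article mentions: U+FFFE and U+FFFF (noncharacters), plus
# anything above U+FFFF, which needs four bytes in UTF-8.
NONCHARACTERS = {0xFFFE, 0xFFFF}


def find_problem_characters(lines):
    """Yield (line_number, column, code_point) for each suspect character."""
    for line_number, line in enumerate(lines, start=1):
        for column, char in enumerate(line, start=1):
            code_point = ord(char)
            if code_point > 0xFFFF or code_point in NONCHARACTERS:
                yield line_number, column, code_point


def scan_file(path):
    """Scan a UTF-8 file; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace", newline="") as handle:
        return list(find_problem_characters(handle))


if __name__ == "__main__":
    # Illustrative in-memory sample; pass a real path to scan_file instead.
    sample = io.StringIO("plain text\nbeer \U0001F37A here\nbad \uffff char\n")
    for line_number, column, code_point in find_problem_characters(sample):
        print(f"line {line_number}, column {column}: U+{code_point:04X}")
```

Calling scan_file("entities.xml") on an extracted backup returns the same tuples for the real file. An empty list suggests the file contains none of the characters covered here, although other invalid XML characters would not be detected by this sketch.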
Workaround
Remove the invalid characters from the XML backup before restoring it:
- Download atlassian-xml-cleaner-0.1.jar
- Open a command prompt and locate the XML or ZIP backup file on your computer. If the backup is a ZIP file, extract it first. In this example, we will use entities.xml. Run the cleaner as shown:
$ java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml
This creates a copy of entities.xml named entities-clean.xml with the invalid characters removed.
Invalid characters can also exist in the plugin data, which is stored in the activeObjectsBackupRestoreProvider.pdata file. If so, run the cleaner on that file as well. The file is included in the ZIP file, but it is sometimes not visible until the ZIP file is extracted.
- Copy the entities-clean.xml file into another directory, rename it back to entities.xml, and create a new ZIP with the entities.xml file.
Note: The zip file should contain the Attachments folder as well. If this folder is not included the attachments will show up as broken links or broken images.
If you are on a Linux server, the commands below will do the trick:
# this will recreate entities.xml in the current directory:
unzip <path>/Confluence-backup.zip entities.xml
# fix the entities file, saving its output to a new file (entities-clean.xml):
java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml
# rename the original entities file
mv entities.xml entities-original.xml
# rename the fixed entities file to the expected name
mv entities-clean.xml entities.xml
# update the zip file with the new entities.xml file
zip -u <path>/Confluence-backup.zip entities.xml
For reference:
- Extract only a specific file from a zipped archive to a given directory
- How to update one file in a zip archive
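On systems without the unzip and zip utilities, the repackaging step can be approximated with Python's standard zipfile module. This is a minimal sketch under stated assumptions: the function name replace_zip_entry and the file names in the usage line are hypothetical, and the cleaned file has already been produced by the cleaner. It writes a new archive rather than updating the original in place, and it copies every other entry (including the attachments) unchanged.

```python
import zipfile


def replace_zip_entry(source_zip, target_zip, entry_name, replacement_path):
    """Copy source_zip to target_zip, swapping one entry for a file on disk."""
    replaced = False
    with zipfile.ZipFile(source_zip) as source, \
            zipfile.ZipFile(target_zip, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            if item.filename == entry_name:
                # Substitute the cleaned file for the original entry.
                target.write(replacement_path, arcname=entry_name)
                replaced = True
            else:
                # Copy everything else (attachments, plugin data) unchanged.
                # Each entry is read fully into memory, which is fine for a
                # sketch but may be slow for very large attachments.
                target.writestr(item, source.read(item))
    if not replaced:
        raise KeyError(f"{entry_name} not found in {source_zip}")
```

For example, replace_zip_entry("Confluence-backup.zip", "Confluence-backup-clean.zip", "entities.xml", "entities-clean.xml") produces a second archive and leaves the original backup untouched, which keeps a fallback copy if the restore still fails.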
If the error names 0xffff as the affected character, use this Perl command to fix the file:
perl -i -pe 's/\xef\xbf\xbf//g' entities.xml
If the error names 0xfffe instead, use this Perl command:
perl -i -pe 's/\xef\xbf\xbe//g' entities.xml
If you are running Windows and the Perl commands above don't work, here's a PowerShell script to fix the problem:
$yourfile = "PATH_TO_THE_XML\entities.xml"
$outputfile = "PATH_TO_SAVE_NEW_XML\entities_clean.xml"
get-content -path $yourfile | out-file $outputfile -encoding utf8
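Where neither Perl nor PowerShell is convenient, the same byte-level removal can be sketched in Python. This is an illustrative alternative, not a documented Atlassian procedure, and the names strip_noncharacters and clean_file are hypothetical. It removes the UTF-8 byte sequences for U+FFFF and U+FFFE (the same bytes the Perl commands target) and writes the result to a new file instead of editing in place.

```python
# UTF-8 encodings of U+FFFF and U+FFFE, the same bytes the Perl commands target.
NONCHARACTER_BYTES = (b"\xef\xbf\xbf", b"\xef\xbf\xbe")


def strip_noncharacters(data):
    """Return data with every U+FFFF / U+FFFE byte sequence removed."""
    for sequence in NONCHARACTER_BYTES:
        data = data.replace(sequence, b"")
    return data


def clean_file(source_path, target_path):
    """Write a cleaned copy of source_path to target_path, line by line."""
    # Binary mode avoids any re-encoding; the target sequences never contain
    # a newline byte, so they cannot be split across two lines.
    with open(source_path, "rb") as source, open(target_path, "wb") as target:
        for line in source:
            target.write(strip_noncharacters(line))
```

A call such as clean_file("entities.xml", "entities-clean.xml") leaves the original file intact. The cleaned copy then needs to be renamed back to entities.xml before it is added to the ZIP.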