SAXException error when running the Content Anonymizer for Confluence

Summary

Atlassian may request an XML backup to troubleshoot bugs in Confluence. To keep customer data from leaking, the Content Anonymizer tool can be used to scrub the backup data (entities.xml). However, some special characters may cause a SAXException during cleaning.

For example, a smiling-face emoji (reported as character code 55357, which is 0xD83D, the first half of a UTF-16 surrogate pair) caused the error below.

$ java -jar confluence-export-cleaner-1.1-jar-with-dependencies.jar entities.xml cleaned.xml
2021-04-14 21:40:12,157 INFO Starting to clean export file 'entities.xml'. This may take a few minutes.
Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXException: Cannot output character with code 55357 in the encoding UTF-8' within a CDATA section javax.xml.transform.TransformerException: Cannot output character with code 55357 in the encoding UTF-8' within a CDATA section
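
The code 55357 in the message is not a complete character on its own. As a minimal illustration (the helper name utf16_code_units is hypothetical and not part of any Atlassian tool), the Python sketch below shows that a smiling-face emoji such as U+1F600 is stored in UTF-16 as two code units, the first of which is 55357:

```python
# Illustration only: split a character into its UTF-16 code units.
# utf16_code_units is a hypothetical helper, not part of the anonymizer.
def utf16_code_units(text):
    """Return the UTF-16 code units of `text` as a list of integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# U+1F600 (grinning face) lies above U+FFFF, so it needs a surrogate pair.
units = utf16_code_units("\U0001F600")
print(units)                      # [55357, 56832]
print([hex(u) for u in units])    # ['0xd83d', '0xde00']
```

Other emoji in the same block share the leading code unit 55357, so the number in the error message points to a range of characters rather than to one specific emoji.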

Cause

The anonymizer tool cannot handle certain special characters (such as the smiling-face emoji) contained in the Confluence backup file (entities.xml).

Solution

If entities.xml is small, the special characters can be removed manually in a text editor.
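
To find where to edit, a small script can list the positions of likely offenders. The following is a rough sketch, assuming the problem characters appear literally in the file as emoji or other characters above U+FFFF. It does not detect numeric character references or other kinds of invalid characters, and the function name find_supplementary_chars is hypothetical:

```python
# Sketch: report characters above U+FFFF (such as emoji), which UTF-16
# represents as surrogate pairs. Assumes the file is UTF-8 encoded.
def find_supplementary_chars(lines):
    """Yield (line_number, column, code_point) for each character above U+FFFF."""
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0xFFFF:
                yield line_no, col, ord(ch)

# In-memory demonstration with a made-up fragment, not real export data.
sample = [
    "<title>plain</title>\n",
    "<body><![CDATA[hi \U0001F600]]></body>\n",
]
hits = list(find_supplementary_chars(sample))
for line_no, col, code_point in hits:
    print("line %d, column %d: U+%X" % (line_no, col, code_point))

# For a real file, pass the open file object so it is read line by line:
#   with open("entities.xml", encoding="utf-8", errors="replace") as handle:
#       for hit in find_supplementary_chars(handle):
#           print(hit)
```

Reading the file line by line keeps memory use modest for ordinary exports, though an export with extremely long lines would still load each such line in full.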

However, if the file is too large to edit directly, first strip the special characters with the XML cleaner tool:

java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml
  • Then run the anonymizer tool on the cleaned file (entities-clean.xml) instead of the original entities.xml.

Reference

The tool for cleaning special characters was originally written for Jira. For details, see: Removing invalid characters from XML backups

Last modified on Apr 27, 2021