Content Anonymizer for Data Backups

Atlassian may request a copy of the entities.xml file from your exported zip file (backup file), in order to diagnose database corruption or to troubleshoot a bug in Confluence.

If your data is confidential, you can run an anonymizer program over your entities.xml file to remove all your data, leaving only the structure of the export.

Usage

To run the Content Anonymizer on your backup file:

  1. Download the anonymizer JAR (attached to this page).
  2. Extract the entities.xml file from your zipped backup file to the same directory as the JAR.
  3. Use the command prompt to go to the directory where the JAR and entities.xml are located.
  4. To create cleaned.xml, run the command:

    java -jar confluence-export-cleaner-1.1-jar-with-dependencies.jar entities.xml cleaned.xml
    
  5. Move the original entities.xml file to a different location and then rename cleaned.xml to entities.xml.

  6. Re-zip the new entities.xml together with the exportDescriptor.properties file from the original backup, so that Atlassian Support knows exactly which version of Confluence the XML backup was exported from.
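
The steps above can also be scripted. The following is a minimal Python sketch, not an official Atlassian tool. It assumes that entities.xml and exportDescriptor.properties sit at the top level of the backup zip, that java is on your PATH, and that the JAR has the file name used in step 4. The function names are hypothetical and chosen only for illustration.

```python
import subprocess
import tempfile
import zipfile
from pathlib import Path

# File name taken from the command in step 4; adjust it if your download differs.
JAR_NAME = "confluence-export-cleaner-1.1-jar-with-dependencies.jar"


def extract_for_cleaning(backup_zip, workdir):
    """Extract entities.xml and exportDescriptor.properties from the backup.

    Assumption: both files are stored at the top level of the zip.
    """
    workdir = Path(workdir)
    with zipfile.ZipFile(backup_zip) as zf:
        for name in ("entities.xml", "exportDescriptor.properties"):
            zf.extract(name, workdir)
    return workdir / "entities.xml", workdir / "exportDescriptor.properties"


def run_anonymizer(jar_path, source_xml, cleaned_xml):
    """Run the same java -jar command as step 4; raises if java exits non-zero."""
    subprocess.run(
        ["java", "-jar", str(jar_path), str(source_xml), str(cleaned_xml)],
        check=True,
    )


def repackage(cleaned_xml, descriptor, output_zip):
    """Write a new zip that stores the cleaned XML under the name entities.xml."""
    with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(cleaned_xml, arcname="entities.xml")
        zf.write(descriptor, arcname="exportDescriptor.properties")


def anonymize_backup(backup_zip, jar_path, output_zip):
    """Steps 2 to 6 in one call; the original backup zip is left untouched."""
    with tempfile.TemporaryDirectory() as tmp:
        source_xml, descriptor = extract_for_cleaning(backup_zip, tmp)
        cleaned_xml = Path(tmp) / "cleaned.xml"
        run_anonymizer(jar_path, source_xml, cleaned_xml)
        repackage(cleaned_xml, descriptor, output_zip)
```

A call such as anonymize_backup("backup.zip", JAR_NAME, "backup-cleaned.zip") would produce a new zip to send to Atlassian Support; both zip file names here are placeholders. Because the cleaned file is written into the new zip under the name entities.xml, the manual move and rename in step 5 is not needed in this sketch.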

How it works

The Content Anonymizer replaces all text content in the entities.xml file with 'x' characters. For example, the word "Atlassian" will be transformed to "xxxxxxxxx". Because characters are replaced rather than removed, the resulting cleaned.xml file is expected to be the same size as the original file.
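
To make the replacement rule concrete, here is a small Python illustration of the idea. It is not the tool's own code. The function name mask is hypothetical, and keeping whitespace unchanged is an assumption of this sketch; the real anonymizer may treat whitespace differently.

```python
import re


def mask(text):
    """Replace every non-whitespace character with 'x'.

    Assumption: whitespace is kept as-is so the layout of the text survives.
    """
    return re.sub(r"\S", "x", text)


print(mask("Atlassian"))  # xxxxxxxxx

# One character in, one character out, so the length does not change.
original = "Top secret plan"
print(len(mask(original)) == len(original))  # True
```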

This release of the Content Anonymizer uses STX, a fast and efficient XML transformation technology. It should not require a lot of memory to run, even for a large backup.
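
The reason a streaming transformation needs little memory is that it handles one XML event at a time instead of loading the whole document. The sketch below shows the same principle with Python's standard xml.sax module. It is an analogy only: it does not use STX, it is not how the anonymizer JAR is implemented, and it is not a substitute for running the JAR. The names MaskingFilter and anonymize_stream are hypothetical, and keeping whitespace and attribute values unchanged is an assumption of this sketch.

```python
import io
import re
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class MaskingFilter(XMLFilterBase):
    """Forwards every SAX event unchanged except character data."""

    def characters(self, content):
        # Assumption: only non-whitespace characters are replaced with 'x'.
        super().characters(re.sub(r"\S", "x", content))


def anonymize_stream(source, target):
    """Stream XML from source to target, masking text as it passes through.

    source: a file name or an open file; target: a writable text stream.
    Only the current event is held in memory, never the whole document.
    """
    writer = XMLGenerator(target, encoding="utf-8")
    masking = MaskingFilter(xml.sax.make_parser())
    masking.setContentHandler(writer)
    masking.parse(source)


# Illustrative input; the element and attribute names are made up for the example.
sample = io.StringIO('<object><property name="title">Atlassian</property></object>')
result = io.StringIO()
anonymize_stream(sample, result)
print(result.getvalue())
```

The element structure and attributes pass through untouched, while the text "Atlassian" is written out as "xxxxxxxxx". Note that a SAX pipeline like this one does not preserve CDATA section markers, which is one of several ways it differs from the real tool.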

Last modified on Oct 11, 2021
