Documentation for JIRA 6.3 EAP (developer) releases only. For other versions, see the JIRA 6.2.x documentation, the JIRA OnDemand documentation, or the documentation for earlier versions of JIRA.

Support requests are often resolved significantly faster if a data export is provided, as it gives our legendary supporters direct access to a copy of your instance. We understand that this may be difficult when your data is sensitive, so we have written an anonymising tool for this scenario.

Anonymising JIRA Data:

JIRA's built-in backup functionality produces a ZIP file containing either one or two XML files, depending on the JIRA version. These files hold a copy of the entire contents of JIRA's database, encoded as XML, and can be used to restore an instance. See our Automating JIRA Backups documentation for more detail.

As of JIRA 4.4, the backup functionality produces a ZIP file that contains two XML files: activeobjects.xml and entities.xml. Only entities.xml needs to be anonymised; please do not attempt to anonymise activeobjects.xml. Versions prior to 4.4 produce a single XML file named after the ZIP it is compressed in (for example, 1970-Jan-01--0001.zip expands to 1970-Jan-01--0001.xml).

  1. Ensure that the JAVA_HOME variable has been configured, as described in our Setting JAVA_HOME documentation.
  2. Download the JIRA Anonymiser.
  3. Create a temporary directory.
  4. Unzip the anonymiser in the temporary directory.
  5. Unzip the JIRA backup ZIP file (for example 1970-Jan-01--0001.zip) in the temporary directory.
  6. Anonymise the backup file with the following command:

    $ java -Xmx512m -jar joost.jar <JIRA BACKUP>.xml anon.stx > <NAME OF ANONYMISED BACKUP>.xml

    For example, to anonymise a backup that uses the JIRA 4.4+ naming convention:

    $ java -Xmx512m -jar joost.jar entities.xml anon.stx > anon-entities.xml

    Warning: depending on the size of the backup, you may need to allocate more memory to the JVM. To do this, increase the -Xmx value in increments of 128m (for example, from -Xmx512m to -Xmx640m).

  7. Compress the generated anonymised XML backup file (e.g. anon-entities.xml) and, for JIRA 4.4 and later, activeobjects.xml into a ZIP or tarball.
  8. Attach the ZIP or tarball to your support issue on support.atlassian.com.
  9. The temporary directory can now be removed.
[Screenshot: example of running the anonymiser from the Windows XP command prompt]
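Steps 5 to 7 can be scripted. The sketch below is a hypothetical helper and is not part of the JIRA Anonymiser download. It makes these assumptions:

- `java` is on the PATH.
- joost.jar and anon.stx were unzipped into the working directory.
- The backup is a JIRA 4.4+ backup (entities.xml plus activeobjects.xml).

On any non-zero exit, the script retries with 128m more heap. Treating every failure as a memory problem is also an assumption: a failure is not necessarily an out-of-memory error. The optional `-DentityExpansionLimit` flag comes from a reader comment on this page. The names `build_anon_command`, `anonymise` and `package` are illustrative only.

```python
"""Hypothetical sketch of steps 5-7: unzip a JIRA 4.4+ backup, anonymise, re-zip.

Assumes java is on the PATH and that joost.jar and anon.stx (from the
JIRA Anonymiser) were unzipped into the working directory.
"""
import subprocess
import zipfile
from pathlib import Path


def build_anon_command(backup_xml, heap_mb=512, entity_limit=None):
    """Return the step 6 command line as an argument list."""
    cmd = ["java", f"-Xmx{heap_mb}m"]
    if entity_limit is not None:
        # Optional parser limit suggested in a reader comment on this page.
        cmd.append(f"-DentityExpansionLimit={entity_limit}")
    return cmd + ["-jar", "joost.jar", str(backup_xml), "anon.stx"]


def anonymise(workdir, backup_zip, heap_mb=512, max_heap_mb=2048, entity_limit=None):
    """Unzip the backup, then run joost, adding 128 MB of heap after each failure."""
    workdir = Path(workdir)
    with zipfile.ZipFile(backup_zip) as zf:
        zf.extractall(workdir)  # step 5
    out = workdir / "anon-entities.xml"
    while heap_mb <= max_heap_mb:
        # Equivalent of "... > anon-entities.xml": joost writes to stdout.
        with open(out, "wb") as fh:
            result = subprocess.run(
                build_anon_command("entities.xml", heap_mb, entity_limit),
                cwd=workdir,
                stdout=fh,
            )
        if result.returncode == 0:
            return out
        heap_mb += 128  # assumption: any failure is retried with more memory
    raise RuntimeError("anonymiser kept failing; check its error output")


def package(workdir, archive_name="anon-backup.zip"):
    """Step 7: zip the anonymised XML plus activeobjects.xml (JIRA 4.4+ only)."""
    workdir = Path(workdir)
    archive = workdir / archive_name
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(workdir / "anon-entities.xml", "anon-entities.xml")
        if (workdir / "activeobjects.xml").exists():
            zf.write(workdir / "activeobjects.xml", "activeobjects.xml")
    return archive
```

For example, `anonymise("jira-anon-tmp", "1970-Jan-01--0001.zip")` followed by `package("jira-anon-tmp")` would leave jira-anon-tmp/anon-backup.zip ready to attach to the support issue. The original backup files also remain in the temporary directory, so step 9 still applies.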

Information about the Anonymiser:

The anonymiser currently replaces the following text with x's:

  • Issue summary, environment, and description.
  • Comments, work logs, change logs.
  • Project descriptions.
  • Descriptions for most elements (notification schemes, permission schemes, resolutions).
  • Attachment file names.
  • "Unlimited text" custom fields.

Please check the anonymised backup (for example, anon-entities.xml) to ensure it is clean enough for the needs of your organisation before sending it to Atlassian.
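A quick scan can help with this check. The following is a minimal sketch, not an Atlassian tool, and `find_suspicious` is a hypothetical helper. The `passw` and `<DirectoryAttribute` patterns come from a reader comment reporting that LDAP and password details survived anonymisation. The e-mail regex is a rough, illustrative assumption. The script only reports matching lines, so deciding what is acceptable to send is still up to you.

```python
"""Hypothetical scan of an anonymised backup for text that may still be sensitive."""
import re
import sys

# Example patterns only; extend them to suit your organisation.
SUSPICIOUS = [
    re.compile(r"passw", re.IGNORECASE),        # password attributes
    re.compile(r"<DirectoryAttribute\b"),        # LDAP/directory settings
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),     # rough e-mail address match
]


def find_suspicious(lines, patterns=SUSPICIOUS):
    """Yield (line_number, line) for every line that matches any pattern."""
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in patterns):
            yield number, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python scan_backup.py anon-entities.xml
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for number, line in find_suspicious(fh):
            print(f"{number}: {line[:120]}")  # truncate very long XML lines
```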

Problems:

Invalid XML Characters

If the anonymiser reports invalid XML characters in the XML backup of the database, first run our utility to remove invalid XML characters, then anonymise the cleaned file.
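The idea behind such a cleanup can be sketched as follows. This is not Atlassian's utility but a minimal example with these assumptions and limits:

- The backup is assumed to be valid UTF-8.
- It removes every raw character outside the XML 1.0 `Char` range, such as the 0x1 and 0xc characters reported in the comments on this page.
- Numeric character references such as `&#1;` are left untouched.

`strip_invalid_xml_chars` and `clean_file` are hypothetical names.

```python
"""Hypothetical sketch: remove characters that XML 1.0 does not allow."""
import re
import sys

# Complement of the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Return text with every character outside the XML 1.0 Char range removed."""
    return INVALID_XML_CHARS.sub("", text)


def clean_file(src, dst):
    """Stream src to dst line by line so a large backup need not fit in memory."""
    # newline="" keeps the original line endings unchanged.
    with open(src, encoding="utf-8", newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(strip_invalid_xml_chars(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python strip_xml.py entities.xml entities-clean.xml
    clean_file(sys.argv[1], sys.argv[2])
```

The cleaned file can then be passed to the anonymiser in place of the original entities.xml.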


19 Comments

  1. The anonymizer has been rewritten to use STX, a memory-efficient subset of XSLT. It should no longer run out of memory on large exports. In addition, "unlimited text" custom field values are now anonymized too.

  2. Hi everyone,

    If you get the message below while anonymising:

    Parser has reached the entity expansion limit "64,000" set by the Application.

    then you can try again with an additional option: java -jar -DentityExpansionLimit=100000 joost.jar <backup.xml> anon.stx > anon-backup.xml

    You can also increase entityExpansionLimit further if that value is too small.
     
    Cheers,
    Gregory 

  3. To support Worklog comments, make the following change to the STX file.

    BEFORE

    AFTER

    1. Anonymous

      You also need to add Worklog/body to the previous stx:template block in the same way as you suggest adding Worklog/body/text(). Otherwise worklog body attributes won't get anonymized.

      1. FIX: I should have said Worklog/@body.

        I'm also going to add */@author and */@updateauthor to my .stx script as I want to anonymize author names too.

    2. Based on the comments above, the revised version that also anonymises worklog comments is:

  4. There is a new version of the template (referenced on this page above) that fixes the mail server username/passwords issue.

  5. Hi,

    I need to anonymise usernames and group names. Could anybody provide the appropriate STX syntax to iterate over all usernames, then search and replace all occurrences with a replacement string?

    Many thanks

    Alex

  6. Also, this tool is useless (with the standard STX file) for non-Latin characters.

  7. A good solution for non-Latin characters would be to replace the translate function with:

    string-pad('x', string-length(.))

  8. Anonymous

    Hello,

    What about users and groups? Do they remain in the clear? This is very sensitive for me, especially email addresses.

    Please let me know.

    1. Anonymous

      Yes, this is essential for my assignment: I need to take a backup and use it outside our domain. Please help us!

  9. The jira_anon.zip available from http://confluence.atlassian.com/download/attachments/139008/jira_anon.zip?version=6&modificationDate=1264044549328 also contains a lot of Mac OS X garbage:

    jira_anon/
    jira_anon/.DS_Store
    __MACOSX/
    __MACOSX/jira_anon/
    __MACOSX/jira_anon/._.DS_Store
    jira_anon/anon.stx
    __MACOSX/jira_anon/._anon.stx
    jira_anon/joost.jar
    __MACOSX/jira_anon/._joost.jar
    jira_anon/README.txt
    __MACOSX/jira_anon/._README.txt

     

    Next time please verify the contents of the zip before publishing. Thanks.

     

    On another note, the anonymiser kept the details (hostname, DN, password) of our LDAP server. You might want to look at the <DirectoryAttribute> nodes before you send the XML to any third party. It's generally good advice to look through all instances of the "passw" string in the XML backup.

  10. Apart from the worklog comments, there is another part that is not x'ed out: ChangeItem/oldvalue. I noticed that ChangeItem/oldstring is covered but oldvalue is not. New STX file:

  11. Could you please make the anonymiser handle this error?

    An invalid XML character (Unicode: 0x1) was found in the CDATA section.

     

     

    1. Anonymous

      I have the same problem:  An invalid XML character (Unicode: 0xc) was found

      What's wrong here??

  12. I also have this problem. I'm editing the XML file by hand to remove the offending lines, but I don't know how long this will take...

    It seems to relate mostly to Unicode characters in comments by end users. Can I assume I'm missing a library from libxml or something that is causing this? This also occurs on my production JIRA box, which is Unix and natively UTF-8.
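The non-Latin problem raised in comments 6 and 7 can be reproduced outside STX. This Python sketch assumes, as those comments imply, that the stock anon.stx masks text with a translate() call over Latin letters only. The exact character list in the real file may differ. `translate_mask` and `pad_mask` are hypothetical names that mirror translate() and string-pad('x', string-length(.)) respectively.

```python
"""Hypothetical illustration: Latin-only translate() vs masking every character."""
import string

# Assumption: roughly what a translate()-based rule does, i.e. map each
# ASCII letter to 'x' and leave every other character alone.
LATIN_TO_X = str.maketrans(string.ascii_letters, "x" * len(string.ascii_letters))


def translate_mask(text):
    """Mask only the characters listed in the translation table."""
    return text.translate(LATIN_TO_X)


def pad_mask(text):
    """Mask every character, like string-pad('x', string-length(.)) in STX."""
    return "x" * len(text)
```

With the translate-style rule, `translate_mask("Ivan Иван")` leaves the Cyrillic name readable. The pad-style rule replaces every character, including spaces and punctuation. The trade-off is that the anonymised text no longer shows word boundaries, but nothing stays readable in any script.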