How To Validate XML Backups Using the Same Java Classes as Cloud
Platform Notice: Cloud Only - This article only applies to Atlassian products on the cloud platform.
When performing site imports to the cloud environment, it's common for an import to fail because of broken XML tags or structure. Because some backups are very large (20 GB and above), it's not easy to validate their structure with standard tooling.
The following message appears when there may be XML inconsistencies in the backup file:
2021-04-01 23:40:34.834 ERROR com.atlassian.jira.bc.dataimport.CloudImportService Error occurred while parsing export file. XML document structures must start and end within the same entity. org.xml.sax.SAXParseException; lineNumber: 312418350; columnNumber: 22; XML document structures must start and end
We typically use xmllint to check XML validity; however, it has a few downsides:
It is complicated to install on Windows.
The tool is not meant for large backups: we typically start seeing severe performance issues with XML files larger than 20 GB.
It’s not an exact test: some XML validators work differently than others. The last thing you want is to think the file is valid, only for an import into Jira Cloud to fail after many hours with an XML formatting issue.
xmllint --noout <Backup File Location>/entities.xml
xmllint --noout <Backup File Location>/activeobjects.xml
The solution is an XML parser that uses the exact same classes and logic as our Cloud applications and that can handle extremely large backup files. We needed a tool built on the same SAX XML parser that Cloud uses, so that the check is as close as possible to the real import, and one that is OS-agnostic and scalable.
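To illustrate why a SAX parser suits this job, here is a minimal sketch of a streaming well-formedness check built on the JDK's standard SAX API (javax.xml.parsers.SAXParserFactory). It is not the source of MigrationsXMLValidator.jar, and the class name SaxWellFormednessCheck is hypothetical. SAX reads the document as a stream of events instead of building a tree in memory, so memory use does not grow with file size the way a DOM tree would. The System.setProperty calls mirror the -D entity-limit flags in the command used in the steps. This assumes the JDK reads these jdk.xml.* properties when the parser is created.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: streams an XML file through the JDK SAX parser to check well-formedness. */
public class SaxWellFormednessCheck {

    /** Returns null if the file is well-formed, otherwise "line:column: message". */
    public static String check(File xml) throws IOException, ParserConfigurationException, SAXException {
        // Mirrors -Djdk.xml.entityExpansionLimit=0 and -Djdk.xml.totalEntitySizeLimit=0.
        // In JAXP a limit of 0 means "no limit". Assumption: the JDK reads these
        // properties when the parser is created, so they are set first.
        System.setProperty("jdk.xml.entityExpansionLimit", "0");
        System.setProperty("jdk.xml.totalEntitySizeLimit", "0");

        SAXParserFactory factory = SAXParserFactory.newInstance(); // non-validating, namespace-unaware by default
        SAXParser parser = factory.newSAXParser();
        try {
            // DefaultHandler ignores all content events; its fatalError() rethrows the
            // SAXParseException, so the first malformed construct stops the parse.
            parser.parse(xml, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java SaxWellFormednessCheck <path/to/entities.xml>");
            System.exit(2);
        }
        String error = check(new File(args[0]));
        if (error == null) {
            System.out.println("OK: well-formed");
        } else {
            System.out.println("Malformed at " + error);
            System.exit(1);
        }
    }
}
```

With Java 11 or later, a single source file can be run directly, for example: java -Xmx1024m SaxWellFormednessCheck.java ./location/of/entities.xml. This sketch only illustrates the technique. The exact parser configuration used in Cloud may differ, so MigrationsXMLValidator.jar remains the closer test.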
Download the MigrationsXMLValidator.jar file below to the same folder as entities.xml.
Open a command prompt or terminal and locate the XML or ZIP backup file on your computer. If the backup is a ZIP file, make sure it has been extracted first. In this example, we will use
Run the application with the command below. Adjust the path to the XML file if needed, and set the -Xmx value according to the size of your XML file. For example, for files larger than 20 GB, try allocating more memory via -Xmx.
java -Xmx1024m -DentityExpansionLimit=0 -DtotalEntitySizeLimit=0 -Djdk.xml.totalEntitySizeLimit=0 -jar ./MigrationsXMLValidator.jar ./location/of/entities.xml
This runs the XML file through the exact same parsing logic and classes that our products use in Atlassian Cloud. That gives us almost 100% confidence that the file will also pass XML validation during the Cloud import.
If the output is blank, the file is valid. If there’s an issue, a stack trace like the one below is shown, pointing to the exact line and column in the file that causes the issue:
[Fatal Error] entities.xml:80:49: The element type "body" must be terminated by the matching end-tag "</body>".
80
org.xml.sax.SAXParseException; systemId: file:./entities.xml; lineNumber: 80; columnNumber: 49; The element type "body" must be terminated by the matching end-tag "</body>".
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1166)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:655)
    at RestrictedXMLReader.parse(test3.java:573)
    at test3.main(test3.java:84)
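The reported line can be hundreds of millions of lines into the file, as in the lineNumber: 312418350 error shown earlier. At that scale, opening the file in an editor to inspect it is often impractical. The following is a minimal sketch of a hypothetical helper, LinePeek, which is not part of MigrationsXMLValidator.jar. It streams the file and keeps only the lines around the reported line number. It assumes the backup is UTF-8 encoded and that line breaks are counted the same way the parser counts them (\n, \r\n, or \r).

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: returns the lines around a line number reported by the XML parser. */
public class LinePeek {

    /**
     * Returns lines [target - context, target + context] (1-based, clipped to the file),
     * each prefixed with its line number. The file is read as a stream, so memory use
     * is bounded by the lines kept rather than by the file size.
     */
    public static List<String> around(String path, long target, int context) throws IOException {
        List<String> out = new ArrayList<>();
        long first = Math.max(1, target - context);
        long last = target + context;
        // InputStreamReader with an explicit Charset replaces malformed bytes instead of throwing.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            long lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (lineNo > last) {
                    break; // stop early instead of reading the rest of a huge file
                }
                if (lineNo >= first) {
                    out.add(lineNo + ": " + line);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java LinePeek <file> <lineNumber> [context]");
            System.exit(2);
        }
        long target = Long.parseLong(args[1]);
        int context = args.length > 2 ? Integer.parseInt(args[2]) : 2;
        for (String line : around(args[0], target, context)) {
            System.out.println(line);
        }
    }
}
```

For the trace above, java LinePeek.java ./entities.xml 80 2 would print lines 78 to 82. The columnNumber value (49) then points to the position within line 80. If a single line is very long, it is printed in full.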
Still need help?
If you have checked all the suggestions above but still can’t resolve the error, please contact our support team for further assistance.