Retrieving File Attachments from a Backup
File attachments on pages can be retrieved from a backup without needing to import the backup into Confluence. This is useful for recovering attachments that have been deleted by users.
Both automated and manual backups allow this, as long as the 'Include attachments' property was set. If you want to restore pages, spaces or sites, see the Confluence Administrator's Guide instead.
Before following the instructions for recovering attachments below, we will review how backups store file and page information.
The information on this page does not apply to Confluence Cloud.
How Backups Store File and Page Information
The backup zip file contains entities.xml, an XML file containing the Confluence content, and a directory for storing attachments.
Backup Zip File Structure
Page attachments are stored under the attachments directory by page and attachment id. Here is an example listing:
Listing for test-2006033012_00_00.zip
\attachments\98\10001
\attachments\98\10002
\attachments\99\10001
entities.xml
Inside the attachments directory, each numbered subdirectory corresponds to one page, and each numbered file within it is one attachment. The directory number is the page id, and the file number is the attachment id. For example, the file \attachments\98\10001 is the attachment with id 10001 on the page with id 98. You can read entities.xml to map those numbers to the original filename; entities.xml also maps each page id to the page title.
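If you want to see which page ids and attachment ids a backup contains before unzipping it, you can read them from the zip entry names. The Python sketch below is illustrative only. `list_backup_attachments` is a hypothetical helper, and it assumes entries are named attachments/<page id>/<attachment id> as in the listing above. Both / and \ separators are accepted.

```python
import zipfile


def list_backup_attachments(zip_path):
    """Hypothetical helper: return (page id, attachment id, entry name) tuples.

    Assumes the backup stores attachments as attachments/<page id>/<attachment id>;
    directory entries and entities.xml are skipped.
    """
    results = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Normalise separators and drop empty parts from leading/trailing slashes
            parts = [p for p in name.replace("\\", "/").split("/") if p]
            # Keep only files at exactly: attachments / <digits> / <digits>
            if (len(parts) == 3 and parts[0] == "attachments"
                    and parts[1].isdigit() and parts[2].isdigit()):
                results.append((parts[1], parts[2], name))
    return results
```

Called on a backup such as test-2006033012_00_00.zip, it returns one tuple per attachment file, which you can then match against entities.xml.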
Entities.xml Attachment Object
Inside entities.xml, each attachment is described by an Attachment object. The object's own id is the attachment id, and the id nested inside its content property is the id of the page the file is attached to. In this example, the attachment id is 10001, the page id is 98 and the filename is myimportantfile.doc. The rest of the XML can be ignored:
<object class="Attachment" package="com.atlassian.confluence.pages">
<id name="id">10001</id>
<property name="fileName"><![CDATA[myimportantfile.doc]]></property>
...
<property name="content" class="Page" package="com.atlassian.confluence.pages"><id name="id">98</id>
</property>
...
</object>
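The same lookup can be scripted. This minimal sketch uses Python's standard xml.etree.ElementTree module. `read_attachments` is a hypothetical helper. It assumes that the Attachment object's own id is the attachment id and that the id nested in its content property is the page id. If the XML in your backup has a different shape, adjust the element paths.

```python
import xml.etree.ElementTree as ET


def read_attachments(entities_xml):
    """Hypothetical helper: map attachment id -> (page id, original filename).

    `entities_xml` is the text (str or bytes) of entities.xml. Assumes each
    Attachment object has a direct <id name="id"> child (the attachment id),
    a fileName property, and a content property whose nested <id> is the page id.
    """
    root = ET.fromstring(entities_xml)
    attachments = {}
    for obj in root.iter("object"):
        if obj.get("class") != "Attachment":
            continue
        att_id = obj.findtext("id[@name='id']")  # direct child only
        filename = obj.findtext("property[@name='fileName']")  # CDATA text
        page_id = obj.findtext("property[@name='content']/id[@name='id']")
        if att_id and filename and page_id:
            attachments[att_id.strip()] = (page_id.strip(), filename)
    return attachments
```

ET.fromstring loads the whole document into memory. For a very large entities.xml, xml.etree.ElementTree.iterparse is a lower-memory alternative.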
Entities.xml Page Object
This XML describes a page. In this example, the page id is 98 and the title is Editing Your Files. The rest of the XML can be ignored:
<object class="Page" package="com.atlassian.confluence.pages">
<id name="id">98</id>
<property name="title"><![CDATA[Editing Your Files]]></property>
...
</object>
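A matching sketch for Page objects maps page ids to titles. `read_page_titles` is another hypothetical helper. It relies only on the id element and the title property shown above. If several Page objects share a title, each id appears as its own entry.

```python
import xml.etree.ElementTree as ET


def read_page_titles(entities_xml):
    """Hypothetical helper: map page id -> page title from entities.xml text."""
    root = ET.fromstring(entities_xml)
    titles = {}
    for obj in root.iter("object"):
        if obj.get("class") != "Page":
            continue
        page_id = obj.findtext("id[@name='id']")  # direct child: the page id
        title = obj.findtext("property[@name='title']")  # CDATA text
        if page_id and title:
            titles[page_id.strip()] = title
    return titles
```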
Instructions for Recovering Attachments
Each file must be individually renamed and then re-uploaded to Confluence. Choose one of the following three methods:
Choice A - Recover Attachments By Filename
Best if you know each filename you need to restore, especially if you want just a few files:
- Unzip the backup file and open entities.xml.
- Search entities.xml for the filename and find the Attachment object with that filename. Note its page id and attachment id.
- In the attachments directory, open the subdirectory named after that page id and locate the file named after the attachment id.
- Rename the file to the original filename and test it.
- Repeat for each file.
- To import each file back into Confluence, upload to the original page by attaching the file from within Confluence.
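The search and rename steps of Choice A can be scripted. In the Python sketch below, the hypothetical helper `recover_by_filename` reads entities.xml directly from the zip. It finds each Attachment object with the given filename and writes the matching file out under that name. The sketch assumes the layout described earlier on this page. It is not an Atlassian tool, and uploading to Confluence is still done by hand.

```python
import os
import xml.etree.ElementTree as ET
import zipfile


def recover_by_filename(zip_path, filename, out_dir):
    """Hypothetical helper: extract every attachment named `filename` into out_dir."""
    written = []
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Normalise entry names so attachments/<page id>/<attachment id> matches with / or \
        entries = {n.replace("\\", "/").strip("/"): n for n in zf.namelist()}
        root = ET.fromstring(zf.read(entries["entities.xml"]))
        for obj in root.iter("object"):
            # Search step: Attachment objects whose fileName is the one we want
            if obj.get("class") != "Attachment":
                continue
            if obj.findtext("property[@name='fileName']") != filename:
                continue
            # Assumption: own id = attachment id, id inside content = page id
            att_id = (obj.findtext("id[@name='id']") or "").strip()
            page_id = (obj.findtext("property[@name='content']/id[@name='id']") or "").strip()
            entry = entries.get(f"attachments/{page_id}/{att_id}")
            if entry is None:
                continue  # described in entities.xml but missing from the backup
            # Rename step: write under the original name, prefixing the id on a clash
            name = os.path.basename(filename)
            target = os.path.join(out_dir, name)
            if os.path.exists(target):
                target = os.path.join(out_dir, f"{att_id}_{name}")
            with open(target, "wb") as fh:
                fh.write(zf.read(entry))
            written.append(target)
    return written
```

If the same filename occurs on several pages, every match is written. Later matches get an `<attachment id>_` prefix so they do not overwrite earlier ones.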
Choice B - Restore Files By Page
Best if you only want to restore attachments for certain pages:
- Unzip the backup file and open entities.xml.
- Search entities.xml for the page title and find the Page object with that title. Note its page id.
- In the attachments directory, open the subdirectory named after that page id. Each file in it is an attachment that must be renamed.
- Search entities.xml for attachment objects with that page id. Every attachment object for the page will have an attachment id and filename.
- Rename the file with that attachment id to the original filename and test it.
- Repeat for each page.
- To import each file back into Confluence, upload to the original page by attaching the file from within Confluence.
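Choice B can be sketched the same way. The hypothetical helper `recover_by_page_title` works in two passes. First it collects the ids of Page objects with the given title. Then it extracts every Attachment whose content property points at one of those ids, using the original filenames. The same layout assumptions as above apply.

```python
import os
import xml.etree.ElementTree as ET
import zipfile


def recover_by_page_title(zip_path, title, out_dir):
    """Hypothetical helper: extract all attachments of the page(s) titled `title`."""
    written = []
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        entries = {n.replace("\\", "/").strip("/"): n for n in zf.namelist()}
        objects = list(ET.fromstring(zf.read(entries["entities.xml"])).iter("object"))
        # Pass 1: page ids of every Page object with this title
        page_ids = {
            (o.findtext("id[@name='id']") or "").strip()
            for o in objects
            if o.get("class") == "Page" and o.findtext("property[@name='title']") == title
        }
        # Pass 2: Attachment objects whose content property points at one of those pages
        for o in objects:
            if o.get("class") != "Attachment":
                continue
            page_id = (o.findtext("property[@name='content']/id[@name='id']") or "").strip()
            if page_id not in page_ids:
                continue
            att_id = (o.findtext("id[@name='id']") or "").strip()
            entry = entries.get(f"attachments/{page_id}/{att_id}")
            if entry is None:
                continue  # no file for this attachment in the backup
            # Write under the original filename, prefixing the id on a clash
            name = os.path.basename(o.findtext("property[@name='fileName']") or att_id)
            target = os.path.join(out_dir, name)
            if os.path.exists(target):
                target = os.path.join(out_dir, f"{att_id}_{name}")
            with open(target, "wb") as fh:
                fh.write(zf.read(entry))
            written.append(target)
    return written
```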
Choice C - Restore All Files
Best if you have a small backup but want to restore many or all of the attachments in it:
The following process applies only to space exports. Site XML backups do not require page ids to be updated manually, because page ids persist.
- Unzip the backup file and open entities.xml.
- In the attachments directory, open any subdirectory. Its name is a page id, and each file in it is an attachment that must be renamed.
- Search entities.xml for attachment objects with that page id. When one is found, locate the attachment id and filename.
- Rename the file with that attachment id to the original filename and test it.
- Find the next attachment id and rename it. Repeat for each file in the directory.
- Once all files in the current directory have been renamed to their original filenames, search entities.xml for the page id (i.e. the directory name). Find the Page object with that page id and note its title.
- Rename the directory to the page title and move on to the next directory. Repeat for each directory in the attachments directory that has not yet been renamed.
- To import each file back into Confluence, upload to the original page by attaching the file from within Confluence.
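For a backup with many attachments, the renaming in Choice C can be automated. The hypothetical helper `restore_all` below writes each file to out_dir/<page title>/<original filename>. This mirrors the result of renaming the files and directories by hand, but leaves the backup itself untouched. It is a minimal sketch that assumes the attachments/<page id>/<attachment id> layout and the Attachment and Page objects described on this page. Path separators in titles are replaced with underscores.

```python
import os
import xml.etree.ElementTree as ET
import zipfile


def restore_all(zip_path, out_dir):
    """Hypothetical helper: write every attachment to out_dir/<page title>/<filename>."""
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        entries = {n.replace("\\", "/").strip("/"): n for n in zf.namelist()}
        titles, filenames = {}, {}
        # One pass over entities.xml: page id -> title, (page id, attachment id) -> filename
        for o in ET.fromstring(zf.read(entries["entities.xml"])).iter("object"):
            oid = (o.findtext("id[@name='id']") or "").strip()
            if o.get("class") == "Page":
                titles[oid] = o.findtext("property[@name='title']") or oid
            elif o.get("class") == "Attachment":
                page_id = (o.findtext("property[@name='content']/id[@name='id']") or "").strip()
                filenames[(page_id, oid)] = o.findtext("property[@name='fileName']") or oid
        for norm, entry in entries.items():
            parts = norm.split("/")
            # Only files laid out as attachments/<page id>/<attachment id>
            if len(parts) != 3 or parts[0] != "attachments" or entry.endswith("/"):
                continue
            page_id, att_id = parts[1], parts[2]
            # Folder named after the page title, kept to a single path component
            folder = titles.get(page_id, page_id).replace("/", "_").replace("\\", "_")
            if folder in (".", ".."):
                folder = page_id
            dest_dir = os.path.join(out_dir, folder)
            os.makedirs(dest_dir, exist_ok=True)
            # Fall back to the numeric name if entities.xml has no matching Attachment
            name = os.path.basename(filenames.get((page_id, att_id), att_id))
            target = os.path.join(dest_dir, name)
            if os.path.exists(target):  # keep an earlier file with the same name
                target = os.path.join(dest_dir, f"{att_id}_{name}")
            with open(target, "wb") as fh:
                fh.write(zf.read(entry))
            written.append(target)
    return written
```

Files whose attachment id does not appear in entities.xml keep their numeric name, so they stay visible for manual checking.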