Retrieving file attachments from a Backup
File attachments on pages can be retrieved from a backup without needing to restore the backup into Confluence. This is useful for recovering attachments that have been deleted by users.
Both scheduled and manual backups allow this, as long as the 'Include attachments' property was set.
Before following the instructions for recovering attachments below, we will review how backups store file and page information.
How backups store file and page information
The backup zip file contains entities.xml, an XML file containing the Confluence content, and a directory for storing attachments.
Backup zip file structure
Page attachments are stored under the attachments directory by page and attachment id. Here is an example listing:
Confluence 8.0 and earlier
Listing for test-2006033012_00_00.zip
\attachments\98\10001
\attachments\98\10002
\attachments\99\10001
entities.xml
Confluence 8.1 and later
Listing for test-2006033012_00_00.zip
\attachments\98\10001\1
\attachments\98\10002\1
\attachments\99\10001\3
entities.xml
Inside the attachments directory, each numbered directory is one page, and each numbered entry inside it is one attachment. The directory number is the page id, and the entry number is the attachment id. For example, the file \attachments\98\10001 is an attachment with page id 98 and attachment id 10001. In Confluence 8.1 and later, the attachment id is itself a directory, and the numbered files inside it are the versions of that attachment; the highest number is the latest version. You can read entities.xml to link those numbers to the original filename. entities.xml also links each page id to the page title.
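To make the two layouts concrete, here is a minimal Python sketch that splits a listing entry into its ids. The names `AttachmentPath` and `parse_attachment_path` are hypothetical helpers, not part of Confluence, and the sketch assumes the third number in the newer layout is the attachment version, as described above.

```python
import re
from typing import NamedTuple, Optional


class AttachmentPath(NamedTuple):
    page_id: int
    attachment_id: int
    version: Optional[int]  # None for the older layout, which has no version level


# Accept both "\" and "/" separators, with or without a leading separator.
_PATTERN = re.compile(r"^[\\/]?attachments[\\/](\d+)[\\/](\d+)(?:[\\/](\d+))?$")


def parse_attachment_path(path: str) -> Optional[AttachmentPath]:
    """Return the ids encoded in a backup path, or None if it is not an attachment."""
    match = _PATTERN.match(path)
    if match is None:
        return None
    page_id, attachment_id, version = match.groups()
    return AttachmentPath(
        int(page_id),
        int(attachment_id),
        int(version) if version is not None else None,
    )
```

Calling `parse_attachment_path(r"\attachments\99\10001\3")` yields page id 99, attachment id 10001 and version 3, while `entities.xml` yields `None`.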
Entities.xml Attachment Object
Inside the entities.xml is an Attachment object written in XML. In this example, the page id is 98, the attachment id is 10001 and the filename is myimportantfile.doc. The rest of the XML can be ignored:
<object class="Attachment" package="com.atlassian.confluence.pages">
  <id name="id">10001</id>
  <property name="title">myimportantfile.doc</property>
  <property name="lowerTitle">myimportantfile.doc</property>
  <property name="version">1</property>
  ...
  <property name="containerContent" class="Page" package="com.atlassian.confluence.pages"><id name="id">98</id></property>
  ...
</object>
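If you prefer not to search the XML by hand, the same fields can be read with Python's standard `xml.etree.ElementTree` module. This is a sketch only: `read_attachments` is a hypothetical helper, and the `objects` wrapper element in the sample is made up for illustration. The function looks for `object` elements wherever they occur and does not depend on the name of the root element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example above; the <objects> wrapper is an assumption.
SAMPLE = """
<objects>
<object class="Attachment" package="com.atlassian.confluence.pages">
<id name="id">10001</id>
<property name="title">myimportantfile.doc</property>
<property name="version">1</property>
<property name="containerContent" class="Page" package="com.atlassian.confluence.pages"><id name="id">98</id></property>
</object>
</objects>
"""


def read_attachments(xml_text):
    """Return one dict per Attachment object: attachment id, page id, filename, version."""
    root = ET.fromstring(xml_text)
    found = []
    for obj in root.iter("object"):
        if obj.get("class") != "Attachment":
            continue
        found.append({
            "attachment_id": obj.findtext("id"),
            "page_id": obj.findtext("property[@name='containerContent']/id"),
            "filename": obj.findtext("property[@name='title']"),
            "version": obj.findtext("property[@name='version']"),
        })
    return found


print(read_attachments(SAMPLE))
```

For a large entities.xml, loading the whole file at once may use a lot of memory; a streaming parser such as `xml.etree.ElementTree.iterparse` may be a better fit.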
Entities.xml Page Object
This XML describes a page. In this example, the page id is 98 and the title is Editing Your Files. The rest of the XML can be ignored:
<object class="Page" package="com.atlassian.confluence.pages">
  <id name="id">98</id>
  <property name="title"><![CDATA[Editing Your Files]]></property>
  ...
</object>
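The page id to page title lookup can be sketched the same way. `read_page_titles` is a hypothetical helper, and the sketch assumes each page appears as an `object` element with `class="Page"`, as in the example above. If more than one Page object shares an id, the last one read wins.

```python
import xml.etree.ElementTree as ET


def read_page_titles(xml_text):
    """Map each page id (as a string) to its page title."""
    root = ET.fromstring(xml_text)
    titles = {}
    for obj in root.iter("object"):
        if obj.get("class") == "Page":
            # CDATA sections are returned as ordinary text by ElementTree.
            titles[obj.findtext("id")] = obj.findtext("property[@name='title']")
    return titles
```

The returned dictionary can then be used to label each numbered directory under attachments with its page title.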
Instructions for recovering attachments
To recover the latest version of each attachment, each file must be individually renamed and re-uploaded into Confluence by following the instructions below. Choose one of the three methods:
Choice A - Recover attachments by filename
This option is best if you know each filename you need to restore, especially if you want just a few files.
- Unzip the backup file and open entities.xml
- Search entities.xml for the filename and find the attachment object with that filename. Note its page id and attachment id
- In the attachments directory, open the directory named with that page id, then locate the entry named with the attachment id (a directory in Confluence 8.1 and later, a single file in Confluence 8.0 and earlier)
- Inside the attachment directory, rename the file with the highest number to the original filename and test it. In Confluence 8.0 and earlier, rename the attachment file itself
- Repeat for each attachment you need
- To import each file back into Confluence, upload it to the original page by attaching the file from within Confluence
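Once the page id and attachment id are known, locating and renaming the stored file can be sketched in Python. `recover_attachment` is a hypothetical helper, and it copies the file to a separate output directory instead of renaming it in place, which is a deliberate variation that leaves the unzipped backup untouched. It handles both directory layouts shown earlier.

```python
import shutil
from pathlib import Path


def recover_attachment(backup_dir, page_id, attachment_id, filename, out_dir):
    """Copy a stored attachment out of an unzipped backup under its original filename."""
    stored = Path(backup_dir) / "attachments" / str(page_id) / str(attachment_id)
    if stored.is_dir():
        # Newer layout: one numbered file per version; take the highest number.
        versions = [p for p in stored.iterdir() if p.is_file() and p.name.isdigit()]
        if not versions:
            raise FileNotFoundError(f"no numbered files in {stored}")
        stored = max(versions, key=lambda p: int(p.name))
    elif not stored.is_file():
        raise FileNotFoundError(str(stored))
    target = Path(out_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(stored, target)
    return target
```

Versions are compared as integers, so a file named 10 is treated as newer than a file named 9.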
Choice B - Restore files by page
This option is best if you only want to restore attachments for certain pages.
- Unzip the backup file and open entities.xml
- Search entities.xml for the page title and find the page object with that title. Note its page id
- In the attachments directory, find the directory named with that page id and rename it to the page title
- Search entities.xml for attachment objects with that page id. Every attachment object for the page has an attachment id, version and filename
- For each attachment object, find the matching attachment directory, rename the file with the highest number (the latest version) to the original filename and test it
- Repeat for each page
- To import each file back into Confluence, upload it to the original page by attaching the file from within Confluence
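The two searches in this method (page title to page id, then page id to attachment objects) can be combined in a short Python sketch. `attachments_for_page` is a hypothetical helper that assumes the object layout shown in the entities.xml examples above. It returns every matching attachment object, so an attachment with several stored versions may appear more than once.

```python
import xml.etree.ElementTree as ET


def attachments_for_page(xml_text, page_title):
    """Return (page id, attachment id, filename) for each attachment on the titled page."""
    root = ET.fromstring(xml_text)
    # First pass: collect the ids of all Page objects with the requested title.
    page_ids = {
        obj.findtext("id")
        for obj in root.iter("object")
        if obj.get("class") == "Page"
        and obj.findtext("property[@name='title']") == page_title
    }
    # Second pass: keep the Attachment objects whose container is one of those pages.
    matches = []
    for obj in root.iter("object"):
        if obj.get("class") != "Attachment":
            continue
        page_id = obj.findtext("property[@name='containerContent']/id")
        if page_id in page_ids:
            matches.append(
                (page_id, obj.findtext("id"), obj.findtext("property[@name='title']"))
            )
    return matches
```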
Choice C - Restore all files
This option is best if you have a small backup but want to restore many or all the attachments inside.
The following process applies to space backups only. Site XML backups do not require page ids to be updated manually, because their page ids are persistent.
- Unzip the backup directory and open entities.xml
- Go to the attachments directory and open any directory. The directory name is a page id. Each of the files in the directory is an attachment that must be renamed
- Search entities.xml for attachment objects with that page id. When one is found, locate the attachment id and filename
- Rename the file with that attachment id to the original filename and test it
- Find the next attachment id and rename it. Repeat for each file in the directory
- Once all files in the current directory are renamed to their original filenames, search entities.xml for the page id (that is, the directory name). Find the page object with that page id and locate its page title
- Rename the directory to the page title and move on to the next directory. Repeat for each un-renamed directory in the attachments directory
- To import each file back into Confluence, upload to the original page by attaching the file from within Confluence
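For a backup with many attachments, the renaming steps above can be planned in one pass. The following Python sketch is an illustration under assumptions, not a supported tool: `plan_restore` is a hypothetical helper, and its `filenames` and `page_titles` arguments are dictionaries you would first build from entities.xml. It only reports which stored file maps to which page title and filename, leaving the copy or rename to you. Forward slashes in page titles are replaced with underscores so the title can serve as a directory name.

```python
from pathlib import Path


def plan_restore(backup_dir, filenames, page_titles):
    """Return (stored file, relative target path) for every attachment in the backup.

    filenames maps (page id, attachment id) to the original filename, and
    page_titles maps page id to page title. All ids are strings.
    """
    plan = []
    attachments = Path(backup_dir) / "attachments"
    for page_dir in sorted(p for p in attachments.iterdir() if p.is_dir()):
        # Fall back to the numeric id when entities.xml gave no title.
        title = page_titles.get(page_dir.name, page_dir.name)
        for entry in sorted(page_dir.iterdir()):
            if entry.is_dir():
                # Newer layout: pick the highest-numbered (latest) version.
                versions = [f for f in entry.iterdir() if f.name.isdigit()]
                if not versions:
                    continue
                source = max(versions, key=lambda f: int(f.name))
            else:
                # Older layout: the entry itself is the attachment file.
                source = entry
            name = filenames.get((page_dir.name, entry.name), entry.name)
            plan.append((source, Path(title.replace("/", "_")) / name))
    return plan
```

Reviewing the plan before copying anything makes it easier to spot attachments whose filename or page title was not found in entities.xml, since those keep their numeric names.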