Retrieving file attachments from a Backup

File attachments on pages can be retrieved from a backup without needing to restore the backup into Confluence. This is useful for recovering attachments that have been deleted by users.

This works with XML backups, as long as the 'Include attachments' property was set when the backup was created.

Before following the instructions for recovering attachments below, we will review how backups store file and page information.

How backups store file and page information

The backup zip file contains entities.xml, an XML file containing the Confluence content, and a directory for storing attachments.

Backup zip file structure

Page attachments are stored under the attachments directory by page and attachment id. Here is an example listing:

Confluence 8.0 and earlier

Listing for test-2006033012_00_00.zip
\attachments\98\10001                  
\attachments\98\10002                   
\attachments\99\10001                    
entities.xml                             

Confluence 8.1 and later

Listing for test-2006033012_00_00.zip
\attachments\98\10001\1                  
\attachments\98\10002\1                   
\attachments\99\10001\3                    
entities.xml                     

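Inside the attachments directory, each numbered directory is one page, and its name is the page id. In Confluence 8.0 and earlier, each numbered file inside a page directory is one attachment, and its name is the attachment id. For example, the file \attachments\98\10001 is an attachment with page id 98 and attachment id 10001. In Confluence 8.1 and later, the attachment id is a directory, and each numbered file inside it is one version of the attachment, so \attachments\99\10001\3 is version 3 of attachment 10001 on page 99. You can read entities.xml to link those numbers to the original filename. Entities.xml also links each page id to the page title.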
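As a rough illustration of the two layouts, the following Python sketch splits one listing entry into its ids. The function name parse_attachment_path is hypothetical, and treating the fourth path component as a version number is an assumption based on the Confluence 8.1 listing above.

```python
import re


def parse_attachment_path(path):
    """Split a backup listing entry into (page id, attachment id, version).

    version is None for the Confluence 8.0 and earlier layout, where the
    attachment id is the file itself.
    """
    # Accept both backslash and forward-slash separators.
    parts = [p for p in re.split(r"[\\/]", path.strip()) if p]
    if len(parts) < 3 or parts[0] != "attachments":
        raise ValueError(f"not an attachment path: {path!r}")
    page_id, attachment_id = parts[1], parts[2]
    version = parts[3] if len(parts) > 3 else None
    return page_id, attachment_id, version
```

For instance, parse_attachment_path(r"\attachments\98\10001") yields page id "98" and attachment id "10001" with no version.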

Entities.xml Attachment Object

Inside entities.xml, each attachment is described by an Attachment object written in XML. In this example, the page id is 98, the attachment id is 10001 and the filename is myimportantfile.doc. The rest of the XML can be ignored:

<object class="Attachment" package="com.atlassian.confluence.pages">
    <id name="id">10001</id>
    <property name="title">myimportantfile.doc</property>
    <property name="lowerTitle">myimportantfile.doc</property>
    <property name="version">1</property>
    ...
    <property name="containerContent" class="Page" package="com.atlassian.confluence.pages"><id name="id">98</id></property>
    ...
</object> 

Entities.xml Page Object

This XML describes a page. In this example, the page id is 98 and the title is Editing Your Files. The rest of the XML can be ignored:

<object class="Page" package="com.atlassian.confluence.pages">
    <id name="id">98</id>
    <property name="title"><![CDATA[Editing Your Files]]></property>
    ...
</object> 
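The two object types above can be read programmatically. The following is a minimal Python sketch, assuming that entities.xml contains object elements shaped like the examples above (the root element name is not relied on). The function name index_entities is hypothetical.

```python
import xml.etree.ElementTree as ET


def index_entities(xml_source):
    """Build lookup tables from entities.xml.

    Returns (attachments, pages) where
      attachments maps attachment id -> (filename, page id)
      pages maps page id -> page title
    xml_source is a path to entities.xml or an open file object.
    """
    attachments, pages = {}, {}
    root = ET.parse(xml_source).getroot()
    for obj in root.iter("object"):
        obj_id = obj.findtext("id[@name='id']")
        title = obj.findtext("property[@name='title']")
        if obj.get("class") == "Attachment":
            # The owning page id is nested inside the containerContent property.
            page_id = obj.findtext(
                "property[@name='containerContent']/id[@name='id']"
            )
            attachments[obj_id] = (title, page_id)
        elif obj.get("class") == "Page":
            pages[obj_id] = title
    return attachments, pages
```

All ids are kept as strings so that they can be compared directly with directory and file names in the attachments directory.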

Instructions for recovering attachments

To recover the latest version of each attachment, rename each file individually and re-upload it to Confluence by following the instructions below. Choose one of the three methods:

Choice A - Recover attachments by filename

This option is best if you know each filename you need to restore, especially if you want just a few files.

  1. Unzip the backup zip file and open entities.xml

  2. Search entities.xml for the filename and find the attachment object with that filename. Locate its page and attachment id

  3. Using the page and attachment id from entities.xml, go to the attachments directory and open the directory named after that page id. Locate the file (Confluence 8.0 and earlier) or directory (Confluence 8.1 and later) named after the attachment id

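  4. In Confluence 8.1 and later, open the attachment directory and rename the file with the highest number (the latest version) to the original filename. In Confluence 8.0 and earlier, rename the attachment file itself. Then test that the file opens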

  5. Repeat for each attachment directory

  6. To import each file back into Confluence, upload to the original page by attaching the file from within Confluence
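Steps 1 to 5 of Choice A could be scripted along the following lines. This is a sketch only: the function names are hypothetical, it assumes an unzipped backup with the layout and XML shown earlier, and it copies files into a separate output directory instead of renaming them in place, so the unzipped backup stays untouched. Uploading the recovered files to Confluence remains a manual step.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def latest_version_file(attachment_path):
    """Return the file holding the latest version of one attachment."""
    if attachment_path.is_file():
        # Confluence 8.0 and earlier: the attachment id is the file itself.
        return attachment_path
    # Confluence 8.1 and later: numbered files, the highest number is the latest.
    versions = [p for p in attachment_path.iterdir() if p.name.isdigit()]
    if not versions:
        raise FileNotFoundError(f"no numbered files in {attachment_path}")
    return max(versions, key=lambda p: int(p.name))


def recover_by_filename(backup_dir, filename, out_dir):
    """Copy every attachment named `filename` to out_dir/<page id>/<filename>."""
    backup_dir, out_dir = Path(backup_dir), Path(out_dir)
    recovered = []
    root = ET.parse(backup_dir / "entities.xml").getroot()
    for obj in root.iter("object"):
        if obj.get("class") != "Attachment":
            continue
        if obj.findtext("property[@name='title']") != filename:
            continue
        attachment_id = obj.findtext("id[@name='id']")
        page_id = obj.findtext("property[@name='containerContent']/id[@name='id']")
        if not (attachment_id and page_id):
            continue
        source = backup_dir / "attachments" / page_id / attachment_id
        if not source.exists():
            continue  # described in entities.xml but not present in the backup
        target = out_dir / page_id / filename
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(latest_version_file(source), target)
        recovered.append(target)
    return recovered
```

A call such as recover_by_filename("test-backup", "myimportantfile.doc", "recovered") returns the list of files it wrote, grouped by page id so that same-named attachments from different pages do not overwrite each other.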

Choice B - Restore files by page

This option is best if you only want to restore attachments for certain pages.

  1. Unzip the backup zip file and open entities.xml

  2. Search entities.xml for the page title and find the page object with that title. Locate its page id

  3. Go to the attachments directory and open the directory named after that page id. Rename this directory to the page title

  4. Search entities.xml for attachment objects with that page id. Every attachment object for the page will have an attachment id, version and filename

  5. For each attachment object, find the attachment directory, rename the file with the highest number (latest version) to the original filename, and test it

  6. Repeat for each page

  7. To import each file back into Confluence, upload to the original page by attaching the file from within Confluence
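The lookups in steps 2 and 4 of Choice B can be sketched in Python as follows. The function name attachments_for_page is hypothetical, and the sketch assumes the object structure shown in the entities.xml examples. It only reads entities.xml and reports what to rename; it does not touch the attachments directory.

```python
import xml.etree.ElementTree as ET


def attachments_for_page(xml_source, page_title):
    """Return {page id: [(attachment id, version, filename), ...]}.

    More than one page id can be returned when several page objects
    in the backup share the same title.
    """
    root = ET.parse(xml_source).getroot()
    objects = list(root.iter("object"))
    # Step 2: find the page id(s) for the given title.
    page_ids = {
        obj.findtext("id[@name='id']")
        for obj in objects
        if obj.get("class") == "Page"
        and obj.findtext("property[@name='title']") == page_title
    }
    result = {page_id: [] for page_id in page_ids}
    # Step 4: collect the attachment objects that belong to those pages.
    for obj in objects:
        if obj.get("class") != "Attachment":
            continue
        page_id = obj.findtext("property[@name='containerContent']/id[@name='id']")
        if page_id in result:
            result[page_id].append((
                obj.findtext("id[@name='id']"),
                obj.findtext("property[@name='version']"),
                obj.findtext("property[@name='title']"),
            ))
    return result
```

Each tuple in the result names one file or directory to look for under the page's directory, together with the filename it should be renamed to.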

Choice C - Restore all files

This option is best if you have a small backup but want to restore many or all the attachments inside.

The following process applies to space backups only. In site XML backups, page ids are persistent, so they do not need to be updated manually.

  1. Unzip the backup zip file and open entities.xml
  2. Go to the attachments directory and open any directory. The directory name is a page id. Each of the files in the directory is an attachment that must be renamed
  3. Search entities.xml for attachment objects with that page id. When one is found, locate the attachment id and filename
  4. Rename the file with that attachment id to the original filename and test it
  5. Find the next attachment id and rename it. Repeat for each file in the directory
  6. Once all files in the current directory are renamed to their original filenames, search entities.xml for the page id (that is, the directory name). Find the page object with that page id and locate its page title
  7. Rename the directory to the page title and move on to the next directory. Repeat for each un-renamed directory in the attachments directory
  8. To import each file back into Confluence, upload to the original page by attaching the file from within Confluence
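
Steps 1 to 7 of Choice C lend themselves to automation. The following Python sketch walks the attachments directory and writes each attachment to an output directory named after its page title. The function names safe_name and restore_all are hypothetical, the XML structure is assumed to match the examples above, and files are copied instead of renamed in place so that the unzipped backup is left unchanged. Re-uploading to Confluence is still done by hand.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def safe_name(name):
    """Replace characters that are not allowed in file names on common systems."""
    return "".join("_" if ch in '\\/:*?"<>|' else ch for ch in name)


def restore_all(backup_dir, out_dir):
    """Copy every attachment to out_dir/<page title>/<original filename>."""
    backup_dir, out_dir = Path(backup_dir), Path(out_dir)
    filenames, titles = {}, {}
    root = ET.parse(backup_dir / "entities.xml").getroot()
    for obj in root.iter("object"):
        obj_id = obj.findtext("id[@name='id']")
        title = obj.findtext("property[@name='title']")
        if obj.get("class") == "Attachment":
            page_id = obj.findtext(
                "property[@name='containerContent']/id[@name='id']"
            )
            # Key on page id and attachment id together, as in step 3.
            filenames[(page_id, obj_id)] = title
        elif obj.get("class") == "Page":
            titles[obj_id] = title

    restored = []
    for page_dir in sorted((backup_dir / "attachments").iterdir()):
        if not page_dir.is_dir():
            continue
        # Fall back to the page id when no page object carries a title.
        page_name = safe_name(titles.get(page_dir.name) or page_dir.name)
        for entry in sorted(page_dir.iterdir()):
            filename = filenames.get((page_dir.name, entry.name))
            if not filename:
                continue  # no matching attachment object; review manually
            if entry.is_dir():
                # Confluence 8.1 and later: pick the highest-numbered file.
                versions = [p for p in entry.iterdir() if p.name.isdigit()]
                if not versions:
                    continue
                entry = max(versions, key=lambda p: int(p.name))
            target = out_dir / page_name / safe_name(filename)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(entry, target)
            restored.append(target)
    return restored
```

Pages that share a title end up in the same output directory, so check the result for collisions before uploading.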
Last modified on Jul 30, 2024
