Hierarchical File System Attachment Storage
The way attachments are stored changed significantly in Confluence 3.0. If you are upgrading from Confluence 2.10 or earlier, see Upgrading Confluence for recommended upgrade paths, and read the version of the Hierarchical File System Attachment Storage page in our Confluence 3.0 documentation, which provides more detail about migrating to the new file system structure.
Confluence stores attachments, such as files and images, in a file system. Confluence's attachment storage layout is designed to:
- Limit the number of entries at any single level in a directory structure (as some file systems have a limit on the number of files that can be stored in a directory).
- Partition attachments per space, making it possible for a system administrator to selectively back up attachments from particular spaces.
Attachments in Confluence have a number of identifying attributes: the content id of the file itself, and the space id and content id of the page the file is attached to. This means the file logically belongs to a piece of content, which in turn logically belongs to a space (although not all content belongs to a space). For files within a space in Confluence, the directory structure is typically 8 levels deep, with the name of each directory level based on the following algorithm:
1. Always 'ver003', indicating the Confluence version 3 storage format
2. The least significant 3 digits of the space id, modulo 250
3. The next 3 least significant digits of the space id, modulo 250
4. The full space id
5. The least significant 3 digits of the content id of the page the file is attached to, modulo 250
6. The next 3 least significant digits of the content id of the page the file is attached to, modulo 250
7. The full content id of the page the file is attached to
8. The full content id of the attached file
Inside the level 8 directory are the files themselves, named with the version number of the file, e.g. 1, 2, 6.
The modulo calculation is used to find the remainder after division, for example 800 modulo 250 = 50.
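The directory levels described above can be sketched in Python. This is an illustration derived only from the description on this page, not Confluence's own code; the function names `id_levels` and `attachment_path` and the example ids are hypothetical.

```python
from pathlib import PurePosixPath


def id_levels(entity_id: int) -> tuple[str, str]:
    """Return the two partition directory names derived from an id."""
    # Least significant 3 digits, modulo 250 (e.g. 800 -> 50).
    low = (entity_id % 1000) % 250
    # Next 3 least significant digits, modulo 250.
    high = ((entity_id // 1000) % 1000) % 250
    return str(low), str(high)


def attachment_path(space_id: int, page_id: int,
                    attachment_id: int, version: int) -> PurePosixPath:
    """Build the relative path of one attachment version (assumed layout)."""
    space_low, space_high = id_levels(space_id)
    page_low, page_high = id_levels(page_id)
    return PurePosixPath(
        "ver003",            # level 1: storage format marker
        space_low,           # level 2
        space_high,          # level 3
        str(space_id),       # level 4
        page_low,            # level 5
        page_high,           # level 6
        str(page_id),        # level 7
        str(attachment_id),  # level 8
        str(version),        # the file, named after its version number
    )


# Made-up ids, for illustration only.
print(attachment_path(1234800, 5678123, 9000001, 2))
# ver003/50/234/1234800/123/178/5678123/9000001/2
```

The resulting path is relative to the directory where Confluence keeps attachments, which this sketch does not attempt to locate.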
To find the directory where attachments for a particular space are stored, go to <confluence url>/admin/findspaceattachments.jsp and enter a space key. It will return the directory on the file system where attachments for that space are stored.
File D in the above diagram is stored in a slightly different structure. Files that are not conceptually within a space replace the level 2 - 4 directories with a single directory called 'nonspaced'. Examples of such files are the global site logo and attachments on unsaved content.
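As a rough sketch of the 'nonspaced' variant, the path below drops the three space-derived levels and substitutes the single 'nonspaced' directory. The function name `nonspaced_attachment_path` and the example ids are hypothetical, and the layout is inferred from the description on this page rather than taken from Confluence's source.

```python
from pathlib import PurePosixPath


def nonspaced_attachment_path(content_id: int, attachment_id: int,
                              version: int) -> PurePosixPath:
    """Relative path for an attachment whose content is not in a space."""
    # Same partitioning as for spaced content: 3 digits at a time, modulo 250.
    low = (content_id % 1000) % 250
    high = ((content_id // 1000) % 1000) % 250
    return PurePosixPath(
        "ver003",            # level 1 is unchanged
        "nonspaced",         # replaces levels 2 to 4 (the space-derived levels)
        str(low),
        str(high),
        str(content_id),
        str(attachment_id),
        str(version),
    )


# Made-up ids, for illustration only.
print(nonspaced_attachment_path(5678123, 9000001, 1))
# ver003/nonspaced/123/178/5678123/9000001/1
```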
Extracted text files
When a text-based file (for example, a Word or PowerPoint document) is uploaded to Confluence, its text is extracted and indexed so that people can search for the content of a file, not just the filename. We store the extracted text so that when that file needs to be reindexed, we don't need to re-extract the content of the file.
The extracted text file will be named with the version number, for example 2.extracted_text, and stored alongside the file versions themselves (within level 8 in the explanation above). We only keep the extracted text for the latest version, not earlier versions of a file.
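Given a level 8 directory, the naming rule above suggests one way to locate the extracted text. This is a minimal sketch that assumes version files have purely numeric names, as described on this page; the helper names `latest_version` and `extracted_text_file` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_version(attachment_dir: Path) -> Optional[int]:
    """Return the highest version number stored in a level 8 directory."""
    versions = [
        int(entry.name)
        for entry in attachment_dir.iterdir()
        if entry.is_file() and entry.name.isdecimal()
    ]
    return max(versions, default=None)


def extracted_text_file(attachment_dir: Path) -> Optional[Path]:
    """Return the extracted text file for the latest version, if present."""
    latest = latest_version(attachment_dir)
    if latest is None:
        return None
    candidate = attachment_dir / f"{latest}.extracted_text"
    # Not every attachment has extracted text, so the file may be absent.
    return candidate if candidate.is_file() else None
```

Both helpers return None rather than raising when nothing matches, since an attachment directory may hold no extracted text at all.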