Hierarchical File System Attachment Storage
Confluence 8.1 introduced a new way to store attachment data in the file system.
When you upgrade to Confluence 8.1 or later, your attachments will be migrated to a new folder structure. More information about this migration task is available on this page.
For information about the previous folder structure, see the version of Hierarchical File System Attachment storage in Confluence 8.0 documentation or earlier.
Confluence stores attachments, such as files and images, in a file system.
The structure of the attachment storage has been designed to:
limit the number of entries at any single level in a directory structure
eliminate the need to move attachments between directories when a page is moved to a new location
On this page:
Attachments in Confluence have a single identifying attribute: the Content id of the original version of the attachment.
For example, the original version of the attachment has the Content id 12345678 so the attachment file names for versions 1, 2, and 6 will be 12345678.1, 12345678.2, 12345678.6 respectively.
The directory structure consists of 5 levels with the name of each level derived from the following algorithms:
Always 'v4' indicating the Confluence version 4 storage layout format
Calculated as Content id modulo 65535, modulo 256
Calculated as Content id modulo 65535, divided by 256
The Content id of the attached file
These are the file names. They are named the Content id and version number of the file, for example: x.1, x.2, x.6.
The modulo calculation is used to find the remainder after division, for example 800 modulo 250 = 50.
#!/bin/sh content_id=$1 version=1 domain=`expr $content_id % 65535` folder_1=`expr $domain % 256` folder_2=`expr $domain / 256` echo "v4/$folder_1/$folder_2/$content_id/$content_id.$version"
Diagram of the directory
Extracted text files
When a text-based file is uploaded in Confluence (for example Word, PowerPoint, etc), its text is extracted and indexed so that people can search for the content of a file, not just the filename. We store the extracted text so that when that file needs to be reindexed, we don't need to re-extract the content of the file.
The extracted text file will be named with the content id and version number, for example,
12345.2.extracted_text, and stored alongside the file versions themselves (within level 5 in the example above). We only keep the extracted text for the latest version, not earlier versions of a file.
Migration to version 4
Confluence 8.1 introduced the v4 layout format for storing attachments. To ensure a smooth transition from the previous version 3 structure, we added an automatic background task to do this migration when you upgrade to Confluence 8.1 or later.
The migration task moves all attachments from the
ver003 directory to the new
v4 directory. Any missing or broken attachments resulting from previous failed page moves that are found in this process will be restored. Duplicate attachments will be saved with the extra extension
This background task will only run on Confluence startup if a
ver003 directory exists in the attachments folder. During this process, Confluence will work as usual.
When the migration task is finished, a report file
v3-to-v4-report.log will be available in the attachments directory. A new report is created for each migration run. It contains a list of files with corresponding issues, and the migration status is printed at the bottom – for example, completed successfully, completed with warnings, or interrupted. The report file does not print successfully migrated attachments to avoid huge log files.
These dark feature customizations are available for your migration. To learn how to configure them, see Configuring System Properties.
Set this property to disable migration to
Any attachments that have already been migrated to
By default, migration uses up to half of the available CPUs on the node. Set the number of CPUs by providing a number instead of X.
Reach out to support for assistance with this feature.
|I want to know what logs are available|
You can find more about the migration from application logs in
There is also a report file
I want to find out if the migration was successful
The ver003 directory will be deleted if migration was successful. You should also review the report log. The final status will be printed at the bottom.
I want to find out if the migration is happening
A report log is created in the attachments directory. Also an entry is printed in
I can see multiple report logs in the attachments folder, I want to find out which is the correct one
A report log is created for each migration. If multiple reports are generated, you need to review the latest report and troubleshoot the issue. You may need to delete or move any files that are not real attachments from the ver003 directory.
Once all attachments have been migrated successfully, there won’t be new report logs in the folder.
The report shows
Due to past issues, some of your attachments may be duplicated in the ver003 directory. See
CONFSERVER-62835Getting issue details...
. If the migration task finds duplicate files it will move them to v4 directory and add
The report shows
This means a file failed to migrate. Check if permissions allow the move operation, and if the file is not a real attachment you will need to delete or move it from the ver003 directory manually. Then, restart Confluence to trigger another migration task.
I'm running a Confluence DC cluster and can't see any migration logs
The migration runs only on one node, so if you're not in the same node you won't be able to see progress for it in the application log,
Every node in the cluster should be able access the report file
I want to know if the migration is using up extra disk resources
I don’t want to do this migration yet
We have a dark feature that allows you to disable migration to v4. See customizations for more info.