Hierarchical File System Attachment Storage

On this page

Still need help?

The Atlassian Community is here for you.

Ask the community

Confluence 8.1 introduced a new way to store attachment data in the file system.

When you upgrade to Confluence 8.1 or later, your attachments will be migrated to a new folder structure. More information about this migration task is available on this page.

For information about the previous folder structure, see the version of Hierarchical File System Attachment storage in Confluence 8.0 documentation or earlier.

Confluence stores attachments, such as files and images, in a file system.

The structure of the attachment storage has been designed to:

  1. limit the number of entries at any single level in a directory structure

  2. eliminate the need to move attachments between directories when a page is moved to a new location


On this page:

Directory structure

Attachments in Confluence have a single identifying attribute: the Content id of the original version of the attachment.

For example, the original version of the attachment has the Content id 12345678 so the attachment file names for versions 1, 2, and 6 will be 12345678.1, 12345678.2, 12345678.6 respectively.

The directory structure consists of 5 levels with the name of each level derived from the following algorithms:

level

Derived From

1 (top)

Always 'v4' indicating the Confluence version 4 storage layout format

2

Calculated as Content id modulo 65535, modulo 256

3

Calculated as Content id modulo 65535, divided by 256

4

The Content id of the attached file

5

These are the file names. They are named the Content id and version number of the file, for example: x.1, x.2, x.6.

The modulo calculation is used to find the remainder after division, for example 800 modulo 250 = 50.

An example in bash showing the path to the first version of the attachment
#!/bin/sh

content_id=$1
version=1
domain=`expr $content_id % 65535`
folder_1=`expr $domain % 256`
folder_2=`expr $domain / 256`​
​
echo "v4/$folder_1/$folder_2/$content_id/$content_id.$version"

Diagram of the directory

Extracted text files

When a text-based file is uploaded in Confluence (for example Word, PowerPoint, etc), its text is extracted and indexed so that people can search for the content of a file, not just the filename. We store the extracted text so that when that file needs to be reindexed, we don't need to re-extract the content of the file.

The extracted text file will be named with the content id and version number, for example, 12345.2.extracted_text, and stored alongside the file versions themselves (within level 5 in the example above).  We only keep the extracted text for the latest version, not earlier versions of a file. 

Migration to version 4

Confluence 8.1 introduced the v4 layout format for storing attachments. To ensure a smooth transition from the previous version 3 structure, we added an automatic background task to do this migration when you upgrade to Confluence 8.1 or later.

The migration task moves all attachments from the ver003 directory to the new v4 directory. Any missing or broken attachments resulting from previous failed page moves that are found in this process will be restored. Duplicate attachments will be saved with the extra extension .duplicate.X.

This background task will only run on Confluence startup if a ver003 directory exists in the attachments folder. During this process, Confluence will work as usual.

When the migration task is finished, a report file v3-to-v4-report.log will be available in the attachments directory. A new report is created for each migration run. It contains a list of files with corresponding issues, and the migration status is printed at the bottom – for example, completed successfully, completed with warnings, or interrupted. The report file does not print successfully migrated attachments to avoid huge log files.

Customizations

These dark feature customizations are available for your migration. To learn how to configure them, see Configuring System Properties.

System property

Description

-Datlassian.darkfeature.confluence.disable-attachments-ver004=true

Set this property to disable migration to v4 storage and keep using legacy ver003.

Any attachments that have already been migrated to v4 won't be reversed.

-Dconfluence.attachments-ver004-migration-num-of-threads=X

By default, migration uses up to half of the available CPUs on the node. Set the number of CPUs by providing a number instead of X.

Reach out to support for assistance with this feature.

Troubleshooting

Issue

Solution

I want to know what logs are available

You can find more about the migration from application logs in atlassian-confluence.log. See Working with Confluence Logs

There is also a report file v3-to-v4-report.log created in the attachments directory.

I want to find out if the migration was successful

The ver003 directory will be deleted if migration was successful. You should also review the report log. The final status will be printed at the bottom.

I want to find out if the migration is happening

A report log is created in the attachments directory. Also an entry is printed in atlassian-confluence.log every 50,000 migrated attachments. Search this log file for the string Attachments migration from ver003 to v4 progressed

I can see multiple report logs in the attachments folder, I want to find out which is the correct one

A report log is created for each migration. If multiple reports are generated, you need to review the latest report and troubleshoot the issue. You may need to delete or move any files that are not real attachments from the ver003 directory.

Once all attachments have been migrated successfully, there won’t be new report logs in the folder.

The report shows Duplicate file saved as ... entries

Due to past issues, some of your attachments may be duplicated in the ver003 directory. See CONFSERVER-62835 - Getting issue details... STATUS . If the migration task finds duplicate files it will move them to v4 directory and add .duplicate.X suffix. After migration, you should review if the duplicates are needed.

The report shows Failed to migrate. Msg:

This means a file failed to migrate. Check if permissions allow the move operation, and if the file is not a real attachment you will need to delete or move it from the ver003 directory manually. Then, restart Confluence to trigger another migration task.

I'm running a Confluence DC cluster and can't see any migration logs

The migration runs only on one node, so if you're not in the same node you won't be able to see progress for it in the application log, atlassian-confluence.log

Every node in the cluster should be able access the report file v3-to-v4-report.log in the attachments directory. Check the report for details about your migration.

I want to know if the migration is using up extra disk resources

Migration performs move operation only, and on most file systems it will not use any extra disk space.

Also, v4 layout should use less inodes than v3.

I don’t want to do this migration yet

We have a dark feature that allows you to disable migration to v4. See customizations for more info.



Last modified on Oct 3, 2024

Was this helpful?

Yes
No
Provide feedback about this article
Powered by Confluence and Scroll Viewport.