Confluence Support

How to exclude extracted text from attachments directory backup

This guide only applies to Confluence Server or Data Center 6.5 and later.

Purpose

When a file is uploaded in Confluence, its text is extracted and indexed so that people can search for the content of a file, not just the filename. From Confluence 6.5, we store this extracted text in the filesystem alongside the attached file, so that when that file needs to be reindexed (for example, when the page it's attached to changes), we don't need to re-extract the content of the file. We only re-extract the content when a new version of the file is uploaded, and we store extracted text only for the latest version of the file, not for earlier versions.

The files containing the extracted text are generally quite small, but over time this can add up to a lot of additional files, and increase the total size of the attachments directory backup (part of your home / shared home directory). For this reason, you might want to exclude these files when backing up your attachments directory.
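Before deciding whether the exclusion is worthwhile, it can help to measure how many of these files exist and how much space they use. The following is a minimal sketch, not an official Atlassian tool: the function names and the `ATTACHMENTS_DIR` variable are hypothetical, and the default path `./shared/attachments` is an assumption you should replace with your own attachments directory.

```shell
#!/bin/sh
# Sketch: report how many *.extracted_text files exist and their total size.
# ATTACHMENTS_DIR is a hypothetical variable; set it to your attachments directory.
ATTACHMENTS_DIR="${ATTACHMENTS_DIR:-./shared/attachments}"

# Print the number of extracted text files under the directory given as $1.
count_extracted_text() {
    find "$1" -type f -name '*.extracted_text' | wc -l | tr -d ' '
}

# Print the total disk usage, in KiB, of extracted text files under $1.
size_extracted_text_kb() {
    # du -k reports 1 KiB blocks per file; awk sums the first column.
    find "$1" -type f -name '*.extracted_text' -exec du -k {} + |
        awk '{ total += $1 } END { print total + 0 }'
}

# Only report when the directory actually exists.
if [ -d "$ATTACHMENTS_DIR" ]; then
    echo "Extracted text files: $(count_extracted_text "$ATTACHMENTS_DIR")"
    echo "Total size (KiB): $(size_extracted_text_kb "$ATTACHMENTS_DIR")"
fi
```

Run it with, for example, `ATTACHMENTS_DIR=/path/to/attachments sh count_extracted_text.sh` (the script name is illustrative). The size is based on disk blocks, so it may be larger than the sum of the file lengths.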

Solution

To exclude these files from your backup, you can rely on the file extension, which is always .extracted_text. For example, the following Unix shell command backs up attachments without including the extracted text files.

$ tar -czf attachments.tar.gz --exclude '*.extracted_text' ./shared/attachments

The extracted text files are also not included when you perform a space or site export.
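If you want to confirm that a backup archive really omits the extracted text files, one possible approach is to list the archive contents and count matching entries. This is a sketch under stated assumptions: the helper names `backup_attachments` and `count_extracted_in_archive` are hypothetical, and the paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch: create a gzip-compressed tar backup of a directory ($1) at the
# archive path ($2), skipping every file whose name ends in .extracted_text.
backup_attachments() {
    tar -czf "$2" --exclude='*.extracted_text' "$1"
}

# Print how many entries in the archive ($1) end in .extracted_text.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status clean for callers that use "set -e".
count_extracted_in_archive() {
    tar -tzf "$1" | grep -c '\.extracted_text$' || true
}

# Example usage (placeholder paths):
#   backup_attachments ./shared/attachments attachments.tar.gz
#   count_extracted_in_archive attachments.tar.gz
```

A result of 0 from `count_extracted_in_archive` indicates that no extracted text files made it into the archive.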
