Clean up your Confluence instance

Support for Server licenses ended on February 15, 2024. Discover your options.

The purpose of this guide is to outline the ways you can declutter and reduce the overall size of your Confluence site. 

There is no single approach, or magic button, that will work in all situations, so this page shows a range of techniques you can use to clean up your site. This will no doubt be a team effort, and you'll need to enlist the help of people in your organisation. See Prepare for the cleanup for more about determining business rules and getting the right people involved.

For easy access to tools and tips that can help you deal with unwanted data, go to Administration > General Configuration > Clean up.
This page is available in Confluence Data Center 7.16 and later.

Identify unused spaces

Use analytics data to identify unused spaces

Available in Confluence Data Center 7.11 and later.

One of the first steps, when beginning to clean up your site, is to identify spaces that are no longer used on a regular basis. 

  1. Go to Analytics in the Confluence header.
  2. Select the Spaces tab.
  3. Set your date range.
  4. Sort the table by views (ascending).

Spaces with the fewest views will be listed first. Select a space to see detailed statistics, including most popular content and active contributors.  From there you could contact the space administrator, or reach out to the active contributors to see whether the space could be archived, or even backed up and removed. 
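If you have copied the rows of the Spaces table into a script, the same "sort by views, ascending" step can be sketched in Python to produce a shortlist for follow-up. This is only an illustration: the `SpaceViews` type, the `max_views` threshold, and the sample data are hypothetical and are not part of Confluence.

```python
from typing import NamedTuple


class SpaceViews(NamedTuple):
    key: str    # space key, as shown in the analytics table
    views: int  # total views within the selected date range


def least_viewed(rows: list[SpaceViews], max_views: int) -> list[SpaceViews]:
    """Return spaces with at most max_views views, fewest views first."""
    quiet = [row for row in rows if row.views <= max_views]
    # Sort by views, then by key so that ties come out in a stable order.
    return sorted(quiet, key=lambda row: (row.views, row.key))


# Hypothetical sample data, not taken from a real site.
sample = [
    SpaceViews("HR", 1240),
    SpaceViews("OLDPROJ", 3),
    SpaceViews("DESIGN", 410),
    SpaceViews("TEMP2019", 0),
]

candidates = least_viewed(sample, max_views=10)
for space in candidates:
    print(f"{space.key}: {space.views} views")
```

A low view count is only a starting point. Check with the space administrator or active contributors before archiving or removing anything.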


What's the impact?

(tick) Identify spaces that have not been viewed in a specific amount of time.

(tick) See detailed statistics, including comment activity, and the number of unique users who have viewed the space in a specific amount of time.

(tick) Identify most active contributors, or recent viewers, so you can enlist them to help with the clean up.

Declutter your site

Archive spaces to improve search result relevance

As your site grows beyond a certain size, search becomes essential. One way you can improve the relevance of search results is to archive spaces that are no longer needed on a day to day basis. You will need Space Administrator permission for the particular space to do this. 


What's the impact?

(tick) More relevant search results because archived content is excluded by default.

(tick) Archived content is easy for anyone to access at any time.

(error) No change to search performance or index size, as the archived content still exists in the index.


To archive a space:

  1. Go to the space and choose Space tools > Overview from the bottom of the sidebar.
  2. Select Edit.
  3. Select Archived from the status dropdown.
  4. Save your changes. 

Pages, blog posts, files, and comments in that space will not be included in search results, unless the user chooses to include results from archived spaces.  You get the benefit of more relevant search results, but without losing quick and easy access to the archived contents.  This is important if your organization needs to maintain records for regulatory or compliance reasons.  

Learn more about archiving a space 


Disable user accounts to make mentioning people faster

When someone leaves your organization, or no longer needs to use Confluence, it is good practice to disable their user account. Most organizations do a great job of staying on top of this, particularly if they manage their users in an external directory. However, it can be useful to periodically check whether there are user accounts that do not need to be active. 


What's the impact?

(tick) Fewer users appear in mentions and in search, so it is easier to select the right person

(error) No change to index size, or database size as the disabled user accounts still exist, and can be re-enabled. 

It's worth mentioning that deleting a user will not have an impact on your database size, or performance if they have created content. This is because the content the person created is not automatically deleted, and their user account is anonymised, not removed from the database, to prevent their contributions being misattributed. 


There are several ways you can identify users who could be disabled. See How to identify inactive users in Confluence to learn how to query your database for the last login date. Remember to always back up your database before you do this. 
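Once you have a list of usernames and last login dates, picking out the stale accounts is a simple date comparison. The sketch below assumes you have already loaded the query results into a dictionary. The function name, the 180-day threshold, and the sample rows are hypothetical; the query itself is described in the linked article and is not reproduced here.

```python
from datetime import datetime, timedelta
from typing import Optional


def inactive_users(
    last_logins: dict[str, Optional[datetime]],
    now: datetime,
    max_idle_days: int = 180,
) -> list[str]:
    """Return usernames with no login within max_idle_days, sorted by name.

    A value of None means the user has never logged in.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, last_login in last_logins.items()
        if last_login is None or last_login < cutoff
    )


# Hypothetical rows, as they might look after running a last-login query.
rows = {
    "asmith": datetime(2024, 4, 30),
    "bjones": datetime(2022, 11, 2),
    "cnguyen": None,
}

stale = inactive_users(rows, now=datetime(2024, 5, 15), max_idle_days=180)
print(stale)
```

Treat the result as a review list rather than a list of accounts to disable automatically, since some accounts (service accounts, for example) may be needed even if nobody logs in with them.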

The way you disable a user account depends on how your users are managed.

Learn how to delete or disable users 


Declutter your spaces

This is something that everyone can get involved in. Often known as wiki gardening, this is the process of identifying and moving content that is no longer useful. Confluence Data Center and Server doesn't provide a way to archive individual pages, but there are some workarounds that can give you a similar result. 

Organise obsolete pages in the page tree

Reorganising the page tree can work quite well for project spaces, where you want to differentiate between current pages, and superseded pages. It's particularly useful if your space is already neatly organised. 


What's the impact?

(tick) The page tree prioritises useful, current pages.

(tick) All pages are accessible in search results, and are easy to navigate to.

(error) No indication that the page is outdated or obsolete when someone lands from search. 

(error) No improvements to performance or search relevance. 


People need the Add page permission in the space to do this. 

To create an area for obsolete pages in the page tree:

  1. Create a new page, and give it an obvious name, such as "Obsolete pages" or "Old work"
  2. Move all unwanted pages to be a child of this page. Either:
    1. Go to the space and choose Space tools > Reorder pages from the bottom of the sidebar. You can then drag pages to their new location.
    2. Navigate to the unwanted page, and choose More options > Move, then specify the new page in the New Parent Page field. 

You can move single pages, or whole page trees (a parent page and all its child pages). 

Learn more about moving pages


Move obsolete pages to an archive space

Moving pages to an archive space can work quite well for team spaces, which tend to have a long life. By moving outdated or obsolete content to another space, you keep the original space well organised and relevant.


What's the impact?

(tick) More relevant search results because archived content is excluded by default.

(tick) The work can be delegated: team members move the pages, then a space admin changes the space status to archived. 

(error) Can only move pages into the space when the status is active (not archived).

(error) No change to search performance or index size, as the archived content still exists in the index.


People need Delete page permission in the original space, and Add page permission in the destination space to do this. 

To make an archive space for your unwanted pages:

  1. Create a new space, for example "Design Team Archive".
  2. Navigate to an unwanted page, and choose More options > Move, then select the archive space as the destination space.
  3. Specify a page, such as the archive space homepage, in the New Parent Page field. 
  4. Repeat this process for all unwanted pages. You can move single pages, or whole page trees (a parent page and all its child pages).
  5. Once all the unwanted pages have been moved, archive the new space.

The original space remains current, and all its content is searchable. The content moved to the archived space no longer appears in search results, but is easily accessible to your team, either by navigating directly to the archived space, or by selecting the option to include archived spaces in search. 

Note that you can't move pages into the space while its status is 'Archived'. However it is a simple matter to change the status back to 'Current' when you need to move more pages.  You might even consider a quarterly clean up day, when team members focus on moving pages that are no longer needed into the space. 

Learn more about archiving a space 

Use a marketplace app

There are several wiki gardening, archiving, and workflow apps available on the Atlassian Marketplace. They have a range of features including per-page archiving, and tools for identifying pages that have not been updated in a specified amount of time. 


What's the impact?

(tick) May provide additional tools for identifying out of date content.

(error) No change to search performance or index size, as the archived content still exists in the index.

(error) May require a commercial (paid) license.


There are a range of apps that can help you clean up your site. For example: 

  • Better Content Archiving for Confluence
    This app targets the lifecycle of your Confluence pages. It provides analytics, expiration, review workflow, retention and archive spaces. This is a very popular, highly rated app. 
  • Confluence Command Line Interface (CLI)
    Many of you already use Bob Swift's command line interface. It can also be used to identify types of content that should be cleaned up. This app gives you the flexibility to complete and automate clean up tasks. 
  • ScriptRunner for Confluence
    This app allows you to create and run scripts against Confluence, including bulk automations such as bulk delete attachments and automating tasks like archiving or removing large attachments.
  • Scroll HTML Exporter for Confluence
    This app lets you create static HTML exports of Confluence spaces, and style them beautifully. This can be great for long-term archiving to a file system or other location. 

New apps are being developed all the time, so check out the Atlassian Marketplace to see what's new. 

Find Confluence apps on The Atlassian Marketplace

Reduce data in the database and on disk

There are a number of things you can do to reduce the amount of data in your database, or stored on disk. This can help with storage costs, and also time required for backups and upgrades. 

Automatically delete historical versions of pages and attached files

Available in Confluence Data Center 7.16 and later.

Every time someone edits a page, or re-uploads a file, Confluence stores the previous version so you can roll back to it if needed.  Over time though, this can really add up.  It's not unusual for a popular or long-running page to have hundreds of historical versions. 

If you don't need to keep older historical versions, for example for governance or compliance reasons, you can configure Confluence to automatically delete historical versions based on specific criteria. 

  1. Go to Administration > General Configuration > Retention rules.
  2. Add exemptions for any spaces that require different rules, for example you may want to always keep all versions in your HR or Finance space. 
  3. Set a global rule for pages and attached files.  

Approximately every ten minutes, Confluence will check for historical versions that meet your criteria and remove them, in small batches. This means it is constantly cleaning up your site in the background, never allowing historical data to grow too large. 
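To make the idea of a background job that removes expired versions in small batches more concrete, here is a minimal Python sketch of an age-based rule. It is an illustration only and does not show how Confluence implements retention rules: the `Version` type, `keep_days`, and `batch_size` are hypothetical names chosen for this example.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Version(NamedTuple):
    number: int
    saved: datetime
    is_current: bool


def versions_to_delete(
    versions: list[Version],
    now: datetime,
    keep_days: int,
    batch_size: int,
) -> list[Version]:
    """Pick up to batch_size historical versions older than keep_days.

    The current version is never selected, so at least one version remains.
    """
    cutoff = now - timedelta(days=keep_days)
    expired = [v for v in versions if not v.is_current and v.saved < cutoff]
    expired.sort(key=lambda v: v.saved)  # oldest first
    return expired[:batch_size]


# Hypothetical page history.
history = [
    Version(1, datetime(2021, 1, 10), False),
    Version(2, datetime(2022, 6, 1), False),
    Version(3, datetime(2023, 3, 20), False),
    Version(4, datetime(2024, 2, 1), False),
    Version(5, datetime(2024, 5, 1), True),
]

batch = versions_to_delete(
    history, now=datetime(2024, 5, 15), keep_days=365, batch_size=2
)
print([v.number for v in batch])
```

Each run handles only one small batch, so a later run would pick up whatever expired versions remain. This is one way to keep each pass cheap while still converging on the rule over time.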

Learn more about retention rules


What's the impact?

(tick) Fully automated, and flexible enough to meet the needs of different parts of your organization.

(tick) Reduction in database size may provide some performance gain.

(tick) Reduction in database size and attachment storage reduces backup time and complexity. 

(error) No improvements to search relevance, as at least one version of every page or file is retained. 

Delete Synchrony data

Available in Confluence Data Center and Server 7.0 and later.

When collaborative editing is enabled, Synchrony is used to synchronise changes made in the editor. This is how multiple people can see each other's changes in real time. Each page and blog post has its own Synchrony change log, which contains a graph of all edits to that page or blog post. In busy Confluence sites, the database tables that store the Synchrony change logs can grow very quickly.


What's the impact?

(tick) Reduction in Synchrony storage reduces backup time and complexity

(tick) Removes personally identifiable information that may still be stored, even after a page is deleted.

(error) Hard eviction job may have a small impact on users who have had a draft open for a long time. 


From 7.0, Confluence pro-actively cleans up Synchrony change logs that are no longer needed. If you want to remove Synchrony data more aggressively, you can run the Synchrony hard eviction scheduled job. 

Learn how to remove Synchrony data  


Identify large attachments in a space

Files such as images, documents, video, and audio are very easy to attach to a Confluence page.  Sometimes these files are attached, but then superseded or not used. While there is no way to automatically clean up these files, you can see a list of files, sorted by size, which can give you a place to start. 


What's the impact?

(tick)  Reduction in attachment storage reduces backup time and complexity.

(error) Files cannot be restored.


To view the attachments in a space:

  1. Go to the space and select Space tools > Content Tools from the bottom of the sidebar.
  2. Select the Space Attachments tab.
  3. Select the Size column heading to sort the list by file size.
  4. Navigate to any page containing a very large attachment to check if it's being used. 

There's no easy way to tell whether a file has been uploaded, but is not displayed or linked on a page. For this reason it might be a good idea to contact the creator or recent editor of the page to determine whether the very large files are required. 
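If you note down the largest files from the Space Attachments list, a short script can total them per page so you know which page owners to contact first. The following Python sketch assumes you have collected (page title, file name, size in bytes) tuples by hand or by some other means. The function names and sample data are hypothetical, and nothing here reads from Confluence directly.

```python
from collections import defaultdict


def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units (B, KiB, MiB, GiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"


def largest_pages(
    attachments: list[tuple[str, str, int]], top: int
) -> list[tuple[str, int]]:
    """Return the top pages by total attachment size, largest first."""
    totals: dict[str, int] = defaultdict(int)
    for page_title, _file_name, size_bytes in attachments:
        totals[page_title] += size_bytes
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Hypothetical sample data: (page title, file name, size in bytes).
files = [
    ("Launch plan", "demo.mp4", 734003200),
    ("Launch plan", "logo.png", 204800),
    ("Team photos", "offsite.zip", 157286400),
    ("Meeting notes", "agenda.docx", 51200),
]

for page, total in largest_pages(files, top=2):
    print(f"{page}: {human_size(total)}")
```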

Note that the Space Attachments list will only show files on pages that you have permission to see.

Learn more about managing files


Empty the trash

You'd be surprised to find how much space pages, blog posts, and files in the trash continue to take up in your database and on disk. It's good practice to empty the trash on a regular basis. 


What's the impact?

(tick)  Reduction in database size and attachment storage can reduce backup time and complexity.

(error) Pages, blog posts, and files cannot be restored.


You need Space Admin permission to do this. 

To empty the trash in a space:

  1. Go to the space and select Space tools > Content Tools from the bottom of the sidebar.
  2. Select the Trash tab.
  3. Select Purge all.

Alternatively, you can selectively purge pages and files that are no longer required. Once a page, blog post, or file is purged from the trash it cannot be restored. 

Learn more about deleting pages

Automatically purge the trash

Available in Confluence Data Center and Server 7.16 and later. 

If you don't need to keep deleted items in the trash for governance or compliance reasons, you can configure Confluence to automatically purge the trash based on the date that items were deleted. 

  1. Go to Administration > General Configuration > Retention rules.
  2. Add exemptions for any spaces that require different rules, for example you may never want to automatically purge the trash in your Finance space. 
  3. Set a global rule for trash to cover all other spaces.  

Approximately every ten minutes, Confluence will check for items in the trash that meet your criteria and remove them, in small batches. This means it is constantly cleaning up your site in the background, never allowing the trash to grow too large. 

Learn more about retention rules

Back up, then delete unwanted spaces

If you have older content that doesn't need to be accessible by everyone, you can consider exporting the space to XML, and storing it as a long-term backup. If you allow users to create personal spaces, this might be an appropriate action when someone leaves your organisation, and their personal space is no longer required. 


What's the impact?

(tick) More relevant search results because the deleted content no longer exists in the index.

(tick) Reduction in database size may provide some performance gain.

(tick) Reduction in database size and attachment storage reduces backup time and complexity. 

(error) Content is not easily accessible. 


People need Space Admin and Export Space permissions to do this. 

To back up and remove a space:

  1. Go to the space and select Space tools > Content Tools from the bottom of the sidebar.
  2. Select the Export tab and follow the prompts to export the space to XML. 
  3. Confirm the space was exported successfully by importing it into a test instance. This is important to avoid problems with corrupt backups. 
  4. Go to the space and choose Space tools > Overview from the bottom of the sidebar.
  5. Select the Delete Space tab and follow the prompts to permanently delete the space. 
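Before spending time on a test import, you could run a quick sanity check on the export file itself. The Python sketch below verifies that the file is a readable ZIP archive and contains a non-empty `entities.xml`. It assumes the XML export is a ZIP file with `entities.xml` at the top level, and the function name `check_space_export` is hypothetical. It does not replace the test import in step 3, which remains the only reliable confirmation.

```python
import io
import zipfile


def check_space_export(archive) -> list[str]:
    """Return a list of problems found in an export ZIP (empty if none).

    `archive` may be a file path or a binary file-like object.
    """
    problems = []
    try:
        with zipfile.ZipFile(archive) as zf:
            # testzip() returns the first member with a bad CRC, or None.
            bad_member = zf.testzip()
            if bad_member is not None:
                problems.append(f"corrupt member: {bad_member}")
            # Assumption: the export stores its data in a top-level entities.xml.
            if "entities.xml" not in zf.namelist():
                problems.append("missing entities.xml")
            elif zf.getinfo("entities.xml").file_size == 0:
                problems.append("entities.xml is empty")
    except zipfile.BadZipFile:
        problems.append("not a valid ZIP file")
    return problems


# Build a tiny stand-in archive in memory to demonstrate the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("entities.xml", "<placeholder/>")
buffer.seek(0)

print(check_space_export(buffer))
```

For a real export you would pass the path to the downloaded file instead of the in-memory buffer.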

If you need to allow end users some access, one strategy might be to export the space to PDF as well as XML. The PDF is easily searchable, and could be made available to your users through a file share, while the XML can be stored for long-term backup, and re-imported into Confluence should you need to reinstate the space. You could also consider exporting the space to HTML. However, because the HTML export doesn't include any styling, it's functional but not very attractive. 

Learn how to export a space

Learn how to delete a space


Last modified on May 15, 2024
