How to find the largest Confluence pages by storage size in the database

Still need help?

The Atlassian Community is here for you.

Ask the community

Purpose

As part of a database audit or Confluence clean-up effort, an administrator may find it useful to locate the largest pages (in terms of disk space footprint) in Confluence. This can be achieved by querying the database for the byte lengths of each page's storage format.

Solution

The following SQL queries, especially the second one, can take quite a long time. In a MySQL test instance that contained about 300k rows in the BODYCONTENT table, the second query took almost 3 minutes to complete. If you have a large Confluence installation, it is recommended to run these against a clone of the production database, rather than against the production database itself, to avoid any impact to the production environment.

Because of the way the Confluence database is set up, it was necessary use two queries: the first query only looks at sizes of current pages, and the second one is the sum of all historic pages (not including the current page). Each of these will return the top 25 pages taking up the most space in the database (in byte lengths).

Retrieve the top 25 largest current pages (excluding historical versions):

SELECT s.spacekey, c.title, LENGTH(bc.body)
FROM BODYCONTENT bc
JOIN CONTENT c ON bc.contentid = c.contentid
JOIN SPACES s ON c.spaceid = s.spaceid
WHERE c.prevver IS NULL AND c.contenttype IN ('BLOGPOST','PAGE')
ORDER BY LENGTH(bc.body) DESC LIMIT 25; 

Retrieve the top 25 pages with largest sum of historical version sizes:

SELECT s.spacekey, c.title, subq.sum_size_in_bytes
FROM CONTENT c
JOIN SPACES s ON c.spaceid = s.spaceid
JOIN (
	SELECT c2.prevver, SUM(LENGTH(bc.body)) AS sum_size_in_bytes
	FROM BODYCONTENT bc
	JOIN CONTENT c2 ON bc.contentid = c2.contentid
	WHERE c2.prevver IS NOT NULL AND c2.contenttype IN ('BLOGPOST','PAGE')
	GROUP BY c2.prevver
	ORDER BY SUM(LENGTH(bc.body)) DESC LIMIT 25
) AS subq ON c.contentid = subq.prevver
ORDER BY subq.sum_size_in_bytes DESC;

Other notes:

  • This does not include sizes of attachments. Typically attachments are stored on the filesystem in your <confluence_home> directory, and only the metadata information (which is negligible) is contained in the database.
  • The above were tested in MySQL. You may need to modify the queries to suit your specific DBMS.


Last modified on Sep 1, 2021

Was this helpful?

Yes
No
Provide feedback about this article
Powered by Confluence and Scroll Viewport.