How to find the largest Confluence pages by storage size in the database
As part of a database audit or Confluence clean-up effort, an administrator may find it useful to locate the largest pages (in terms of disk space footprint) in Confluence. This can be achieved by querying the database for the byte lengths of each page's storage format.
The following SQL queries, especially the second one, can take quite a long time. In a MySQL test instance that contained about 300k rows in the BODYCONTENT table, the second query took almost 3 minutes to complete. If you have a large Confluence installation, it is recommended to run these against a clone of the production database, rather than against the production database itself, to avoid any impact to the production environment.
Because of the way the Confluence database is set up, it was necessary use two queries: the first query only looks at sizes of current pages, and the second one is the sum of all historic pages (not including the current page). Each of these will return the top 25 pages taking up the most space in the database (in byte lengths).
Retrieve the top 25 largest current pages (excluding historical versions):
SELECT s.spacekey, c.title, LENGTH(bc.body) FROM BODYCONTENT bc JOIN CONTENT c ON bc.contentid = c.contentid JOIN SPACES s ON c.spaceid = s.spaceid WHERE c.prevver IS NULL AND c.contenttype IN ('BLOGPOST','PAGE') ORDER BY LENGTH(bc.body) DESC LIMIT 25;
Retrieve the top 25 pages with largest sum of historical version sizes:
SELECT s.spacekey, c.title, subq.sum_size_in_bytes FROM CONTENT c JOIN SPACES s ON c.spaceid = s.spaceid JOIN ( SELECT c2.prevver, SUM(LENGTH(bc.body)) AS sum_size_in_bytes FROM BODYCONTENT bc JOIN CONTENT c2 ON bc.contentid = c2.contentid WHERE c2.prevver IS NOT NULL AND c2.contenttype IN ('BLOGPOST','PAGE') GROUP BY c2.prevver ORDER BY SUM(LENGTH(bc.body)) DESC LIMIT 25 ) AS subq ON c.contentid = subq.prevver ORDER BY subq.sum_size_in_bytes DESC;
- This does not include sizes of attachments. Typically attachments are stored on the filesystem in your <confluence_home> directory, and only the metadata information (which is negligible) is contained in the database.
- The above were tested in MySQL. You may need to modify the queries to suit your specific DBMS.