How to Locate All Orphaned Pages in Confluence
Purpose
An orphaned page is a page without any incoming links. This means that, unless you know that the page exists, you are not likely to come across it in the space during the natural course of navigation.
Confluence 6.15 and earlier has a built-in method to identify orphaned pages on a space-by-space basis (see: Orphaned Pages). Given an instance of Confluence with a large number of spaces, however, a site administrator may want a way to locate every orphaned page across all spaces at once as part of a housekeeping effort or cleanup. The SQL query provided below can be run against the Confluence database to return this information.
Solution
- The following query was tested against Confluence 5.x, using Postgres, MySQL, Oracle and MSSQL database servers.
SELECT s.SPACENAME, s.SPACEKEY, s.SPACEID, c.TITLE, c.CONTENTID
FROM CONTENT c,
SPACES s
WHERE c.CONTENTTYPE = 'PAGE'
AND ( ( c.PREVVER IS NULL )
AND ( c.CONTENT_STATUS = 'current' )
AND ( Lower(s.SPACEKEY) IN (SELECT Lower(SPACEKEY) FROM SPACES)
AND c.SPACEID = s.SPACEID )
AND ( ( s.HOMEPAGE != c.CONTENTID
AND c.SPACEID = s.SPACEID )
OR ( s.HOMEPAGE IS NULL
AND c.SPACEID = s.SPACEID ) )
AND ( c.PARENTID IS NULL )
AND ( NOT EXISTS(SELECT l.LINKID
FROM LINKS l,
SPACES s2,
CONTENT c2
WHERE (
l.DESTSPACEKEY = s2.SPACEKEY
AND
c.SPACEID = s2.SPACEID )
AND
( Lower(l.DESTPAGETITLE) =
Lower(c.TITLE) )
AND ( Lower(c2.TITLE) !=
Lower(c.TITLE)
AND l.CONTENTID =
c2.CONTENTID )) ) );
Always back up your data before performing any modifications to the database. If possible, test any alter, insert, update, or delete SQL commands on a staging server first.