How to perform a Confluence site search for keywords and links through the database
There are limitations for the default Lucene search algorithm in Confluence.
These are a few examples of the limitation :
- URL links are not searchable: - CONFSERVER-27503Getting issue details... STATUS
- Parts of a string are not searchable:
CONFSERVER-33996Getting issue details...
CONFSERVER-36770Getting issue details...
There are times when you really need to search for the pages that contain the keywords.
For example, a user inserted an external URL into numerous pages, but as time goes by, those URLs might point to a dead link as there might be some changes in the subdomain/URL path.
To search for these contents, run the SQL query below on your Confluence database.
<INSERT_KEYWORD_HERE> with your keyword.
% symbol represents a wildcard search.
The SQL results will return the content ID referred to in the Confluence UI as the page ID, content type, page/blog title, and the Space Key.
The query below uses wildcard searching in order to locate page content that matches a specific pattern.
Due to the nature of wildcard searching and the target tables' possible size, this query may take a long time to execute.
If possible, it is recommended to carry these out in a cloned production database rather than on the production environment itself to avoid any impact on the production environment.
SELECT c.CONTENTID, c.CONTENTTYPE, c.TITLE, s.SPACEKEY FROM CONTENT c JOIN BODYCONTENT bc ON c.CONTENTID = bc.CONTENTID JOIN SPACES s ON c.SPACEID = s.SPACEID WHERE c.PREVVER IS NULL AND c.CONTENTTYPE IN ('PAGE', 'BLOGPOST') AND bc.BODY LIKE '%<INSERT_KEYWORD_HERE>%';
SELECT c.CONTENTID, c.CONTENTTYPE, c.TITLE, s.SPACEKEY FROM dbo.CONTENT c JOIN dbo.BODYCONTENT bc ON c.CONTENTID = bc.CONTENTID JOIN dbo.SPACES s ON c.SPACEID = s.SPACEID WHERE c.PREVVER IS NULL AND c.CONTENTTYPE IN ('PAGE', 'BLOGPOST') AND bc.BODY LIKE '%<INSERT_KEYWORD_HERE>%';