Confluence - How to Validate Links
This Knowledge Base article was written specifically for the Atlassian Server platform. Due to restricted functions in Atlassian Cloud apps, the contents of this article cannot be applied to Atlassian Cloud applications.
Spaces may have multiple pages and pages may have multiple references to external URLs in their content.
Administrators who want to ensure that these references do not lead their users to broken websites face a lot of manual work, since Confluence does not provide an automated way of validating such links.
As there is no way to validate links automatically, administrators need to open each page and click each link, manually fixing non-working entries.
There is a suggestion currently Gathering Interest in this regard. However, as it has not been implemented yet, administrators still need a workaround to validate links.
To ease administrators' lives, we can automate some of the work and provide ways to gather the data that needs to be validated. Here's a how-to guide!
The first thing we need to do is find all external links listed in all pages of a given space. Luckily, the Confluence database has a table called LINKS that stores exactly this data, which makes finding the links a matter of running the right query. That said, execute the query below against your database to get a report with the Space Name, Page, and Link for each of the space's pages:
SELECT s.spacename AS Space, c.title AS Page, l.destspacekey AS SpaceOrProtocol, l.destpagetitle AS Destination
FROM LINKS l
JOIN CONTENT c ON c.contentid = l.contentid
JOIN SPACES s ON s.spaceid = c.spaceid
WHERE c.prevver IS NULL AND s.spacekey = 'YOUR_SPACE_KEY_HERE'
ORDER BY l.destspacekey
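If your Confluence database happens to be PostgreSQL, the same query can be written straight to a CSV file from the command line instead of copying results by hand; a sketch, assuming hypothetical connection details (confluenceuser, confluencedb):

```
psql -h localhost -U confluenceuser -d confluencedb -c "
COPY (
  SELECT s.spacename AS Space, c.title AS Page,
         l.destspacekey AS SpaceOrProtocol, l.destpagetitle AS Destination
  FROM LINKS l
  JOIN CONTENT c ON c.contentid = l.contentid
  JOIN SPACES s ON s.spaceid = c.spaceid
  WHERE c.prevver IS NULL AND s.spacekey = 'YOUR_SPACE_KEY_HERE'
) TO STDOUT WITH CSV HEADER" > links_report.csv
```

COPY ... TO STDOUT keeps the export on the client side, so no file permissions are needed on the database server.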
- With these results in hand, export them to Excel. Once exported, they should look like this:
- Go ahead and insert a new column between the protocol column and the destination column so it looks like this:
- With that done, copy the destination columns from that spreadsheet into Sublime Text, Notepad++, or any other text editor you prefer. We will then have a list of URLs with spaces:
As URLs are not supposed to contain spaces, you can do a Find and Replace to trim the spaces until you have actual URLs. Feel free to use any other tricks you have at hand, but, ideally, the output should look like this:
http://www.thisisbroken.com.br
http://www.thisisalsobroken.com.br
https://confluence.atlassian.com/x/ASEC
https://confluence.atlassian.com/x/ASEC
https://confluence.atlassian.com/x/ASEC
https://confluence.atlassian.com/x/ASEC
https://confluence.atlassian.com/x/ASEC
https://confluence.atlassian.com/x/ASEC
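If you prefer the command line over the editor's Find and Replace, the space-trimming can also be done with sed; a minimal sketch, where raw_links.txt is a hypothetical file holding the copied columns:

```shell
# Create a sample file with the kind of space-riddled URLs the copy produces.
printf 'http ://www.thisisbroken.com.br\nhttps ://confluence.atlassian.com /x/ASEC\n' > raw_links.txt

# Remove every space so the protocol and destination pieces join into real URLs.
sed 's/ //g' raw_links.txt > links.txt
cat links.txt
```

The same substitution works in any editor that supports regular expressions.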
- Now, save this file somewhere and note the path and name of the file, since we will need it in the next step.
Now that we have all the URLs we need, the second part of the exercise is to test them. Here's a bash script to help you do that. Please note that cURL must be installed on the computer or server hosting the terminal that is going to be used for testing:
for WEBLINK in $(cat MY/PATH/links.txt); do
  CURLOUTPUT=$(curl -u admin:admin -I -L "$WEBLINK" 2>/dev/null | grep ^HTTP)
  echo "$WEBLINK :: $CURLOUTPUT"
done
- Before you run the script above, please replace MY/PATH/links.txt with the path where your text file with links is located on your system.
- Once that's done, execute it.
The output in terminal should look like this:
http://localhost:8090/display/HS/Test+for+PS-263967 :: HTTP/1.1 404 Not Found
http://localhost:8090/display/HS/Test+for+PS-26396 :: HTTP/1.1 200 OK
http://localhost:8090/display/HS/Test+for+PS-263965 :: HTTP/1.1 404 Not Found
http://www.thisisbroken.com.br ::
- With this in hand, we can identify which URLs are not working. In the example above, URLs that returned anything other than 200 OK are broken; an empty result, like the last line, means the host could not be reached at all.
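Filtering the broken entries out of the script's output can itself be automated with grep; a sketch, where results.txt is a hypothetical file holding output like the above:

```shell
# Create a sample results file in the "URL :: status" format the script emits.
printf 'http://localhost:8090/display/HS/Test+for+PS-263967 :: HTTP/1.1 404 Not Found\nhttp://localhost:8090/display/HS/Test+for+PS-26396 :: HTTP/1.1 200 OK\nhttp://www.thisisbroken.com.br ::\n' > results.txt

# Drop every line containing a " 200 " status; what remains is broken
# (404s, other errors, and URLs that produced no HTTP status at all).
grep -v ' 200 ' results.txt > broken.txt
cat broken.txt
```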
- By cross-checking these results with the URL report taken through the SQL query at the beginning of this tutorial, we know which page each broken URL lives on and can go ahead and fix the broken ones only, without going through every page!
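The cross-check can also be scripted with grep; a sketch, assuming the query report was exported as report.csv and the broken destinations saved to broken.txt (both hypothetical names and sample contents):

```shell
# Sample report rows in the Space,Page,Protocol,Destination shape of the query.
printf 'Demo Space,Test Page,http,//www.thisisbroken.com.br\nDemo Space,Good Page,https,//confluence.atlassian.com /x/ASEC\n' > report.csv

# One broken destination per line, as collected from the curl results.
printf '//www.thisisbroken.com.br\n' > broken.txt

# -F treats each pattern as a fixed string; -f reads the patterns from a file.
# The matching report rows tell us which page holds each broken link.
grep -F -f broken.txt report.csv
```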