Confluence - How to Validate Links
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
Spaces may have multiple pages and pages may have multiple references to external URLs in their content.
Administrators who wish to ensure that these references do not lead their users to broken websites face a lot of manual work, since Confluence does not provide an automated way of validating such links.
Diagnosis
As there's no way to validate links automatically, administrators need to open each page and click each link, manually fixing any non-working entries.
Suggestion
There's a suggestion for this feature currently Gathering Interest. However, as it has not been implemented yet, administrators still need a workaround to validate links:
- CONFSERVER-55958
Workaround
To ease administrators' lives, we can automate some of the work and provide ways to gather the data that needs to be validated. Here's a how-to guide!
The first thing we need to do is find all external links listed inside all pages of a given space. Luckily, the Confluence database has a table called LINKS that stores this data, which makes finding the links a matter of running the right query. That being said, execute the query below in your database to get a report with the Space Name, Page and Link for each of the space's pages:
SELECT s.spacename AS Space, c.title AS Page, l.destspacekey AS SpaceOrProtocol, l.destpagetitle AS Destination
FROM LINKS l
JOIN CONTENT c ON c.contentid = l.contentid
JOIN SPACES s ON s.spaceid = c.spaceid
WHERE c.prevver IS NULL AND s.spacekey = 'YOUR_SPACE_KEY_HERE'
ORDER BY l.destspacekey
- With these results in hand, export them to Excel. Once exported, they should look like this:
- Go ahead and insert a new column between the protocol column and the destination column, so it looks like this:
- With that done, copy the protocol, colon and destination columns from the spreadsheet into Sublime Text, Notepad++ or any other text editor you prefer. This gives us a list of URLs with spaces in them. As URLs are not supposed to contain spaces, do a Find and Replace to trim the spaces until you have actual URLs. Feel free to use any other trick you have at hand, but, ideally, the output should look like this:
http://www.thisisbroken.com.br
http://www.thisisalsobroken.com.br
https://confluence.atlassian.com/x/ASEC
https://confluence.atlassian.com/x/ASEC
https://confluence.atlassian.com/x/ASEC
https://confluence.atlassian.com/x/ASEC
https://confluence.atlassian.com/x/ASEC
https://confluence.atlassian.com/x/ASEC
- Now, save this file somewhere and copy its path and name, since we will need them for the next step.
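The spreadsheet steps above can also be scripted. As a minimal sketch, assuming the SQL report was exported as a comma-separated file named links.csv (a hypothetical name, with the columns Space, Page, SpaceOrProtocol and Destination in that order), awk can join the protocol and destination columns straight into links.txt:

```shell
# Build URLs from a CSV export of the SQL report (links.csv is a
# hypothetical file name; columns: Space,Page,SpaceOrProtocol,Destination).
# External links carry a protocol (http/https) in the third column and a
# destination beginning with // in the fourth, so URL = protocol + ":" + destination.
awk -F',' '$3 == "http" || $3 == "https" { print $3 ":" $4 }' links.csv > links.txt
```

Rows whose third column holds a space key (internal page links) are skipped, since those are better verified through the cross-referencing step at the end of this guide.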
The second part of the exercise, now that we have all the URLs we need, is to test them. Here's a bash script to help you do that. Please note that cURL must be installed on the computer or server hosting the terminal used for testing, and remember to replace the username and password values with a valid account and its corresponding password:
for WEBLINK in $(cat MY/PATH/links.txt); do
  CURLOUTPUT=$(curl -u username:password -I -L "$WEBLINK" 2>/dev/null | grep ^HTTP)
  echo "$WEBLINK :: $CURLOUTPUT"
done
- Before you run the script above, replace MY/PATH/links.txt with the path where your text file with links is located on your system.
- Once that's done, execute it.
The output in the terminal should look like this:
http://localhost:8090/display/HS/Test+for+PS-263967 :: HTTP/1.1 404 Not Found
http://localhost:8090/display/HS/Test+for+PS-26396 :: HTTP/1.1 200 OK
http://localhost:8090/display/HS/Test+for+PS-263965 :: HTTP/1.1 404 Not Found
http://www.thisisbroken.com.br ::
- With this in hand, we can identify which URLs are not working. In the example above, URLs that returned anything other than 200 OK are broken.
- By cross-checking these results with the URL report from the SQL query at the beginning of this tutorial, we know which page contains each broken URL and can go ahead and fix only the broken ones, without going through the pages one by one!
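To avoid scanning the script's output by eye, the broken entries can be filtered with grep. This is a minimal sketch assuming the output of the loop was redirected to a file named results.txt (a hypothetical name):

```shell
# Keep only lines whose response was not 200 OK; this covers 404s as well
# as URLs that returned nothing at all. results.txt is a hypothetical file
# holding the "URL :: HTTP status" lines produced by the script above.
grep -v 'HTTP/[0-9.]* 200' results.txt
```

Every line this prints is a candidate for fixing; the 200 OK entries are dropped.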