Confluence - How to Validate Links

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Spaces may have multiple pages and pages may have multiple references to external URLs in their content.

Confluence does not provide an automated way to validate these links. Administrators who want to make sure the references do not lead users to broken websites have to open each page, click each link, and fix the non-working entries by hand.

A suggestion to add this feature, CONFSERVER-55958, is currently Gathering Interest. As it has not been implemented yet, administrators still need a workaround to validate links.

To make administrators' lives easier, we can automate part of the work and provide ways to gather the data that needs to be validated. Here's a how-to guide:

  1. First, find all external links in the pages of a given space. The Confluence database stores this data in the LINKS table, so finding the links is a matter of running the right query. Run the query below against your database to get a report with the space name, page title, and link (protocol and destination) for each page in the space:

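    -- List every link found in the current version of each page in the space.
    -- For external links, destspacekey holds the protocol (for example http)
    -- and destpagetitle holds the rest of the URL.
    SELECT s.spacename AS Space, c.title AS Page, l.destspacekey AS SpaceOrProtocol, l.destpagetitle AS Destination
    FROM LINKS l
    JOIN CONTENT c ON c.contentid = l.contentid
    JOIN SPACES s ON s.spaceid = c.spaceid
    WHERE c.prevver IS NULL AND s.spacekey = 'YOUR_SPACE_KEY_HERE'
    ORDER BY l.destspacekey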
  2. Export the results to Excel. Each row holds the space, page, protocol, and destination returned by the query.
  3. Insert a new column between the protocol column and the destination column, and fill it with a colon (:).
  4. Copy the protocol, colon, and destination columns from the spreadsheet into a text editor such as Sublime Text or Notepad++ (or any other tool you prefer). This gives you a list of URLs that still contain spaces.
  5. URLs are not supposed to contain spaces, so use Find and Replace to remove them until you have actual URLs. Any other method you have at hand works too; ideally, the output should look like this:

    http://www.thisisbroken.com.br
    http://www.thisisalsobroken.com.br
    https://confluence.atlassian.com/x/ASEC
    https://confluence.atlassian.com/x/ASEC
    https://confluence.atlassian.com/x/ASEC
    https://confluence.atlassian.com/x/ASEC
    https://confluence.atlassian.com/x/ASEC
    https://confluence.atlassian.com/x/ASEC
  6. Save this file and note its full path and name, since you will need them for the script below.
  7. Now that we have all the URLs, the second part of the exercise is to test them. The bash script below helps with that. Note that cURL must be installed on the computer or server where you run the script:

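    # Read one URL per line; quoting keeps unusual characters intact.
    while IFS= read -r WEBLINK || [ -n "$WEBLINK" ]; do
      # -I requests headers only, -L follows redirects; tail keeps the final status line.
      # Note: -u sends these credentials to every URL; remove it if the links do not need a login.
      CURLOUTPUT=$(curl -u admin:admin -I -L "$WEBLINK" 2>/dev/null </dev/null | grep '^HTTP' | tail -n 1 | tr -d '\r')
      echo "$WEBLINK :: $CURLOUTPUT"
    done < MY/PATH/links.txt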
  8. Before you run the script above, replace MY/PATH/links.txt with the path to the text file that contains your links.
  9. Once that's done, run the script.
  10. The output in the terminal should look like this:

    http://localhost:8090/display/HS/Test+for+PS-263967 :: HTTP/1.1 404 Not Found
    http://localhost:8090/display/HS/Test+for+PS-26396 :: HTTP/1.1 200 OK
    http://localhost:8090/display/HS/Test+for+PS-263965 :: HTTP/1.1 404 Not Found
    http://www.thisisbroken.com.br :: 
  11. With this output, we can identify which URLs are not working. In the example above, any URL that returned something other than 200 OK, or no status line at all, is broken.
  12. Cross-check these results against the URL report produced by the SQL query at the beginning of this tutorial to find which pages contain the broken URLs. You can then fix only the broken links, without going through every page.
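
The spreadsheet and cross-checking steps above can also be scripted. The following is a minimal sketch, not a supported tool. It assumes the query results were exported as tab-separated text with the four columns in query order (Space, Page, SpaceOrProtocol, Destination). The function names (external_links, status_of, is_broken, report_broken) and the file name export.tsv are hypothetical. Unlike the script in step 7, it sends no credentials, so links that require a login will be reported as broken.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes a tab-separated export with the columns in query order:
# Space, Page, SpaceOrProtocol, Destination (this file layout is an assumption).

# Print "Space<TAB>Page<TAB>URL" for every external (http/https) link row.
external_links() {
  awk -F'\t' -v OFS='\t' '
    {
      proto = $3; dest = $4
      gsub(/[ \r]/, "", proto)   # drop stray spaces and carriage returns
      gsub(/[ \r]/, "", dest)
      # rows with any other value are internal links or the header row
      if (proto == "http" || proto == "https") print $1, $2, proto ":" dest
    }'
}

# Print the final HTTP status code for a URL ("000" if there was no response).
status_of() {
  curl -s -o /dev/null -I -L --max-time 15 -w '%{http_code}' "$1" </dev/null
}

# Succeed (exit 0) when the status counts as broken under the rule in step 11:
# anything other than 200.
is_broken() {
  [ "$1" != "200" ]
}

# Read the export on stdin and list only the broken links with their pages.
report_broken() {
  external_links | while IFS=$'\t' read -r space page url; do
    code=$(status_of "$url")
    if is_broken "$code"; then
      printf '%s\t%s\t%s\t%s\n' "$space" "$page" "$url" "$code"
    fi
  done
}
```

After sourcing the functions, a run might look like `report_broken < export.tsv`, which prints one line per broken link with its space, page, URL, and status code. Some servers reject header-only requests, so a non-200 result is worth confirming in a browser before editing the page.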


Last modified on Jun 20, 2018
