Setting Up Confluence to Index External Sites
Confluence cannot easily index external sites because of the way its Lucene-based search works. There are two alternatives:
Embedding external pages into Confluence
If you only have a small number of external sites to index, you may prefer to enable the HTML-include macro and use it to embed the external content inside normal Confluence pages.
Note that the embedded content is only displayed on the page; the actual content of the external site won't be indexed.
Replacing the Confluence search
Use your own development resources to replace Confluence's internal search with a crawler that indexes both Confluence and external sites. This advanced option is easier than modifying the internal search engine. It requires removing Confluence's internal search from all pages and replacing the internal results page with your own crawler front-end:
- Set up a replacement federated search engine that indexes the Confluence site as well as your other sites, and serve the results from it. You would need to host a web crawler, such as one of the available open-source crawlers. Note that you can also perform a search in Confluence via the Confluence API.
- Replace references to the internal search by modifying the site layout so that it links to your search front-end.
- Host another site containing the search front-end. You may wish to insert it into a suitable context path in your application server so that it appears to be served from a path under Confluence. Tomcat sets Confluence's paths from the confluence\WEB-INF\web.xml file under the Confluence install directory.
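As a rough illustration of how a search front-end might combine Confluence results with results from your own crawler index, here is a minimal Python sketch. It assumes that your Confluence version exposes the REST content search endpoint at /rest/api/content/search with a CQL query, that anonymous access is allowed, and that each result carries a title and a _links.webui path; check these against your own instance. The function names (build_confluence_search_url, search_confluence, merge_results) are hypothetical and not part of any Confluence API.

```python
import json
import urllib.parse
import urllib.request


def build_confluence_search_url(base_url, query, limit=10):
    """Build a CQL full-text search URL (endpoint path is an assumption)."""
    # Escape backslashes and quotes so the query stays one CQL string literal.
    escaped = query.replace("\\", "\\\\").replace('"', '\\"')
    params = urllib.parse.urlencode({"cql": f'text ~ "{escaped}"', "limit": limit})
    return f"{base_url.rstrip('/')}/rest/api/content/search?{params}"


def search_confluence(base_url, query, limit=10):
    """Return (title, url) pairs from Confluence; assumes anonymous access."""
    url = build_confluence_search_url(base_url, query, limit)
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    # Assumed response shape: {"results": [{"title": ..., "_links": {"webui": ...}}]}
    return [
        (item["title"], base_url.rstrip("/") + item["_links"]["webui"])
        for item in data.get("results", [])
    ]


def merge_results(*result_lists):
    """Interleave result lists round-robin, dropping duplicate URLs."""
    seen, merged = set(), []
    longest = max((len(r) for r in result_lists), default=0)
    for i in range(longest):
        for results in result_lists:
            if i < len(results) and results[i][1] not in seen:
                seen.add(results[i][1])
                merged.append(results[i])
    return merged
```

A front-end could then call merge_results(search_confluence(base, q), external_hits), where external_hits is a list of (title, url) pairs from whatever crawler index you host. Round-robin interleaving is only a placeholder: relevance scores from different engines are not directly comparable, so a real deployment would need its own ranking strategy.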