Example Robots.txt Files
Choose the robots.txt file most appropriate to your situation:
1. To prevent indexing of the entire server, use:
User-agent: *
Disallow: /
2. To prevent indexing of only Confluence and JIRA, while still allowing other applications deployed on your application server to be indexed:
# This needs to be at /robots.txt
# Tomcat: put it in webapps/ROOT
# Apache and Tomcat integrated: put it in the Apache document root
User-agent: *
Disallow: /jira/
Disallow: /confluence/pages
Disallow: /confluence/spaces
Disallow: /confluence/dashboard.action
Disallow: /confluence/administrators.action
Disallow: /confluence/searchsite.action
Disallow: /confluence/display/[space to disallow]
# ignore user pages
Disallow: /confluence/display/~
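Before deploying, you can test which URLs a file like the ones above actually blocks. The sketch below is a minimal, hedged example using Python's standard `urllib.robotparser`; the host `wiki.example.com`, the space key `SECRET` (standing in for the `[space to disallow]` placeholder), the sample URLs and the agent name `gsa-crawler` are all assumptions for illustration. Python's parser is only one implementation, but it shows two points that matter here: rules use prefix matching, and `Disallow` lines that do not follow a `User-agent` line are ignored, as is any rule after a blank line that ends its group.

```python
from urllib.robotparser import RobotFileParser

# Example 2 with a hypothetical space key ("SECRET") in place of the
# "[space to disallow]" placeholder. Rules must follow a User-agent line,
# and a blank line ends the group, so this block contains no blank lines.
EXAMPLE_2 = """\
User-agent: *
Disallow: /jira/
Disallow: /confluence/pages
Disallow: /confluence/spaces
Disallow: /confluence/dashboard.action
Disallow: /confluence/administrators.action
Disallow: /confluence/searchsite.action
Disallow: /confluence/display/SECRET
# ignore user pages
Disallow: /confluence/display/~
"""

# Example 1: block the whole server.
EXAMPLE_1 = "User-agent: *\nDisallow: /\n"

SITE = "http://wiki.example.com"  # hypothetical host
URLS = [
    SITE + "/jira/browse/TEST-1",
    SITE + "/confluence/display/~jsmith",
    SITE + "/confluence/pages/viewpage.action?pageId=766",
    SITE + "/confluence/display/DOC/Home",
]


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def check(parser: RobotFileParser, urls: list[str],
          agent: str = "gsa-crawler") -> dict[str, bool]:
    """Map each URL to True (may be crawled) or False (blocked) for agent."""
    return {url: parser.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    for url, allowed in check(build_parser(EXAMPLE_2), URLS).items():
        print("allowed" if allowed else "blocked", url)
    # Example 1 blocks every URL.
    print(not any(check(build_parser(EXAMPLE_1), URLS).values()))
    # Without the User-agent line, this parser silently drops every rule.
    no_agent = EXAMPLE_2.replace("User-agent: *\n", "")
    print(all(check(build_parser(no_agent), URLS).values()))
```

Run as a script, it reports the JIRA, user-page and `/confluence/pages` URLs as blocked and `/confluence/display/DOC/Home` as allowed. Because matching is by prefix, a rule such as `/confluence/display/SECRET` would also block a space whose key merely starts with `SECRET`.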
Instructions For Standalone Editions On Their Own Domain
- Open your Confluence installation directory and save the appropriate file as \confluence\robots.txt. For example: c:\confluence-2.5.4-std\confluence\robots.txt
- Visit your Confluence domain and confirm that robots.txt is now accessible from the root of the site. For example, if your domain is http://confluence.atlassian.com, the file should be accessible at http://confluence.atlassian.com/robots.txt
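The confirmation step can also be scripted. This is a minimal sketch, assuming Python 3 and that a correctly deployed file is served with a `text/plain` content type; the function names are hypothetical. It also illustrates why the file must sit at the server root even when Confluence runs under a context path such as `/confluence`: crawlers derive the robots.txt location from the scheme, host and port alone.

```python
import sys
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Crawlers request robots.txt only at the root of scheme://host[:port]."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def robots_is_served(page_url: str, timeout: float = 10.0) -> bool:
    """True if the root robots.txt answers 200 with a plain-text body."""
    try:
        with urllib.request.urlopen(robots_url(page_url), timeout=timeout) as resp:
            content_type = resp.headers.get("Content-Type", "")
            return resp.status == 200 and content_type.startswith("text/plain")
    except OSError:  # covers URLError, HTTPError (e.g. 404) and timeouts
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_robots.py http://confluence.atlassian.com/display/DOC/Home
    print(robots_url(sys.argv[1]), robots_is_served(sys.argv[1]))
```

For a page such as `http://wiki.example.com:8080/confluence/display/DOC` (hypothetical), `robots_url` yields `http://wiki.example.com:8080/robots.txt`, so a file saved only under the Confluence context path would never be consulted.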
General Instructions
Save the appropriate file as robots.txt in the application server root directory. For Apache Tomcat, place it under ../webapps/ROOT.

Comments (10)
Jul 01, 2007
Peter R. says:
Here's one we're using to try to get our Google Enterprise Search Appliance to stop using up the majority of our Confluence server resources. Feedback welcome:
Jun 25, 2007
Charles Miller says:
I'm surprised all of this is necessary. Most of the things you list are already protected by robots META headers in the application itself.
Is your search appliance configured to respect rel="nofollow" on links?
Jun 25, 2007
Peter R. says:
In CSP-8619 I'm being told that this is a good robots.txt to help stop our GSA from impacting our performance. At no point has anyone said that there are robots META headers in the application.
Yes, our search appliance respects nofollow on links. Even so, it's churned through almost 80,000 URLs in our instance and we've only got ~7500 current version pages. Since Confluence generates pages dynamically with no cache, GSA will circle back and hit the same page over and over. Before we implemented robots.txt it was chewing up 85% of the bandwidth of the site! Over 250 requests a second at one point, making it our prime suspect in our performance and application hang issues.
Matt Ryall even suggested opening a feature request, which I did at CONF-8749.
If you can see something we're all missing I'll be forever in your debt! We're losing users back to MediaWiki and other "unapproved" applications because of this issue.
Peter
Jun 28, 2007
Peter R. says:
I was hoping to hear back on this...
Jun 28, 2007
Guy Fraser says:
It might be worth posting a message on the conf-user and conf-dev mailing lists telling people about this page to get more people looking at it...
Jul 01, 2007
Peter R. says:
I was actually in the middle of doing so when Charles commented on the page. Given his credentials, I wanted to see what he had to say before linking to it from the forums / mailing list. I'll give him another day or so here and then link.
Jul 01, 2007
Charles Miller says:
Well, looking at this page, for example:
<meta name="robots" content="noindex,nofollow">
<meta name="robots" content="noarchive">
The printable and pdf export links a...
It would be really useful if someone with the Google search appliance could:
Jul 01, 2007
Peter R. says:
I can verify that our PROD GSA has ~102,000 URLs from the Confluence system with "decorator=printable" in the URL. When I visit one of the URLs and look at the page source I do NOT see the noarchive attribute:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- main.vmd
themebuilder : 'com.adaptavist.confluence.sitebuilder.SiteBuilderVelocityHelper@27478ea9'/'$themebuilder.initialise'
spaceKey : 'QuickStart'
pageId : '766'
currentURL : '/pages/viewpage.action?spaceKey=QuickStart&title=Plug-in+Requests&decorator=printable'
contextPath : ''
spaceName : 'Wiki Quick Start'
decorator : 'printable'
printable : 'true'
mailId : '$mailId'
mode : 'view'
context : 'page'
-->
<html>
<head>
<title>Page Name Removed for Privacy</title>
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
<META HTTP-EQUIV="Expires" CONTENT="-1">
<script type="text/javascript" language="JavaScript">var contextPath = '';</script>
<script language="javascript">
var contextPath = '';
var i18n = [];
</script>
<link rel="shortcut icon" href="/images/icons/favicon.ico">
<link rel="icon" type="image/png" href="/images/icons/favicon.png">
<script type="text/javascript" language="JavaScript" src="/decorators/effects.js"></script>
<script type="text/javascript" language="JavaScript" src="/download/resources/com.adaptavist.confluence.themes.sitebuilder:sitebuilder/icons/visibility.js"></script>
<style type="text/css">
.breadcrumb2 {display:none;visibility:false;}
</style>
<link rel="stylesheet" type="text/css" href="/plugins/sitebuilder/sitebuilder-resources.action?spaceKey=QuickStart&resource=panelcss&hash=1121034283"/>
<link rel="stylesheet" type="text/css" href="/plugins/sitebuilder/sitebuilder-resources.action?spaceKey=QuickStart&resource=css&hash=1121034283"/>
<!--[if gte IE 5.5000]>
<link rel="stylesheet" type="text/css" href="/plugins/sitebuilder/sitebuilder-resources.action?spaceKey=QuickStart&resource=iecss&hash=1121034283"/>
<![endif]-->
<!--[if gte IE 5.5000]>
<script type="text/javascript" language="JavaScript" language="JavaScript" src="/download/resources/com.adaptavist.confluence.themes.sitebuilder:sitebuilder/icons/PieNG.js"></script>
<![endif]-->
<script type="text/javascript" src="/download/resources/com.adaptavist.confluence.themes.sitebuilder:sitebuilder/general/browser_detect.js"></script>
</head>
Given that, that's why the GSA is indexing it...
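Peter's manual check (open one of the indexed URLs, read the source, look for a robots meta tag) can be automated across many URLs. Below is a hedged sketch using only Python's standard `html.parser`; the class and function names are hypothetical, fetching the pages is left out, and it only looks at `<meta name="robots">` tags. The two head fragments are the ones quoted in this thread.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from every <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, not their values.
        if tag != "meta":
            return
        values = dict(attrs)
        if (values.get("name") or "").lower() != "robots":
            return
        for directive in (values.get("content") or "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set[str]:
    """Return the union of all robots meta directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Head fragments quoted in this thread: the Info page and a printable page.
INFO_HEAD = ('<meta name="robots" content="noindex,nofollow">'
             '<meta name="robots" content="noarchive">')
PRINTABLE_HEAD = ('<META HTTP-EQUIV="Pragma" CONTENT="no-cache">'
                  '<META HTTP-EQUIV="Expires" CONTENT="-1">')

if __name__ == "__main__":
    print(sorted(robots_directives(INFO_HEAD)))       # ['noarchive', 'nofollow', 'noindex']
    print(sorted(robots_directives(PRINTABLE_HEAD)))  # []
```

An empty result for a `decorator=printable` page means nothing in the page itself asks a crawler not to index it, which matches what Peter observed.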
Jul 01, 2007
Peter R. says:
I took a look at an Info page for one of our pages and verified that the PDF version does indeed have the "nofollow" attribute.
I also see in the header:
<meta name="robots" content="noindex,nofollow">
<meta name="robots" content="noarchive">
So that's telling me that the info page itself isn't being indexed, correct? Shouldn't it also mean that no links on the actual page should be followed? And also make the nofollow attribute on the URLs for PDF, Copy and Export to Word redundant?
Due to the distributed/outsourced nature of our support, it's going to be difficult for me to get configuration info from the GSA side. However, I can easily do an advanced search through the interface to see what pages it has indexed, which is what I did with the decorator=printable above.
Given that, how can I help you help me/us on this?
Thank you.
Peter
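Peter's two questions can be expressed as a small decision function. This is a sketch under the semantics Google documents for robots meta tags and `rel="nofollow"`; whether a given GSA release honours them identically is an assumption, and the function name and link list are hypothetical (the PDF export path in particular is made up).

```python
from typing import Iterable

# Hypothetical links on an Info page: (href, rel attribute).
LINKS = [
    ("/pages/viewpage.action?pageId=766&decorator=printable", "nofollow"),
    ("/export/pdf?pageId=766", "nofollow"),  # hypothetical PDF export URL
    ("/display/QuickStart/Home", ""),
]


def links_to_follow(page_directives: set[str],
                    links: Iterable[tuple[str, str]]) -> list[str]:
    """Return hrefs a crawler honouring robots meta and rel="nofollow" would queue."""
    # Page-level nofollow ("none" is shorthand for noindex,nofollow) withdraws
    # every link on the page, which makes per-link rel="nofollow" redundant.
    if page_directives & {"nofollow", "none"}:
        return []
    # "noindex" alone keeps the page out of the index but still allows following.
    return [href for href, rel in links if "nofollow" not in rel.lower().split()]


if __name__ == "__main__":
    print(links_to_follow({"noindex", "nofollow", "noarchive"}, LINKS))  # []
    print(links_to_follow({"noindex"}, LINKS))  # ['/display/QuickStart/Home']
```

Under these semantics the answer to both questions is yes, but only for links on that page: a page-level `nofollow` on the Info page does not stop a crawler from reaching the same printable URL through a link on some other page that lacks `rel="nofollow"`.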
Aug 08, 2007
Guy Fraser says:
Peter - Theme Builder 3.0 beta 4 should solve that problem:
http://jira.adaptavist.com/browse/BUILDER-662
I also added this feature request (it might not make it into the 3.0 release):
http://jira.adaptavist.com/browse/BUILDER-591