How to Prevent Google Web Crawlers from Indexing Bitbucket
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
A publicly available Bitbucket site appears in search engine results (Google, Bing, etc.) for queries such as "<company_name> bitbucket" or "bitbucket <company_name>". Web crawlers index the Bitbucket site and add it to their search index.
It is sometimes undesirable for the Bitbucket site to appear in search results and expose details (such as the Bitbucket version). This article provides solutions to this problem.
Environment
- Applicable for all Bitbucket Data Center versions.
- Publicly available Bitbucket site.
Cause
By default, the built-in robots.txt response is empty, which allows the instance to be crawled. robots.txt can be accessed anonymously, as that is how web crawlers read the directives it contains.
Solution
Bitbucket Server 5.11 introduced the ability to configure robots.txt.
Administrators can create a robots.txt file and place it in $BITBUCKET_HOME/shared. Adding the file to the shared home ensures it is preserved across upgrades, and that all cluster nodes in a Data Center installation return the same response.
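As a minimal sketch, the file can be created and placed in the shared home like this. The default path used below is only an illustration; set BITBUCKET_HOME to your actual Bitbucket home directory, and adjust the directives to your needs:

```shell
# Sketch: write a robots.txt that blocks all crawlers into the Bitbucket
# shared home. BITBUCKET_HOME here is assumed; the fallback path is only
# an example and should be replaced with your real home directory.
BITBUCKET_HOME="${BITBUCKET_HOME:-/tmp/bitbucket-home}"
mkdir -p "$BITBUCKET_HOME/shared"
cat > "$BITBUCKET_HOME/shared/robots.txt" <<'EOF'
User-agent: *
Disallow: /
EOF
```

A node restart is not required for a newly placed file to be served, but verify by requesting <base-url>/robots.txt once the file is in place.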
The directives in robots.txt determine Allow and Disallow rules for various user agents (for example, Googlebot, Bingbot, Yandex Bot, Apple Bot, etc.).
Allow, Disallow, and User-agent can be combined in multiple ways to achieve different outcomes depending on your needs. How to configure robots.txt - can help you dive deeper into the options.
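For illustration, a robots.txt along these lines blocks all crawlers except Googlebot, which is allowed to index only the projects listing (the paths and user agents here are examples, not a recommendation):

```
# Illustrative robots.txt: allow Googlebot to crawl /projects only,
# and block everything for all other crawlers.
User-agent: Googlebot
Allow: /projects
Disallow: /

User-agent: *
Disallow: /
```

Note that robots.txt is advisory: well-behaved crawlers honor it, but it does not enforce access control.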
Please note: Mirrors cannot be configured to serve a robots.txt file. See BSERV-14273 - Provide Mirrors the ability to serve the robots.txt file.