How to Prevent Google Web Crawlers from Indexing Bitbucket

Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

A publicly available Bitbucket site is displayed in search results when performing web searches (for example, on Google, Bing, etc.) with search strings like "<company_name> bitbucket" or "bitbucket <company_name>". Web crawlers index the Bitbucket site and add it to the search index.

Sometimes it is undesirable for the Bitbucket site to appear in search results and expose details (such as the Bitbucket version). This article provides a solution to this problem.

Environment

  • Applicable for all Bitbucket Data Center versions.
  • Publicly available Bitbucket site.

Cause

By default, the built-in robots.txt response is empty, which allows the instance to be crawled. robots.txt can be accessed anonymously because web crawlers must be able to read it in order to follow its directives.
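
For reference, an explicit robots.txt equivalent to that default empty response (allowing all crawling) would look like the following. This is a minimal sketch of standard robots.txt syntax, not output generated by Bitbucket:

User-agent: *
Disallow:

An empty Disallow value means nothing is disallowed, so every crawler may index the entire site.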

Solution

Bitbucket Server 5.11 introduced the ability to configure robots.txt.

Administrators can create and place their robots.txt in $BITBUCKET_HOME/shared. Adding the file to the shared home ensures it is preserved across upgrades, and all cluster nodes for Data Center installations return the same response.



For reference, the robots.txt content:

User-agent: *
Disallow: /

The “User-agent: *” line means the rules apply to all robots, and the “Disallow: /” line means the rule covers the entire website.

This robots.txt file tells all robots and web crawlers that they must not access or crawl any part of the site.

The directives in robots.txt determine Allow and Disallow conditions for various user agents (for example, Googlebot, Bingbot, YandexBot, Applebot, etc.).

There are multiple ways in which Allow, Disallow, and User-agent can be configured to achieve different outcomes based on your needs; see How to configure robots.txt for a deeper dive into the options, and see the sketch below for one such configuration.
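
For example, the following sketch blocks all crawlers from the entire site while still letting Googlebot crawl a hypothetical /projects path. The path and the choice of user agent are illustrative only, and Allow is an extension to the original robots.txt standard that is nonetheless supported by major crawlers such as Googlebot and Bingbot:

# Block every crawler from the whole site.
User-agent: *
Disallow: /

# Let Googlebot crawl only the illustrative /projects path.
User-agent: Googlebot
Allow: /projects
Disallow: /

For Googlebot, the most specific (longest) matching rule wins, so Allow: /projects overrides Disallow: / for URLs under that path.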


Please note: There is no ability to configure and serve the robots.txt file for mirrors. This is tracked in the following feature request:

BSERV-14273 - Provide Mirrors the ability to serve the robots.txt file
