How to Prevent Google Web Crawlers from Indexing Bitbucket

Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data-Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

A publicly available Bitbucket site is displayed in search results when performing web searches (for example, on Google, Bing, etc.) with search strings like "<company_name> bitbucket" or "bitbucket <company_name>". Web crawlers index the Bitbucket site and add it to the search index.

Sometimes it is undesirable for the Bitbucket site to appear in search results and expose details (such as the Bitbucket version). This article provides a solution to this problem.

Environment

  • Applicable for all Bitbucket Data Center versions.
  • Publicly available Bitbucket site.

Cause

By default, the built-in robots.txt response is empty, which allows the instance to be crawled. robots.txt can be accessed anonymously because web crawlers must be able to read it in order to follow its directives.
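
For reference, an explicit robots.txt equivalent to that default empty response (allowing all crawling) would look like the following. This is a minimal sketch of standard robots.txt syntax, not output generated by Bitbucket:

User-agent: *
Disallow:

An empty Disallow value means nothing is disallowed, so every crawler may index the entire site.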

Solution

Bitbucket Server 5.11 introduced the ability to configure robots.txt.

Administrators can create and place their robots.txt in $BITBUCKET_HOME/shared. Adding the file to the shared home ensures it is preserved across upgrades, and all cluster nodes for Data Center installations return the same response.



For reference, the robots.txt content:

User-agent: *
Disallow: /

The “User-agent: *” line means the rules apply to all robots, and the “Disallow: /” line means the rule covers the entire website.

This robots.txt file tells all robots and web crawlers that they must not access or crawl any part of the site.

The directives in robots.txt determine Allow and Disallow conditions for various user agents (for example, Googlebot, Bingbot, YandexBot, Applebot, etc.).

There are multiple ways in which Allow, Disallow, and User-agent can be configured to achieve different outcomes based on your needs; see How to configure robots.txt for a deeper dive into the options, and see the sketch below for one such configuration.
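
For example, the following sketch blocks all crawlers from the entire site while still letting Googlebot crawl a hypothetical /projects path. The path and the choice of user agent are illustrative only, and Allow is an extension to the original robots.txt standard that is nonetheless supported by major crawlers such as Googlebot and Bingbot:

# Block every crawler from the whole site.
User-agent: *
Disallow: /

# Let Googlebot crawl only the illustrative /projects path.
User-agent: Googlebot
Allow: /projects
Disallow: /

For Googlebot, the most specific (longest) matching rule wins, so Allow: /projects overrides Disallow: / for URLs under that path.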


Please note: There is no ability to configure and serve the robots.txt file for mirrors. This is tracked in the following feature request:

BSERV-14273 - Provide Mirrors the ability to serve the robots.txt file
