What to do if you've committed sensitive data to a Bitbucket Cloud repository

Important: 

Before we begin, it's best to assume that any sensitive data pushed to your remote repository has already been leaked. Users could have viewed the data on the Bitbucket website, pulled it to their local repositories, or forked the repository with the data included.

You should immediately change all credentials and keys that may have been compromised.

You can choose to retroactively remove the sensitive data from your Bitbucket repository, but be aware that this is a destructive process that rewrites the repository’s Git history, and you should avoid it if you can. If you still want to proceed, read on.

How to Remove Data from a Bitbucket Cloud Repository

Git Background

Git commits are immutable objects. The ID of a commit object is the SHA-1 hash of several fields, including the commit’s content (its file snapshot, author, and message) and the ID of the previous (parent) commit. Because the content is hashed, you cannot change an existing commit in place; you can only replace it with a new commit object that has a different SHA-1 hash. Because the parent ID is hashed, replacing one commit means that every commit that follows it must be replaced as well. Removing data from a repository therefore means rewriting the repository’s Git history from the first affected commit onward. For resources and more details about Git commit objects, read What is a Git commit ID? or Git - Git Objects.
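To see this chaining effect concretely, the sketch below builds two versions of a first commit (one with a secret file, one without) and then attaches an identical follow-up commit to each. The file names, identity, and fixed timestamp are made up for the demonstration, and everything runs in a throwaway directory.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Pin the timestamps so the commits differ only in what we deliberately change.
GIT_AUTHOR_DATE="1664971200 +0000"
GIT_COMMITTER_DATE="1664971200 +0000"
export GIT_AUTHOR_DATE GIT_COMMITTER_DATE

# Two versions of the first commit: one with a secret file, one without.
echo "v1" > app.txt
echo "password=hunter2" > secret.txt
git add app.txt secret.txt
tree_with_secret=$(git write-tree)
git rm -q --cached secret.txt
tree_clean=$(git write-tree)
parent_old=$(git commit-tree "$tree_with_secret" -m "Initial commit")
parent_new=$(git commit-tree "$tree_clean" -m "Initial commit")

# The same follow-up change (identical tree and message) on top of each parent.
echo "readme" > README
git add README
child_tree=$(git write-tree)
child_old=$(git commit-tree "$child_tree" -p "$parent_old" -m "Add README")
child_new=$(git commit-tree "$child_tree" -p "$parent_new" -m "Add README")

echo "parent (with secret): $parent_old"
echo "parent (cleaned):     $parent_new"
echo "child of old parent:  $child_old"
echo "child of new parent:  $child_new"
```

The two child commits have the same files and message, yet their IDs differ because each one records a different parent ID. This is why cleaning one commit forces every later commit to be recreated.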

BFG Repo-Cleaner

BFG Repo-Cleaner is a trusted open-source tool that can efficiently remove data from a repository, and this article shows you how to use it. In short, BFG walks the history and creates new commits that copy the content of the existing commits, minus the sensitive data you want to remove. You can then remove all references to the old commits and delete them from disk. Bear in mind that although the non-sensitive content is preserved, the new commits have different SHA-1 hashes, so existing references to the old commits (for example, links to a specific commit) may break.

Alternatives

git-filter-repo is another open-source tool that can remove data from a repository. It has more features than BFG Repo-Cleaner, but you may not need them for this task.

git filter-branch is Git's built-in command for this scenario. However, it is risky and complicated, and it is not recommended in general.

Steps

1. (Recommended) Merge or Decline Open Pull Requests

Removing data from a repository may break existing pull requests, so merge or decline all open ones before starting.


2. Install BFG Repo-Cleaner from the website (download the JAR file) or with Homebrew

brew install bfg

3. (Optional) Create a secure and stable latest commit

To reduce the risk of breaking production code (e.g., continuous deployments), BFG by default does not modify the files in your latest (HEAD) commit; it only cleans the commits before it. If you want to keep this safeguard and have not already done so, manually remove any sensitive data and create a clean latest commit first. If this is not a concern, you can skip this step and ask BFG to clean your latest commit as well.

git rm [sensitive_file]

git commit -m "[commit_message]"
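Keep in mind that this step only cleans the latest commit; the earlier commits still contain the data, which is why the BFG run in the next step is needed. The following sketch shows this in a throwaway repository. The file names and the fake password are invented for the example.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit a file containing a (fake) secret, then remove it in a follow-up commit.
echo "v1" > app.txt
echo "password=hunter2" > secret.txt
git add app.txt secret.txt
git commit -q -m "Add app and config"
leaky_commit=$(git rev-parse HEAD)

git rm -q secret.txt
git commit -q -m "Remove secret.txt"

# The file is gone from the latest commit...
in_head=$(git ls-tree -r --name-only HEAD)
echo "Files in HEAD: $in_head"

# ...but the earlier commit still serves the old content to anyone who asks.
recovered=$(git show "$leaky_commit:secret.txt")
echo "Recovered from history: $recovered"
```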


4. Run the appropriate BFG command to clean your repository
* If you want BFG to clean your latest commit, insert --no-blob-protection in the following commands

To delete a specific file:

bfg --delete-files [file_name]

To replace specific strings, create a text file with one string per line. By default, BFG replaces each match with ***REMOVED***:

bfg --replace-text [textfile_name]

To delete specific file versions (blobs), create a text file listing their Git object IDs, one per line. Note that these are blob IDs, not commit IDs:

bfg --strip-blobs-with-ids [blob_ids_file]
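As a rough sketch, the input files for --replace-text and --strip-blobs-with-ids could be prepared as shown below. According to BFG's built-in help, --replace-text expressions are literals by default, accept a regex: prefix, and use ==> to give a custom replacement, while --strip-blobs-with-ids reads a file of blob object IDs. The file names passwords.txt and blob-ids.txt, the demo repository, and the sample secrets are all hypothetical. The script only prints the bfg commands rather than running them.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Expressions file for --replace-text, one expression per line:
#   line 1: a literal string, replaced with BFG's default ***REMOVED***
#   line 2: a literal string with a custom replacement after ==>
#   line 3: a regular expression (regex: prefix) with a custom replacement
cat > passwords.txt <<'EOF'
hunter2
AKIAEXAMPLEKEY==>REDACTED_KEY
regex:api_token=\w+==>api_token=REDACTED
EOF

# Blob-ID file for --strip-blobs-with-ids. A throwaway repository stands in
# for your real one; "git rev-parse <commit>:<path>" prints the blob ID of
# that version of the file.
git init -q demo-repo
cd demo-repo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "password=hunter2" > secret.txt
git add secret.txt
git commit -q -m "Add secret"
git rev-parse "HEAD:secret.txt" > ../blob-ids.txt
cd ..

echo "Commands to run inside your real repository:"
echo "  bfg --replace-text $workdir/passwords.txt"
echo "  bfg --strip-blobs-with-ids $workdir/blob-ids.txt"
```

Keep these input files outside the repository you are cleaning so that they are not committed by accident.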


5. Force push to your remote

* Contact your repository admin if you do not have access to force push

At this point, BFG has rewritten the commit history in your local repository and updated your branches and tags to point to the cleaned commits. To apply the same change to the commit history of the remote Bitbucket repository, you must force push.

git push --force
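To illustrate why the force flag is required, the sketch below uses a local bare repository as a stand-in for the Bitbucket remote and an amended commit as a stand-in for the BFG rewrite. All paths, file names, and the fake password are assumptions made for the demo.

```shell
#!/bin/sh
set -eu

base=$(mktemp -d)
remote="$base/remote.git"
git init -q --bare "$remote"      # stand-in for the Bitbucket repository

git init -q "$base/work"
cd "$base/work"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$remote"

echo "password=hunter2" > config.txt
git add config.txt
git commit -q -m "Add config"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# Rewrite the commit locally (a stand-in for what BFG does across the history).
echo "password=<set via environment>" > config.txt
git commit -q -a --amend -m "Add config"

# A normal push is rejected: the remote commit is not an ancestor of ours.
if git push -q origin "$branch" 2>/dev/null; then
    plain_push="accepted"
else
    plain_push="rejected"
fi
echo "Plain push: $plain_push"

# A force push replaces the remote branch with the rewritten history.
git push -q --force origin "$branch"
local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$remote" rev-parse "refs/heads/$branch")
echo "Local HEAD:  $local_head"
echo "Remote HEAD: $remote_head"
```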


6. Raise a Bitbucket Support Case to delete commits on your Bitbucket Cloud Repository

Even though branches and tags no longer point to them, the unwanted commit objects likely still exist in both the local and remote repositories. This is because they may still be referenced by the reflog or, on the remote, by internal Bitbucket objects such as pull requests. Raise a support case and our team will remove these references and trigger garbage collection on the remote repository.

  • Note that support will have to delete pull requests in which the source branch (or ancestor of the source branch) contains sensitive data. If you would like to preserve non-sensitive details of these pull requests (descriptions, comments, approvals, etc), please manually back them up.


7. Rebase existing branches

All branches created from the old Git history (before the BFG cleanup and force push) should be rebased onto the cleaned main branch so that they adopt the new history and carry no traces of the old one. Do not merge any open pull requests before doing this, as that would mix the old and new Git histories and can bring the removed data back into the repository.
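One possible way to do this is git rebase --onto, which replays only the branch's own commits on top of the cleaned history. A plain rebase may try to replay the old commits as well, including the one that held the sensitive data, so naming the old base explicitly is the safer option. The sketch below is a simulation under stated assumptions: the branch names feature and cleaned, the file names, and the fake password are invented, and the cleaned history is rebuilt by hand instead of with BFG.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Old history: A (adds a secret) -> B, with a feature branch cut from B.
echo "password=hunter2" > secret.txt
echo "v1" > app.txt
git add app.txt secret.txt
git commit -q -m "A: initial"
echo "v2" > app.txt
git commit -q -a -m "B: update app"
old_base=$(git rev-parse HEAD)
main_branch=$(git symbolic-ref --short HEAD)
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "F: feature work"

# Simulate the cleaned history: the same commits, without secret.txt.
git checkout -q --orphan cleaned
git rm -q -rf .
echo "v1" > app.txt
git add app.txt
git commit -q -m "A: initial"
echo "v2" > app.txt
git commit -q -a -m "B: update app"
git branch -f "$main_branch" cleaned

# Replay only the commits after the old base onto the cleaned main branch.
git rebase -q --onto "$main_branch" "$old_base" feature

# No commit in the rebased branch's history should touch secret.txt.
leaks=$(git rev-list feature -- secret.txt)
echo "Commits on feature that touch secret.txt: ${leaks:-none}"
```

After rebasing, each branch needs its own force push, because its history has been rewritten too.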


8. Delete local git references and commits

After you have verified that the remote repository is secure and stable, execute the following commands to remove references to the unwanted commits and delete them in your local repository.

First, remove any backup refs stored under refs/original:

git for-each-ref --format="delete %(refname)" refs/original | git update-ref --stdin

Second, expire all reflog entries so that they no longer reference the old commits:

git reflog expire --expire=now --all

Finally, run garbage collection to actually delete the old unwanted commit objects:

git gc --aggressive --prune=now
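To check that the cleanup worked, you can ask Git whether an old object still exists. The sketch below demonstrates this in a throwaway repository, where an amended commit stands in for the BFG rewrite. The file names and the fake password are assumptions; in a real repository, record the ID of a sensitive blob or commit before the rewrite and test it the same way.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > app.txt
echo "password=hunter2" > secret.txt
git add app.txt secret.txt
git commit -q -m "Initial commit"
old_blob=$(git rev-parse HEAD:secret.txt)

# Rewrite the commit without the secret (a stand-in for the BFG run).
git rm -q secret.txt
git commit -q --amend -m "Initial commit"

# No branch points to the old objects now, but the reflog keeps them on disk.
if git cat-file -e "$old_blob" 2>/dev/null; then before="present"; else before="gone"; fi
echo "Before cleanup: old blob is $before"

git reflog expire --expire=now --all
git gc -q --aggressive --prune=now

# Once the reflog is expired and gc has pruned, the old blob cannot be read.
if git cat-file -e "$old_blob" 2>/dev/null; then after="present"; else after="gone"; fi
echo "After cleanup:  old blob is $after"
```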

Bitbucket Views

Once you have followed this process and the Bitbucket Cloud support team has confirmed that they have finished cleaning up the remote repository, any views (pull requests, branch pages, commits, etc.) that showed the removed data should no longer exist.


Last modified on Oct 5, 2022
