How do I remove sensitive/unwanted content that was pushed to my Bitbucket Data Center instance?

Purpose

The purpose of this article is to describe the steps that can be taken to remove sensitive or otherwise unwanted information that has been pushed to a repository hosted in Bitbucket Data Center.

Background

When a sensitive file or line, such as an SSH key or password, has been pushed to a git repository and your team has added further commits since then, simply deleting the content in the latest commit is not enough. The information still exists within the commit history of the repository.
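
To illustrate why a follow-up commit is not enough, here is a minimal sketch that runs in a throwaway repository. The directory, file name, and password value are hypothetical and exist only for this demonstration.

```shell
#!/bin/sh
# Demonstration in a throwaway repository; all names and values are hypothetical.
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo User"

# Commit 1: the secret is added by mistake.
echo "password=hunter2" > secret.txt
git add secret.txt
git commit -q -m "Add config"

# Commit 2: the secret is deleted in a follow-up commit.
git rm -q secret.txt
git commit -q -m "Remove secret"

# The file is gone from the working tree...
[ -e secret.txt ] || echo "secret.txt is not in the working tree"

# ...but the commits that touched it are still part of the history,
history_count=$(git log --all --oneline -- secret.txt | wc -l | tr -d ' ')
echo "commits touching secret.txt: $history_count"

# ...and the content can still be read from the parent of the latest commit.
recovered=$(git show HEAD~1:secret.txt)
echo "recovered: $recovered"
```

Anyone with a clone of the repository can run the same `git show` command, which is why the history itself has to be rewritten.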

As soon as this sensitive commit has been pushed, your team should treat the data as compromised. Any passwords or SSH keys should be changed immediately, as it's possible that the sensitive information has already been copied. In addition, the steps below do not affect any existing clones or forks that contain this commit.

What's more, rewriting history and force pushes can lead to undesirable results and unexpected behaviours in Bitbucket Data Center, which is why we generally discourage this practice if you can avoid it at all.

Solution

There are two different methods you can use to remove this sensitive content from your repository's commit history: git filter-branch and BFG Repo-Cleaner.

Both methods ultimately will end up re-writing the history of the repository to make it as though the sensitive commit was never pushed in the first place.


Using git filter-branch

Running git filter-branch after storing changes using git stash will result in these stashed changes being unretrievable. Any stashed changes should be unstashed prior to running this command.
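
As a quick pre-check for the warning above, the following sketch counts the stash entries in a clone. The helper name count_stashes is hypothetical; it only wraps git stash list.

```shell
# Print the number of stash entries in the repository at the given path.
# count_stashes is a hypothetical helper name used for this sketch.
count_stashes() {
    git -C "$1" stash list | wc -l | tr -d ' '
}

# Run from inside the clone before rewriting history.
stash_count=$(count_stashes .)
if [ "$stash_count" -gt 0 ]; then
    echo "Found $stash_count stash entries; apply or drop them before running git filter-branch." >&2
else
    echo "No stashed changes found."
fi
```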

  1. Clone down the repository to your local git client
  2. Navigate into the repository's directory and execute the following command, being sure to replace 'PATH/TO/SENSITIVE/DATA' with the relative path (inside the clone of the repository) of the entire file you want to remove.

    git filter-branch --force --index-filter "git rm --cached --ignore-unmatch PATH/TO/SENSITIVE/DATA" --prune-empty --tag-name-filter cat -- --all

    NOTE: The slashes in PATH/TO/SENSITIVE/DATA must be "/" rather than "\" when running commands from a Windows client.

  3. (info) Though not strictly necessary, it's recommended you add the sensitive data to the repository's .gitignore file to ensure that it is not accidentally committed again.
  4. After your team has reviewed the state of the local repository, run the following commands to force push the changes back up to Bitbucket to overwrite the repository's existing commit history.

    git push origin --force --all
    git push origin --force --tags
  5. Any users cloning or forking from this repository should be asked to git rebase any branches that contain the old repository history. It is important to rebase and not merge, as merging could result in the sensitive data being re-introduced into the now clean git history of the main repository. 
  6. Lastly - be sure to force all objects in your local repository to be garbage collected using the commands:

    git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
    git reflog expire --expire=now --all
    git gc --prune=now

    (warning) These commands should NOT be executed directly against the repository on Bitbucket Data Center - only your local copy of the repository. Running git gc against Bitbucket's copy of the repository can result in serious data corruption.
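
Before force pushing, it can help to rehearse the rewrite and the local cleanup in a scratch repository. The sketch below does that and then checks that no commit still touches the removed path. The repository layout and the path config/secret.txt are hypothetical. Setting FILTER_BRANCH_SQUELCH_WARNING=1 only suppresses the advisory notice that newer Git versions print before filter-branch starts.

```shell
#!/bin/sh
# Scratch-repository walkthrough; every path and name here is hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo User"

# Build a short history in which the first commit contains a secret.
mkdir config
echo "app code" > app.txt
echo "password=hunter2" > config/secret.txt
git add .
git commit -q -m "Initial commit with a secret"
echo "more code" >> app.txt
git commit -q -am "Unrelated follow-up work"

# Rewrite every ref, dropping config/secret.txt from each commit.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
    --index-filter "git rm --cached --ignore-unmatch config/secret.txt" \
    --prune-empty --tag-name-filter cat -- --all >/dev/null

# Drop the backup refs and unreachable objects in this local copy only.
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc -q --prune=now

# Verify: no commit on any ref should still touch the removed path.
remaining=$(git log --all --oneline -- config/secret.txt | wc -l | tr -d ' ')
commit_count=$(git rev-list --count --all)
echo "commits still touching config/secret.txt: $remaining"
echo "commits in rewritten history: $commit_count"
```

The same `git log --all --oneline -- PATH/TO/SENSITIVE/DATA` check can be run in the real clone after step 2. An empty result suggests the path no longer appears on any local ref.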


If your team has recently merged this unwanted change in a pull request, it is recommended that you contact Atlassian Support for the next steps, as Bitbucket will preserve many git objects on the server related to pull requests. The merged pull requests' diff may still contain sensitive/unwanted information.  The steps that need to be taken to remove the data from these pull requests require manual changes to the external database that will vary depending on the situation. These changes can have adverse effects if they are not performed correctly, and it's for this reason that we encourage teams to reach out to support for additional help along these lines.

Using BFG Repo-Cleaner

BFG Repo-Cleaner is an open-source tool that offers a simpler way of removing unwanted data from your repository's commit history when compared to using the git filter-branch command.

  • (warning) It's important to note that the BFG Repo-Cleaner operates under the assumption that your latest commit is already clean - meaning that it will not perform any changes to the latest commit, but only the commits before it. It's recommended your team push up a new commit that removes the undesired/sensitive information, and that you ensure no code breakages in this clean commit prior to using the BFG Repo-Cleaner.
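
One way to confirm that the latest commit is already clean is to list the files it contains and look for the unwanted file name. This is a minimal sketch; paths_in_commit is a hypothetical helper, and sensitive_passwords.txt stands in for whatever file you are removing.

```shell
# Print every path in the given commit whose file name matches exactly.
# Arguments: <commit-ish> <file name>. paths_in_commit is a hypothetical helper.
paths_in_commit() {
    git ls-tree -r --name-only "$1" | awk -F/ -v name="$2" '$NF == name'
}

# Run from inside the clone, after pushing the commit that removes the file.
leftover=$(paths_in_commit HEAD sensitive_passwords.txt)
if [ -n "$leftover" ]; then
    echo "Still present in the latest commit:"
    echo "$leftover"
else
    echo "Latest commit does not contain sensitive_passwords.txt."
fi
```

The check matches on the file name rather than the full path, mirroring how the --delete-files option of BFG matches files.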

We recommend consulting the BFG Repo-Cleaner documentation for a full explanation of what's possible through this tool.

Here is an example of how you can use this tool to help remove data from your Bitbucket repository:

  1. Clone down a local copy of the affected repository using the command git clone --mirror
  2. (info) We recommend backing up a copy of this bare repository prior to executing any changes using the BFG Repo-Cleaner
  3. (info) It's also recommended that you set up bfg as an alias for java -jar bfg.jar after the bfg.jar file has been downloaded and moved to your directory
  4. Using the downloaded bfg.jar, here are some example commands you can run against the repository:

    # Remove any files named 'sensitive_passwords.txt' or 'confidential_passwords.txt' from the repository's commit history
    # The pattern is quoted so that the shell passes it to BFG unexpanded
    bfg --delete-files "{sensitive,confidential}_passwords.txt"
    # Replace any entries listed in the file 'bobs_credit_cards_and_ssn.txt' with the text ***REMOVED*** wherever they occur in the repository. 
    #     Don't worry, we're also unsure why this was pushed to a git repository.
    bfg --replace-text bobs_credit_cards_and_ssn.txt 

    The full list of BFG Repo-Cleaner commands can be found in the tool's documentation.

  5. (info) Though not strictly necessary, it's recommended you add the sensitive data to the repository's .gitignore file to ensure that it is not accidentally committed again.
  6. After your team has reviewed the state of the local repository, force push the changes back up to Bitbucket to overwrite the repository's existing commit history.

    git push origin --force
  7. Any users cloning or forking from this repository should be asked to git rebase any branches that contain the old repository history. It is important to rebase and not merge, as merging could result in the sensitive data being re-introduced into the now clean git history of the main repository. 

  8. Lastly - be sure to force all objects in your local repository to be garbage collected using the commands:

    git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
    git reflog expire --expire=now --all
    git gc --prune=now

    (warning) These commands should NOT be executed directly against the repository on Bitbucket - only your local copy of the repository. Running git gc against Bitbucket's copy of the repository can result in serious data corruption.
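
The --replace-text example in step 4 takes a file of match expressions, one per line. As a sketch of how such a file could be prepared: by default each line is treated as a literal, a `regex:` prefix switches to a regular expression, and `==>` supplies a replacement other than the default ***REMOVED***. The values and the repository name my-repo.git below are hypothetical.

```shell
# Build an expressions file for bfg --replace-text.
# All values below are hypothetical examples.
expressions_file=$(mktemp)
cat > "$expressions_file" <<'EOF'
hunter2
4111111111111111==>[card number removed]
regex:\d{3}-\d{2}-\d{4}==>[ssn removed]
EOF

line_count=$(wc -l < "$expressions_file" | tr -d ' ')
echo "wrote $line_count expressions to $expressions_file"

# Then, from the directory that holds the mirror clone (requires Java and bfg.jar):
#   java -jar bfg.jar --replace-text "$expressions_file" my-repo.git
```

Keep the expressions file outside the repository and delete it afterwards, since it contains the very values being removed.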

If your team has recently merged this unwanted change in a pull request, it is recommended that you contact Atlassian Support for the next steps, as Bitbucket will preserve many git objects on the server related to pull requests. The merged pull requests' diff may still contain sensitive/unwanted information.  The steps that need to be taken to remove the data from these pull requests require manual changes to the external database that will vary depending on the situation. These changes can have adverse effects if they are not performed correctly, and it's for this reason that we encourage teams to reach out to support for additional help along these lines.

The BFG Repo-Cleaner is a third-party utility and is therefore outside of the Atlassian Support Offerings.  Any issues arising from the usage of this utility will not be supported by Atlassian. 

Last modified on Sep 19, 2024
