Fisheye and Crucible: Right to erasure
Introduction
Under Article 17 of the GDPR, individuals have the right to have personal data erased. This is also known as the ‘right to be forgotten’. The right is not absolute and only applies in certain circumstances. Whether or not you are required to honor an individual's request to have personal data deleted will vary on a case-by-case basis, and is a determination you should always make with the assistance of legal counsel. Once you have determined that you have an obligation to delete personal data, you can use the following instructions to do so within certain Atlassian products.
Personal data stored within the product can be divided into one of two areas: 1) account-level personal data; and 2) free-form text. Account-level personal data are data fields that exist within the product for the sole purpose of identifying an individual throughout the product. Examples of account-level personal data include the user's display name, profile picture or avatar, and email address. These data elements are generally visible from the user's profile and are used throughout the product to point back to the user's profile when the user is @mentioned or tagged in certain spaces or content. Deleting account-level personal data elements will automatically remove those data elements throughout the product where the relevant account-level data elements appear and in the database (subject to some limitations discussed below).
If you have included personal data in free-form text, either typed into content spaces or as a custom field label, you will need to use the product's global search feature to surface this personal data and delete it on a case-by-case basis.
Description
This workaround describes how administrators can anonymize or delete personal data in Fisheye and Crucible.
For more information on where personal data is stored, please read Fisheye and Crucible: Right of access by the data subject.
Version compatibility
All workarounds are compatible with Fisheye and Crucible Server 4.1 and later.
Workaround
How to remove personal data
The removal of a user from the administration panel (Administration > Users > Delete user) does not automatically remove personal data associated with that user. It only marks the user as deleted, so they are no longer visible on the global user list and in search results.
Deleted users can still be found in Fisheye and Crucible as "<Display Name> (deleted user)" in the context of various activities they previously participated in. For this reason, please follow the steps below to permanently remove a user's personal data.
Step 1. Remove from user directories
Go to Administration > User Directories. Next steps will depend on whether you use an internal and/or external directory.
Directory | Steps to take |
---|---|
Internal user directory (Internal or Internal with LDAP authentication) | You can modify personal data in Fisheye and Crucible directly. |
External user directory (Microsoft Active Directory, LDAP, Crowd or Jira) | |
Multiple user directories | The same user may be defined in several user directories. In such cases, the order of directories is taken into account. Please ensure that you have removed the user from all directories. Disabling and/or changing the order of directories may help you identify duplicates. |
Some database tables will still contain rows for a deleted user (but with the new user name). This is why we suggest that you first rename the user with an anonymized name. This data is kept for integrity and reference-ability purposes - for example, any comment posted in a review by a deleted user will now be shown as "<anonymized display name> (deleted user)".
To learn more, please read the following:
Step 2. Remove from SQL database
Deleting a user from Fisheye and Crucible is not possible without risking data integrity (a user is associated with code reviews, comments). For this reason, we suggest you anonymize personal data related to the user being deleted, rather than remove the data elements altogether.
User directory data
User directory tables have the cwd_ prefix. They should already be deleted or anonymized if you followed 'step 1'. Use the following query to check:
Fisheye and Crucible data
Fisheye and Crucible tables have the cru_ prefix. They should already be deleted or anonymized, if you followed 'step 1'. Use the following query to check:
The cru_revision.cru_author_name column contains the author of the commit. This value is not deleted, so that it remains consistent with the repository content; see the 'Limitations' section.
Step 3. Remove from search indexes
After you delete information from the SQL database, you have to refresh the content of the Lucene search indexes to remove cached data.
Global cross-repository search index
This index keeps metadata (such as commit messages, authors and paths) for all repositories indexed by Fisheye, and is used by the cross-repository search feature. Cleaning up this index is out of scope, because it reflects information from source code repositories and we assume that repository history is fixed.
Global Crucible search index
This index keeps information from code reviews, such as participants or review comments. Go to Administration > Crucible > Crucible Index Maintenance and select Re-index.
Step 4. Remove avatars
Go to Administration > Avatars and check which service is used for server avatars.
Where an 'internal' option is used, Fisheye and Crucible store uploaded files on disk. Run the script below to locate the avatar file in the $FISHEYE_INST directory:
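# Replace the placeholder with the user name whose avatar you want to locate.
username="<put username>"
# md5 -q -s prints only the MD5 hash of the given string (BSD/macOS md5 utility).
hash=$(md5 -q -s "$username")
echo "$FISHEYE_INST/var/data/avatars/$(echo "$hash" | cut -c1-2)/$hash.image"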
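On Linux systems the BSD-style md5 utility is often not installed, and the same path can likely be derived with md5sum instead. The following is a minimal sketch, assuming the avatar file name is the MD5 hash of the user name exactly as in the script above. The helper name avatar_path is hypothetical, and the instance directory is passed explicitly as the second argument.

```shell
#!/bin/sh
# avatar_path USERNAME INSTANCE_DIR
# Prints the expected location of an internally stored avatar.
# Assumption: the file name is the MD5 hash of the user name, as in the
# script above. avatar_path is an illustrative helper, not a product command.
avatar_path() {
  # printf '%s' avoids hashing a trailing newline, matching `md5 -q -s`.
  hash=$(printf '%s' "$1" | md5sum | cut -d' ' -f1)
  # The first two characters of the hash name the subdirectory.
  prefix=$(printf '%s' "$hash" | cut -c1-2)
  printf '%s\n' "$2/var/data/avatars/$prefix/$hash.image"
}

# Example: avatar_path "jsmith" "$FISHEYE_INST"
```

Check that the printed file exists before deleting it; if it does not, the user may never have uploaded an avatar.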
Where an 'external' option is used, remove the avatar from the external system.
Step 5. Remove from unstructured data
In addition to structured data, personal data may appear in various places in Fisheye and Crucible due to a user's activity. Examples include:
Personal data in code repositories
Users may commit personal data into repositories as commit comments, in text files, or in binary files (for example, photos or documents). Fisheye's search feature can help you find these in commit messages, diffs, file content and commit authors.
Use the "Search" box from the blue application header. Keep in mind that search results may not return 100% of occurrences: Fisheye does not index large files (the limit ranges from 100 KB to 5 MB, depending on repository settings), and it does not index binary files (some text files may be incorrectly marked as binary in the repository).
Personal data in code reviews
Users may put personal data in code reviews. Crucible's search feature can help you find instances in reviews (review description, review comments).
Use the "Search" box from the blue application header. Please keep in mind that content added to a review (attachments, patches) is not indexed. You can search the content of uploaded files in the $FISHEYE_INST/var/data/uploads directory.
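Because uploaded review content is not covered by the search index, a file-system search is one way to check it. The sketch below uses standard grep options. The helper name find_in_uploads is hypothetical, and the search only matches text stored as plain text, so compressed or binary attachments may need to be inspected separately.

```shell
#!/bin/sh
# find_in_uploads TERM DIR
# Lists the files under DIR that contain TERM.
# find_in_uploads is an illustrative helper name, not a product command.
find_in_uploads() {
  # -r: recurse into subdirectories   -l: print matching file names only
  # -i: ignore case                   -F: treat TERM as a literal string
  grep -rliF -- "$1" "$2"
}

# Example:
#   find_in_uploads "jane.doe@example.com" "$FISHEYE_INST/var/data/uploads"
```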
Personal data in search queries
Users may create search queries and save them as favorites. Search the text columns of the cru_base_star_model table for any personal data.
Step 6. Remove from external systems
Fisheye and Crucible may send data (including personal data) to connected external systems. For example, if a Webhook is configured, Fisheye will send commit information including commit author.
Limitations
There are places within Fisheye and Crucible where personal data is not removed.
Repository caches
Fisheye and Crucible process data fetched from the connected repositories. Currently, five different SCM systems are supported:
- Git
- Mercurial
- SVN
- Perforce
- CVS
Data obtained from the repository may contain personal data. The exact data differs by repository type, but it is usually the user name and/or email address of the user who made a code change. This data will not be removed from Fisheye and Crucible.
Why this data is not removed
Source code history stored in a repository can be treated as an audit log. In many cases, information about when and why a code change was introduced, and by whom, is critical.
Examples include finding the person responsible for introducing a bug, or avoiding intellectual property contamination. This may apply to commercial code, but also to free and open source code. It's not uncommon for open source projects to require the signing of a 'contributor agreement' to verify contributions.
As Fisheye and Crucible process data from external sources, data stored in Fisheye and Crucible has to be in sync with data from that source. For this reason, Fisheye does not manipulate the history of the repository being scanned.
How to avoid storing personal data in repositories
If you would like to avoid storing personal data in source code history, a few steps are required to achieve this.
Configure committer credentials
Use identifiers for user names and email addresses that do not contain personal data (for example, a numeric ID). The exact configuration depends on the repository type; please refer to the technical documentation of these products to learn how to achieve this. We have listed a few links for your convenience:
- Git - setting your username and setting email address
- Mercurial - set user name
- Subversion - user manual
- Perforce - documentation
- CVS - user manual
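For Git, the credential change described above could look like the following sketch. The settings user.name and user.email are standard Git configuration keys. The helper name set_pseudonymous_identity, the "user-<id>" naming scheme and the example.invalid domain are illustrative assumptions; pick whatever scheme your organization can map back to real users internally.

```shell
#!/bin/sh
# set_pseudonymous_identity REPO_DIR ID
# Configures a working copy so that new commits carry a non-personal
# identifier instead of a real name and email address.
# The naming scheme and domain are hypothetical examples.
set_pseudonymous_identity() {
  git -C "$1" config user.name "user-$2"
  git -C "$1" config user.email "$2@users.example.invalid"
}

# Example: set_pseudonymous_identity /path/to/working/copy 10482
```

This only affects commits created after the change; existing history keeps the credentials it was committed with.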
Configure committer mappings
Fisheye and Crucible come with a feature that allows you to set up mappings between authors of commits in a source code repository and users of Fisheye and Crucible. You can map the credentials set up in the previous step to real users. The mapping is active as long as the user is present. When users are removed from Fisheye and Crucible, the mapping disappears and you will see the original credentials used by committers. You don't have to delete the repository or rewrite the repository history in order to remove personal data.
Open Administration > User Mappings to set this up. End users can also open Profile Settings > Author Mapping and set them.
How to remove data
Your company may decide to remove personal data from source code repositories. Since data in Fisheye and Crucible is provided from an external source control system, it has to be erased from that system first. Below are two ways to delete data from Fisheye and Crucible, if necessary.
Remove the repository
Removing the repository from Fisheye and Crucible also means removing all related data - not only personal data. More details on how to delete a repository can be found in Managing your repositories.
Rewrite repository history
Some repository types allow you to change stored information, while others do not. Please refer to the product's technical documentation to understand how to do this.
For example, Subversion stores a set of unversioned revision properties, one of which is 'svn:author'. Because these properties are unversioned, you can update them on existing commits without rewriting the entire repository history.
Git and Mercurial, by contrast, calculate commit hashes from all commit metadata. As a consequence, modifying a commit author means rewriting the entire repository history, which results in different commit hashes. Effectively, this creates a new repository and breaks any existing references in Fisheye and Crucible to these commits.
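The effect of the author on a Git commit hash can be illustrated without a repository. The sketch below builds a made-up commit object by hand and hashes it the way Git does (SHA-1 over a "commit <size>" header, a NUL byte, and the commit body). The helper name commit_id, the timestamp and the message are illustrative assumptions, and sha1sum from GNU coreutils is assumed to be available.

```shell
#!/bin/sh
# commit_id AUTHOR
# Prints the SHA-1 object ID Git would assign to a made-up commit whose
# author and committer are AUTHOR. Tree, timestamp and message are fixed,
# illustrative values, so only the author differs between calls.
commit_id() {
  empty_tree=4b825dc642cb6eb9a060e54bf8d69288fbee4904  # Git's empty tree
  body=$(printf 'tree %s\nauthor %s 1500000000 +0000\ncommitter %s 1500000000 +0000\n\nFix typo' \
    "$empty_tree" "$1" "$1")
  # Size in bytes of the commit body, including its final newline.
  size=$(printf '%s\n' "$body" | wc -c | tr -d ' ')
  # Git hashes the header "commit <size>", a NUL byte, then the body.
  { printf 'commit %s\0' "$size"; printf '%s\n' "$body"; } | sha1sum | cut -d' ' -f1
}

# Same tree, time and message; only the author differs.
commit_id 'Jane Doe <jane.doe@example.com>'
commit_id 'user-10482 <10482@users.example.invalid>'
```

The two printed IDs differ, and because each commit also records the ID of its parent, every descendant of a rewritten commit receives a new ID as well.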
Rewriting repository may create completely different content
There are consequences of a repository history rewrite.
For example, code reviews in Crucible will no longer be linked to appropriate commits (as commit hashes are different), making correct rendering unachievable (unless the 'Store the contents of files in reviews' feature was used).
Re-clone and re-index repository
In order to refresh data in Fisheye and Crucible you have to either:
- create a new repository pointing to the rewritten repository, and delete the old repository
- refresh the content of the existing repository via:
- re-clone and re-index (Git, Mercurial)
- re-index (Perforce, CVS)
- re-scan subversion non-versioned properties (Subversion)
Choose Administration > Repositories > (edit repository) > Maintenance. For more information on how to re-index a repository, read Re-indexing your repository.
Re-indexing may be a very time-consuming operation
The re-indexing of repository history can be costly. For large repositories (10,000+ commits and branches) it may take weeks or months. Perform an analysis of your repositories and re-indexing speed (on a test machine) prior to executing it on a production instance. Please read How do I avoid long reindex times for further guidance.
Application logs
Application logs may contain personal data (for example, a user name or IP address in access logs). We recommend defining a company policy for how long log files will be stored.
Alternatively, you may strip personal data from logs (using 'grep -v').
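As a rough sketch of the 'grep -v' approach, the helper below writes a copy of a log file without the lines that mention a given term. The name strip_log_lines is hypothetical. Note that this drops whole lines, so any other information on those lines is lost as well; review the result before replacing the original file.

```shell
#!/bin/sh
# strip_log_lines TERM INFILE OUTFILE
# Copies INFILE to OUTFILE, leaving out every line that contains TERM.
# strip_log_lines is an illustrative helper name, not a product command.
strip_log_lines() {
  # -v: select non-matching lines   -F: treat TERM as a literal string
  # grep exits with 1 when no line is selected, which is not an error here.
  grep -vF -- "$1" "$2" > "$3" || [ $? -eq 1 ]
}

# Example: strip_log_lines "jsmith" access.log access.log.clean
```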
Backup files
Currently, there is no technical way to remove personal data from backup files. Removal from backups would mean:
- setting up a new Fisheye and Crucible instance
- importing all data from a backup
- removing personal data
- creating the backup of the instance
- repeating this for each backup file
We recommend disposing of backup files automatically after a set maximum period of time when they are no longer necessary.
In rare cases, when there's a need to restore a system from a backup, a user's personal data may be restored as a result of this action.
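Automatic disposal of old backups can be scripted with find, for example from a scheduled job. This is a minimal sketch that assumes backups are stored as .zip files in a single directory; the helper name purge_old_backups, the file pattern and the retention period are assumptions to adapt to your own backup setup.

```shell
#!/bin/sh
# purge_old_backups DIR DAYS
# Deletes backup archives under DIR that were last modified more than
# DAYS days ago, printing each file that is removed.
# Assumption: backups are *.zip files; adjust the pattern as needed.
purge_old_backups() {
  find "$1" -type f -name '*.zip' -mtime "+$2" -print -exec rm -f {} +
}

# Example: purge_old_backups /path/to/backups 90
```

Running the find command without the -exec part first shows which files would be removed.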
Additional notes
There may be limitations based on your product version.
Note, the above-related GDPR workaround has been optimized for the latest version of this product. If you are running on a legacy version of the product, the efficacy of the workaround may be limited. Please consider upgrading to the latest product version to optimize the workarounds available under this article.
Third-party add-ons may store personal data in their own database tables or on the filesystem.
The above article in support of your GDPR compliance efforts applies only to personal data stored within the Atlassian server and data center products. To the extent you have installed third-party add-ons within your server or data center environment, you will need to contact that third-party add-on provider to understand what personal data from your server or data center environment they may access, transfer or otherwise process and how they will support your GDPR compliance efforts.
If you are a server or data center customer, Atlassian does not access, store, or otherwise process the personal data you choose to store within the products. For information about personal data Atlassian processes, see our Privacy Policy.