_confluence_right_to_erasure
Version Compatibility
Confluence 5.10 and higher.
Description
Personal data for a specific user can be spread across multiple components of Confluence.
User account-level personal data
We've documented all areas of a user's account-level personal data on Confluence: Right of access by the data subject. The account-level personal data may be used inside pages via a Confluence macro, can be searched, and is part of the user mentions feature.
Removing the user account-level personal data (including avatar, display name, and any profile information) prevents it from being searched. The user's name will no longer be shown as an author of content, and will no longer be available to be mentioned.
How you remove account-level personal data depends on your Confluence version:
- In Confluence 6.13 and later, account-level personal data is automatically removed when you delete the user account. Learn how to delete a user account.
- In Confluence 6.12 and earlier, account-level personal data can be manually removed using the 'user account-level personal data' SQL workaround described below.
Filesystem
Search Index
If the search platform is Lucene, the search indexes are saved as files in the Confluence home directory under /index. If the search platform is OpenSearch, the search indexes are stored within OpenSearch.
These indexes contain information, such as the display name, email address, and username, which are also stored in the database. They are essential for search features.
In Confluence 6.13 and later, after a user account is deleted, all content the user had created, contributed to, or been @mentioned on is automatically reindexed.
In Confluence 6.12 and earlier, the 'user account-level personal data' SQL workaround will trigger an update to the index, which will remove the user's personal data from the index. The administrator may also opt to rebuild the index by following the instructions at How to Rebuild the Content Indexes From Scratch on Confluence Server. Rebuilding the index should not be necessary if all the steps in the 'user account-level personal data' SQL workaround section are followed.
Location of Access Logs
Access logging is enabled by default from Confluence 7.11, and may have been manually enabled in earlier versions by following the instructions at Audit Confluence using the Tomcat valve component and How to enable user access logging. If access logging is enabled, the username of the user accessing a page, as well as some URLs, will remain in the access log files. These logs are not accessible directly via Confluence, but may be accessed by an administrator.
The log files may contain personal data including, but not limited to:
- IP address
- Username/Display Name
- Email address
The location of the access logs is <confluence install>/logs/conf_access_log.log (from Confluence 7.11 or if created by using the instructions at Audit Confluence using the Tomcat valve component). Access log data can also be written to the application logs at <confluence home>/logs/atlassian-confluence.log.
Please note that logging parameters (such as the log file location, and contents) may be different due to configuration options, and/or third-party add-ons. Administrators are advised to check the contents of the logs, and remove them if required.
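As a starting point for checking the logs, the following is a minimal sketch that lists the lines of a log file containing a given username or email address. The function name find_personal_data and the example search terms are hypothetical, and the sketch assumes the logs are plain text files; it only reports matches and does not modify or delete anything.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def find_personal_data(log_path: Path, terms: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for lines containing any search term.

    Hypothetical helper, not part of Confluence. Matching is a simple
    case-insensitive substring test.
    """
    needles = [term.lower() for term in terms if term]
    hits = []
    # errors="replace" keeps the scan going if the log contains unexpected bytes.
    with log_path.open("r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            lowered = line.lower()
            if any(needle in lowered for needle in needles):
                hits.append((number, line.rstrip("\n")))
    return hits
```

For example, calling find_personal_data(Path("<confluence install>/logs/conf_access_log.log"), ["jsmith", "jsmith@example.com"]) with the real path substituted would return the matching lines for review. Rotated or compressed log files would need to be handled separately.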
Location of data pipeline export files
From Confluence 7.12, system administrators can use the data pipeline feature to export current state data from the Confluence database for analysis in an external business intelligence tool. The data is exported in CSV format, and stored in the <confluence home>/data-pipeline/ directory. This directory can be found in the local home directory for non-clustered installations, or in the shared home directory when Confluence is running in a cluster.
The export files may contain personal data including, but not limited to:
- Username / display name
- Email address
- Free text including space titles, page titles, raw page content, and raw comment content.
Administrators are advised to check the contents of these exports and remove data if required.
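One way to check the exports is sketched below: it walks a directory of CSV files and reports which rows mention a given term. The function name scan_exports is hypothetical, and the sketch assumes the exports are UTF-8 CSV files somewhere under the data pipeline directory; it does not assume any particular column layout.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Tuple

# Raw page content can be very large, so raise the csv module's default
# per-field size limit before reading.
csv.field_size_limit(2**31 - 1)


def scan_exports(export_dir: Path, terms: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (file_name, row_number) for every CSV row containing a term.

    Hypothetical helper, not part of Confluence. Row numbers count CSV
    records (the header is row 1), not physical lines, because a cell
    may contain line breaks.
    """
    needles = [term.lower() for term in terms if term]
    matches = []
    for csv_path in sorted(export_dir.rglob("*.csv")):
        with csv_path.open("r", encoding="utf-8", errors="replace", newline="") as handle:
            for row_number, row in enumerate(csv.reader(handle), start=1):
                cells = " ".join(row).lower()
                if any(needle in cells for needle in needles):
                    matches.append((csv_path.name, row_number))
    return matches
```

The result is only a list of places to look. Whether to edit a row or delete the whole export remains the administrator's decision.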
Personal Spaces
Personal spaces are spaces created by a user, where the space key is their username. If any content is created in Confluence that links to this personal space, the link will contain, as part of its URL, the username of the user to whom the personal space belongs. Deleting the user from Confluence does not remove their personal space or change the URLs. However, moving the linked pages will update those URLs.
If the administrator wants to preserve the content in the personal space, but still needs the username removed from the URLs, they should first move the content to a new space, then remove the personal space.
Moving pages to a new Space
Create a new space, then follow the instructions at Move and Reorder Pages. This should update all URLs linking to the moved pages, so the username should no longer appear in those URLs.
Deleting a Personal Space
An administrator can also delete the personal space by following the instructions at Delete a Space. An administrator may need to grant themselves delete permission on the space, by following the instructions at Assign space permissions.
Alternatively, use the REST API for space removal at Confluence REST API documentation.
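The REST call can be sketched as follows. This assumes the DELETE /rest/api/space/{spaceKey} resource described in the Confluence REST API documentation is available in your version, that basic authentication is permitted on your site, and that the personal space key is passed in exactly as it appears in the space's URL. The function name, site URL, and credentials are hypothetical. The sketch only builds the request; it does not send it.

```python
import base64
import urllib.parse
import urllib.request


def build_delete_space_request(
    base_url: str, space_key: str, user: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for /rest/api/space/{spaceKey}.

    Hypothetical helper. Check the REST API documentation for your
    Confluence version before relying on this endpoint or on basic auth.
    """
    # Keep "~" literal, since it can appear in personal space keys.
    encoded_key = urllib.parse.quote(space_key, safe="~")
    url = f"{base_url.rstrip('/')}/rest/api/space/{encoded_key}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, method="DELETE", headers={"Authorization": f"Basic {token}"}
    )
```

Passing the returned object to urllib.request.urlopen would send the request. Because deleting a space cannot be undone from the UI, it is worth trying this against a staging site first.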
Limitations
The URLs that contain the username must be created using the "link to page" feature in Confluence. If those links are added as direct web links, then they will not be automatically updated.
Please read Links for details on links to specific types of content.
Free-form textual personal data in the database
Other potential sources of personal data which could be stored in Confluence's database include:
- Free-form text in Pages, Blogs, Comments, and other custom content that may be added by third party add-ons from the Atlassian Marketplace.
- Free-form text in mentions, where the user's name has been overtyped.
- Audit logs contain information about configuration changes made to Confluence. These audit logs store the username of the user who performed the change.
- Free-form text in customizations made to Confluence, such as custom site headers and footers, site title, and custom user macros and templates.
Free-form text personal data in content (pages, blogs, comments)
For free-form text in pages, blogs, comments, and other content, the search feature should be used to identify sources of personal data that a user requests to be deleted. Read Confluence Search Fields for a list of fields and syntax that can be used to locate any personal data.
When a page or comment is found to contain personal data which needs to be scrubbed, the administrator will need to edit the page and remove it. Confluence, however, stores historical versions of pages, which may also need to be deleted manually by clicking on the delete link in the page history. Please follow the instructions at How to Remove all Previous Versions of a Page Manually in the Database Using SQL Commands if bulk removal is required.
Free-form text personal data in mentions
If the mention name as it displays on the page is changed, for example to include just the first name or a preferred name, this is treated as free-form text. The free-form text will still display, even after the user account has been deleted.
Audit Logs
Audit logs can have a retention period defined (by default, 3 years). This can be changed, and Confluence will automatically remove older entries. Please read Auditing in Confluence for information about administering them.
Free-form text personal data in Confluence customizations and configuration
There are various free form text fields in which an administrator may be able to add personal data as part of configuring Confluence. The administrator is advised to look through any customizations made, and check that no deleted user's personal data remains.
Here is a non-exhaustive list:
Location | Please read |
---|---|
Site theme and other layout customizations | Changing the Look and Feel of Confluence |
Administrator contact page | Configuring the Administrator Contact Page |
Custom site header and footer via Custom HTML | Styling Confluence with CSS |
Site Title | Changing the Site Title |
User Macros | Writing User Macros |
Page Templates and Blueprints | Administering Site Templates |
Shortcut Links | Configuring Shortcut Links |
PDF export customizations | Customize Exports to PDF |
Email template customizations | Customizing Email Templates |
Interface text customizations | Modify Confluence Interface Text |
Synchrony data
If you have collaborative editing enabled, every keystroke in the editor is stored by Synchrony in the Confluence database. This means that any references to a person's full name, user name, or other personal information typed in the editor will remain in the Synchrony tables in the database, separately to where the page or comment content is stored. This data remains in the relevant Synchrony tables, even after the pages or comments themselves have been deleted.
In Confluence 7.0 and later, two scheduled jobs are available to remove Synchrony data:
- The Synchrony data eviction (soft) job evicts all Synchrony data for any pages / blog posts that have not been modified in the last 3 days, and do not have an active editor session. This job runs every 10 minutes by default. This job helps keep your database tables small.
- The Synchrony data eviction (hard) job evicts all Synchrony data for any pages / blog posts that are 15 days or older, regardless of whether they've been modified more recently. This job is disabled by default, but can be scheduled to run on a regular basis. This job ensures there is no Synchrony data older than 15 days in your database.
See How to remove Synchrony data for more information.
In Confluence 6.x versions, there is a workaround to remove this data.
Browser Cache
The browser may cache a large amount of data for performance reasons. Confluence doesn't completely control the browser's behaviour regarding such caching. Therefore, there are some cases where removal of personal data automatically from the browser's cache is not feasible for Confluence, and must be done in each individual browser client.
For Chrome, local storage may be cleared by navigating to chrome://settings/siteData, finding the Confluence website URL, clicking on the 'Local Storage' section, and then clicking remove all. For Firefox, this can be done in Settings, by following these instructions. However, with browsers constantly evolving, these instructions may change. Please see the browser vendor's documentation for clearing local storage for the most up-to-date instructions.
Mentions
Browser local storage is used to cache a list of recently used mentions for the logged-in user, and is never persisted on the Confluence server database. However, because the list is cached in the browser, any user who previously mentioned a user who has since been deleted will continue to see that user in their mentions list. This list can be cleared from the browser directly by following the browser's documentation on clearing local storage. This list also continuously updates as the logged-in user makes new mentions of other users, and the deleted users will eventually stop showing up as they are no longer mentioned.
The mentions list may also contain a URL of the avatar image for the deleted user. This URL is defunct after deleting the user or running the SQL workaround, but the browser may be caching the image located at this URL. Therefore, some users may still continue to see the avatar when the mentions feature is used. The avatar should expire from the browser's cache after some time, but the exact timing may differ based on the browser's configuration.
Avatars
Avatars are images uploaded to Confluence by users into their profile, and they may be displayed when Confluence is listing users (such as in a mentions list).
In Confluence 6.13 and later, when you delete a user account their avatar is deleted.
In Confluence 6.12 and earlier, when you use the 'user account-level personal data' SQL workaround the user's avatar is deleted.
In all versions of Confluence these images are also cached by the browser, and as such, are not completely under the control of Confluence. The browser may periodically refresh/purge their cache, and so the avatars ought to eventually disappear, but exact timing depends on the browser's configuration.
Workarounds
User account-level personal data
This workaround only applies to Confluence 6.12 or earlier.
To delete a user's account-level personal data, use one of the following methods, depending on whether your Confluence instance is using an internal, delegated or external user directory.
Before attempting any of the workarounds below, please ensure that a backup of your instance is created first. If possible, test the workaround on a staging environment before attempting in your production environment.
Step 1 - Disabling or Removing the User
Internal user directory
- Disable the user by following the instructions at Delete or Disable Users.
- After the user is disabled, follow one of the methods in Step 2 below – auto generate the SQL query via a script, or manually create the SQL query.
External user directory - Connector
- Delete the user from the External Directory, and perform a resync by following the instructions at Synchronizing data from external directories.
- After the user is deleted, follow one of the methods in Step 2 below – auto generate the SQL query via a script, or manually create the SQL query.
External user directory - Delegated
- Delete the user from the delegated External Directory by following the instructions at Connecting to an Internal Directory with LDAP Authentication.
- Disable the user by following the instructions at Delete or Disable Users.
- After the user is disabled, follow one of the methods in Step 2 below – auto generate the SQL query via a script, or manually create the SQL query.
Step 2 - Running the SQL workaround
Some parts of the process, such as removing account-level personal data from the search index, can have a performance impact on your site. You may want to run these scripts at a time that would have the least impact on your users. We found it took approximately 30 minutes to run the scripts on a site with about 10 million pages.
Python script to generate SQL queries per user
- Download or clone this repository: https://bitbucket.org/atlassian/gdpr/overview. There are some installation prerequisites before running the script, which are documented in the README file inside the repository.
Run the script, passing the username (the login name of the user) as the first parameter. If it contains spaces, quote it. For -d, supply the one value from the list that matches your database.
python3 parser4confluence.py -u '<USERNAME>' -f metadata/confluence_db.json -d oracle|postgresql|mysql|mssql
The above script will generate multiple SQL files under the folder confluence_db_queries/<database-name>/:
- 01_insert_journalentry.sql
- 02_delete_OS_PROPERTYENTRY.sql
- 03_delete_BODYCONTENT.sql
- 04_delete_CONTENTPROPERTIES.sql
- 05_delete_IMAGEDETAILS.sql
- 06_delete_CONTENT.sql
- 07_delete_NOTIFICATIONS.sql
- 09_delete_CONTENT.sql
- 10_delete_LIKES.sql
- 11_delete_CONTENT.sql
- 12_delete_cwd_membership.sql
- 13_delete_cwd_user_attribute.sql
- 14_delete_cwd_user.sql
- 15_update_user_mapping.sql
- Execute the SQL queries on your database, in the same order as the filenames.
- If you don't have autocommit enabled, make sure to commit your changes to persist on the database.
- Flush all caches to force UI to update by following the instructions at Cache Statistics.
- Flush the content index queue by going to the Content Indexing administration, and selecting Queue Contents > Flush Queue.
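Because the files must be run in filename order, it can help to list them in that order before executing anything. The sketch below does only that: it sorts the generated files by their numeric prefix. The function name ordered_sql_files is hypothetical, and the sketch assumes every generated file starts with a number followed by an underscore, as in the list above. Running the files against the database is left to your usual SQL client.

```python
import re
from pathlib import Path
from typing import List

# Matches the leading number in names such as "01_insert_journalentry.sql".
_PREFIX = re.compile(r"^(\d+)_")


def ordered_sql_files(query_dir: Path) -> List[Path]:
    """Return the numbered .sql files in query_dir, sorted by numeric prefix.

    Hypothetical helper, not part of the gdpr repository. Files without a
    numeric prefix are ignored.
    """
    numbered = []
    for path in query_dir.glob("*.sql"):
        match = _PREFIX.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path.name, path))
    return [path for _, _, path in sorted(numbered)]
```

Sorting on the parsed number rather than on the raw filename keeps the order correct even if a prefix is not zero-padded.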
Manually construct SQL queries per user
- Go to this directory https://bitbucket.org/atlassian/gdpr/src/HEAD/confluence_db_queries/?at=master and download the pre-populated SQL scripts for your respective database.
- Open the SQL scripts in your preferred text editor.
- Replace the placeholder __username__ with the required username.
- Run the SQL queries on the database, in the same order as the filenames.
- If you don't have autocommit enabled, make sure to commit your changes to persist on the database.
- Flush all caches to force UI to update by following the instructions at Cache statistics.
- Flush the content index queue by going to the Content Indexing administration, and selecting Queue Contents > Flush Queue.
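If there are several scripts to edit, the placeholder replacement can be scripted instead of done by hand. The sketch below copies each downloaded script to a separate output directory with __username__ replaced. The function name fill_username and the directory arguments are hypothetical, and the sketch assumes the scripts are UTF-8 text and that a plain text substitution is all that is needed. Review the generated files before running them.

```python
from pathlib import Path

PLACEHOLDER = "__username__"


def fill_username(template_dir: Path, output_dir: Path, username: str) -> int:
    """Write a copy of each .sql template with the placeholder replaced.

    Hypothetical helper, not part of the gdpr repository. Returns the
    number of files written. The original templates are left untouched.
    """
    if "'" in username:
        # Assumption: refuse quotes rather than guess how each database
        # escapes them inside the pre-populated scripts.
        raise ValueError("username contains a single quote; edit the scripts by hand")
    output_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for template in sorted(template_dir.glob("*.sql")):
        text = template.read_text(encoding="utf-8")
        (output_dir / template.name).write_text(
            text.replace(PLACEHOLDER, username), encoding="utf-8"
        )
        written += 1
    return written
```

Writing to a separate directory keeps the downloaded templates reusable for the next user.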
Step 3 - Patching the Collaborative Editor
For Confluence instances running Confluence 6.0.x to Confluence 6.9.x that have enabled Collaborative Editing, you will need to download the appropriate version of the patched Collaborative Editor Plugin for your version of Confluence. See the table below:
Confluence Version | Download Link |
---|---|
5.10.x or earlier | Not applicable - no collaborative editing |
6.0.x | confluence-collaborative-editor-plugin-1.3.24.jar |
6.1.x to 6.2.3 | confluence-collaborative-editor-plugin-1.4.18.jar |
6.2.4 to 6.3.x, 6.4.x, 6.5.x, 6.6.x | |
6.7.x, 6.8.x | |
6.9.x | |
6.10.x and later | Patching the collaborative editor is not required. |
- After downloading the add-on jar, install it by going to Cog menu > Add-ons.
- Choose Upload add-on, then upload the add-on jar.
This may take several minutes, during which time the Collaborative Editing feature may not be available. This patched version of the Collaborative Editor will replace the bundled version that came with your installation of Confluence.
Uninstalling the collaborative editor plugin
If you wish to uninstall the above patched version of the collaborative editor plugin and restore the bundled version, you must:
- Disable the collaborative editing feature (see Administering Collaborative Editing for a guide). Note that unpublished changes may be lost. Consider asking people to publish their pages first.
- Disable the plugin called 'Synchrony Interop Bootstrap Plugin' by going to the Manage Addons link in the administration console, searching for the plugin, and clicking disable.
- Uninstall the patched version of the collaborative editor plugin
- Restart Confluence (which will restore the bundled version of the collaborative editor plugin)
- Re-enable the 'Synchrony Interop Bootstrap Plugin'
- Re-enable the collaborative editing feature
If these steps are not performed when uninstalling the patched version of the collaborative editor plugin, then the collaborative editor plugin may not re-enable correctly when restarting.
Known Limitations and Issues
Mentions in Pages, Drafts and Comments
Any existing mentions of the deleted user will become "Unknown User (xxxxxxxxxx)", where xxxxxxxxxx is the user key stored in the database. In certain parts of Confluence (for example, Activity Stream macro), the deleted user will be displayed as "Anonymous" rather than "Unknown User".
Any mentions that exist in an unpublished draft (see Drafts) at the time the workaround is deployed will remain as they were before the workaround (showing the display name). Those mentions will turn into "Unknown User (xxxxxxxxxx)" when the draft is published. In Confluence version 6.6.x or earlier, the mention may instead be changed into a link to the current page, with the link text being the '~' (tilde) character followed by the old username of the deleted user, rather than 'Unknown User'.
There is currently no method to automatically force a publish of all drafts by the administrator, or see a list of all unpublished drafts. Each user can view their own unpublished drafts in their recently worked on list. They can also go to their user menu, and click on the draft item. An administrator may choose to turn off collaborative editing, which will remove the unpublished drafts. However, this method may cause data loss of the unpublished drafts, and is not advised unless all deleted user mentions must be removed at all costs (regardless of any content loss).
If a deleted user is mentioned on a page after performing the workarounds above, there may be an error when publishing the page. Refreshing the page, or removing the mention should fix the problem. However, if the Collaborative Editing feature is turned off, the user mention will need to be deleted from the page to fix the problem.
Similarly, if the deleted user is mentioned when adding or editing a comment, the save will fail with an error. The workaround is to remove the mention of the deleted user in the comment before saving. This issue affects both inline comments, as well as image attachment comments.
Clearing the browser local storage will remove the ability to mention the deleted user (see the Browser Cache section on this page for details), thus preventing the publishing problem from occurring for both pages and comments.
Workbox Notifications
Workbox Notifications (see Workbox Notifications) will continue to have the display name of the user that created the notification. They are automatically cleared by Confluence regularly, and old entries will be removed after 28 days. These jobs are run once per day, starting from the time the server starts up.
Synchrony data
This workaround only applies to Confluence 6.x versions.
If you have collaborative editing enabled, every keystroke in the editor is stored by Synchrony in the Confluence database. This means that any references to a person's full name, user name, or other personal information typed in the editor will remain in the Synchrony tables in the database, separately to where the page or comment content is stored.
A workaround is to truncate the relevant Synchrony tables in the database. See How to reduce the size of Synchrony tables to find out how to truncate these tables.
Known issues
There are a few known issues that you should be aware of, where personally identifiable information remains after running the SQL workaround scripts or deleting the user in the UI.