Data pipeline
Requirements
To trigger data exports through the REST API, you’ll need:
- A valid Confluence Data Center license
- System Administrator global permissions
Considerations
There are a number of security and performance impacts you’ll need to consider before getting started.
Security
Export performance
Access the data pipeline
To access the data pipeline, go to Administration > General Configuration > Data pipeline.
Schedule regular exports
Check the status of an export
Cancel an export
Exclude spaces from the export
Automatic data export cancellations
Configuring the data export
You can configure the format of the export data using the following system properties.
Use the data pipeline REST API
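Below is a minimal sketch of driving an export over REST using Python's requests library. It assumes the /rest/datapipeline/latest/export endpoint described in the data pipeline REST API reference; the base URL, credentials, and fromDate value are placeholders you'll need to replace.

```python
# Minimal sketch: trigger, check, and cancel a data pipeline export.
# Assumes the /rest/datapipeline/latest/export endpoint; the base URL
# and credentials below are placeholders.
import requests

BASE_URL = "https://confluence.example.com"  # placeholder
AUTH = ("admin", "admin-password")           # needs System Administrator permission
EXPORT_URL = f"{BASE_URL}/rest/datapipeline/latest/export"

# Trigger an export of data created or updated after fromDate
resp = requests.post(
    EXPORT_URL,
    params={"fromDate": "2023-01-01T00:00:00Z"},
    auth=AUTH,
)
resp.raise_for_status()
print(resp.status_code, resp.text)

# Check the status of the current or most recent export
status = requests.get(EXPORT_URL, auth=AUTH)
print(status.json())

# Cancel a running export
requests.delete(EXPORT_URL, auth=AUTH).raise_for_status()
```

The same endpoint handles all three operations: POST triggers an export, GET reports its status, and DELETE cancels it.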
Output files
Location of exported files
Exported data is saved as separate CSV files. The files are saved to the following directory:
- <shared-home>/data-pipeline/export/<job-id> if you run Confluence in a cluster
- <local-home>/data-pipeline/export/<job-id> if you are using non-clustered Confluence
Within the <job-id> directory you will see the following files:
- users_job<job_id>_<schema_version>_<timestamp>.csv
- spaces_job<job_id>_<schema_version>_<timestamp>.csv
- pages_job<job_id>_<schema_version>_<timestamp>.csv
- comments_job<job_id>_<schema_version>_<timestamp>.csv
- analytics_events_job<job_id>_<schema_version>_<timestamp>.csv
To load and transform the data in these files, you'll need to understand the schema. See Data pipeline export schema.
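As a quick way to inspect what each file contains, here's a minimal sketch that reads the first few rows of each CSV with pandas. The export directory path and job id are placeholders following the layout above; the actual column names depend on the schema version, so treat Data pipeline export schema as the source of truth.

```python
# Minimal sketch: peek at the exported CSVs with pandas.
# The directory below is a placeholder for <shared-home>/data-pipeline/export/<job-id>.
from pathlib import Path
import pandas as pd

export_dir = Path("/var/confluence/shared-home/data-pipeline/export/1001")  # placeholder

for csv_file in sorted(export_dir.glob("*.csv")):
    df = pd.read_csv(csv_file, nrows=5)
    print(csv_file.name, "->", list(df.columns))
```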
Set a custom export path
Sample Spark and Hadoop import configurations
If you have an existing Spark or Hadoop instance, use the following reference configurations to import your data for further transformation.
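For Spark specifically, a minimal PySpark sketch for reading the exported CSVs might look like the following. The file path and job id are placeholders, and the reader options (header row, quoted multi-line fields) are assumptions about the CSV format that you should verify against your own export.

```python
# Minimal PySpark sketch: import exported CSVs for transformation.
# The path and job id are placeholders; adjust to your export directory.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("confluence-data-pipeline-import").getOrCreate()

pages = (
    spark.read
    .option("header", "true")      # assumes the CSVs include a header row
    .option("multiLine", "true")   # tolerate embedded line breaks in quoted fields
    .option("escape", '"')
    .csv("/data/export/1001/pages_job1001_*.csv")  # placeholder path
)

pages.printSchema()
pages.createOrReplaceTempView("pages")
spark.sql("SELECT COUNT(*) FROM pages").show()
```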
Troubleshooting issues with data exports