To export data using the data pipeline, you’ll need:
- A valid Jira Data Center license
- Jira system administrator permissions.
See Security overview for more information about supported API authentication methods.
There are a number of security and performance impacts you’ll need to consider before getting started.
If you need to filter out data based on security and confidentiality, this must be done after the data is exported.
Exported files are saved in your shared home directory, so you’ll also want to check this is secured appropriately.
When scheduling your exports, we recommend that you:
- Limit the amount of data exported using the fromDate parameter, as a date further in the past will export more data, resulting in a longer data export.
- Schedule exports during hours of low activity, or on a node with no activity, if you do observe any performance degradation during the export.
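As a rough illustration of limiting the export window, the sketch below builds a fromDate value a fixed number of days in the past. The timestamp format is copied from the cURL example on this page. The helper name from_date_param and the 90-day window are assumptions made for illustration, not part of the product.

```python
from datetime import datetime, timedelta, timezone


def from_date_param(days_back, now=None):
    """Return a UTC timestamp string `days_back` days before `now`.

    `now` should be a timezone-aware datetime; it defaults to the current time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = now - timedelta(days=days_back)
    # Same shape as the fromDate value in the cURL example: 2020-10-22T01:30:11Z
    return start.strftime("%Y-%m-%dT%H:%M:%SZ")


# Example: limit the export to roughly the last 90 days (hypothetical window).
print(from_date_param(90))
```

A shorter window means less data to export, so it is worth starting small and widening the range only if your reports need it.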
|Approximate export duration|
|1 million issues|7 million issues|
|15 minutes|2 hours|
|1 hour|9 hours|
|5 hours|22 hours|
|Jira Software and Jira Service Management|
|30 minutes to 2 hours|3 to 6 hours|
Test performance vs production
The data presented here is based on our own internal regression testing. The actual duration and impact of a data export on your own environment will likely differ depending on:
- your infrastructure, configuration, and load
- applications installed (Jira Software and Jira Service Management)
- amount of custom field and issue history data to be exported.
We used Jira Performance Tests to test a data export's performance on a Jira Data Center environment on AWS. This environment had one c5.9xlarge Jira node and one PostgreSQL database. To test user load, we used 24 virtual users across 2 virtual user nodes.
Access the data pipeline
To access the data pipeline:
- In the upper-right corner of the screen, select Administration > System.
- Select Data pipeline.
Schedule regular exports
To set the export schedule:
- From the Data pipeline screen, select Schedule settings.
- Select the Schedule regular exports checkbox.
- Select the date to include data from. Data from before this date won’t be included. This is usually set to 12 months or less.
- Choose how often to repeat the export.
- Select a time to start the export. You may want to schedule the export to happen outside working hours.
- Select the Schema version to use (if more than one schema is available).
- Save your schedule.
Timezones and recurring exports
We use your server timezone to schedule exports (or system timezone if you’ve overridden the server time in the application). The export schedule isn’t updated if you change your timezone. If you do need to change the timezone, you’ll need to edit the schedule and re-enter the export time.
You can schedule exports to happen as often as you need. If you choose to export on multiple days, the first export will occur on the nearest day after you save the schedule. Using the example in the screenshot above, if you set up your schedule on Thursday, the first export would occur on Saturday, and the second export on Monday. We don’t wait for the start of the week.
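The "nearest day" rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not how Jira implements it. It assumes the first export runs on the first selected weekday strictly after the day you save the schedule, and it ignores the time of day. The function name upcoming_exports is hypothetical.

```python
from datetime import date, timedelta


def upcoming_exports(saved_on, export_weekdays, count=2):
    """List the next `count` export dates after `saved_on`.

    export_weekdays uses Python's convention: Monday=0 ... Sunday=6.
    """
    if not export_weekdays:
        return []
    results = []
    day = saved_on
    while len(results) < count:
        day += timedelta(days=1)
        if day.weekday() in export_weekdays:
            results.append(day)
    return results


# Schedule saved on a Thursday, with exports on Monday (0) and Saturday (5).
saved = date(2021, 3, 4)  # a Thursday
for export_day in upcoming_exports(saved, {0, 5}):
    print(export_day.strftime("%A %Y-%m-%d"))
```

With these inputs the first date is the Saturday and the second is the following Monday, matching the example in the text.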
The export schema defines the structure of the export. We version the schema so that you know your export will have the same structure as previous exports. This helps you avoid problems if you’ve built dashboards or reports based on this data.
We only introduce new schema versions for breaking changes, such as removing a field, or if the way the data is structured changes. New fields are simply added to the latest schema version.
Older schema versions will be marked as ‘deprecated’, and may be removed in future versions. You can still export using these versions, just be aware we won’t update them with any new fields.
Check the status of an export
The Export details table will show the most recent exports, and the current status.
Select> View details to see the full details of the export in JSON format. Details include the export parameters, status, and any errors returned if the export failed.
For help resolving failed or cancelled exports, see Data pipeline troubleshooting.
Cancel an export
- Go to the Data pipeline screen.
- Select next to the export, and choose Cancel export.
- Confirm you want to cancel the export.
It can take a few minutes for the processes to be terminated. Any files already written will remain in the export directory. You can delete these files if you don’t need them.
Automatic data export cancellations
DELETE request). This releases the process lock, allowing you to perform another data export.
Configuring the data export
You can configure the format of the export data through the following system properties.
Specifies whether embedded line breaks should be preserved in the output files. Line breaks can be problematic for some tools such as Hadoop.
This property is set to
Escaping character for embedded line breaks. By default, we'll print
To prevent you from running out of disk space, the data pipeline will check before and during an export that there is at least 5GB free disk space.
Set this property, in gigabytes, to increase or decrease the limit. To disable this check, set this property to
You can further configure your export to exclude certain types of data using feature flags. See How to manage dark features in Jira to learn how to use feature flags.
Specifies whether custom field data should be included in the export. Exporting custom field data may increase your export duration, depending on the amount of custom field data you have.
Specifies whether historical issue data should be included in the export. Exporting historical data will significantly increase your export duration.
Specifies whether archived issues should be included in the export.
Add the flag with the suffix
Use the data pipeline REST API
To start a data pipeline export, make a POST request to
Here is an example request, using cURL and a personal access token for authentication:
curl -H "Authorization: Bearer ABCD1234" -H "X-Atlassian-Token: no-check" -X POST "https://myexamplesite.com/rest/datapipeline/latest/export?fromDate=2020-10-22T01:30:11Z"
You can also use the API to check the status, change the export location, and schedule or cancel an export.
For full details, refer to the Data pipeline REST API reference.
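If you prefer to script the request, the cURL example above could be expressed in Python roughly as follows. This is a minimal sketch using only the standard library. The site URL, token, and date are the placeholder values from the cURL example, and the helper name build_export_request is an assumption. The sketch builds the request without sending it, so you can inspect it first.

```python
import urllib.request
from urllib.parse import urlencode


def build_export_request(base_url, token, from_date):
    """Build (but do not send) the POST request that starts an export."""
    query = urlencode({"fromDate": from_date})
    url = f"{base_url}/rest/datapipeline/latest/export?{query}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "X-Atlassian-Token": "no-check",
        },
    )


# Placeholder values taken from the cURL example.
req = build_export_request("https://myexamplesite.com", "ABCD1234", "2020-10-22T01:30:11Z")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Note that urlencode percent-encodes the colons in the timestamp, which is the standard encoding for query string values.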
Each time you perform a data export, we assign a numerical job ID to the task (starting with 1 for your first ever data export). This job ID is used in the file name, and location of the files containing your exported data.
Location of exported files
Exported data is saved as separate CSV files. The files are saved to the following directory:
- <shared-home>/data-pipeline/export/<job-id> if you run Jira in a cluster
- <local-home>/data-pipeline/export/<job-id> if you are using non-clustered Jira
Within the <job-id> directory you will see the following files:
- sla_cycles_<job_id>_<schema_version>_<timestamp>.csv (Jira Service Management only)
To load and transform the data in this export, you'll need to understand its schema. See Data pipeline export schema for a summary of the contents of each file.
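To pick up the files for a given job from a script, something like the following sketch may help. It assumes the default directory layout described above. The home directory path and job ID in the usage lines are hypothetical, and export_files is an illustrative helper name.

```python
from pathlib import Path


def export_files(home_dir, job_id):
    """Return the CSV files written for one export job, sorted by name."""
    job_dir = Path(home_dir) / "data-pipeline" / "export" / str(job_id)
    return sorted(job_dir.glob("*.csv"))


# Hypothetical shared home path and job ID; replace with your own.
for path in export_files("/var/atlassian/shared-home", 1):
    print(path.name)
```

If you have set a custom export path, the root directory will differ from the default layout assumed here.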
Set a custom export path
To change the root export path, make a PUT request to
In the body of the request pass the absolute path to your preferred directory.
For full details, including how to revert back to the default path, refer to the Data pipeline REST API reference.
Analyse data pipeline data
Once you've scheduled your exports, and have the CSV files, you can import these files into a database or data lake for analysis.
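As a small-scale illustration, the sketch below loads one exported CSV file into a SQLite table using Python's standard library. It assumes the file has a header row and stores every column as text. The table and file names in the usage comment are hypothetical. Python's csv module handles line breaks inside quoted fields, provided the file is opened with newline="".

```python
import csv
import sqlite3


def load_csv(conn, table, csv_path):
    """Create `table` from the CSV header and insert every row as text.

    Returns the number of rows in the table after loading.
    `table` and the header names are interpolated into SQL, so only use
    this with file and table names you trust.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Example usage (hypothetical database and file names):
# conn = sqlite3.connect("jira_exports.db")
# load_csv(conn, "issues", "issues_export.csv")
```

For larger exports, a data warehouse or the Spark and Hadoop configurations on this page are likely a better fit than SQLite.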
Sample DevOps dashboards
To get you started, we've created a DevOps dashboard template for Tableau and Microsoft PowerBI that uses Jira data to give you an insight into the engineering health of your team.
Make the most of the data pipeline with the DevOps dashboard
Sample Spark and Hadoop import configurations
If you have an existing Spark or Hadoop instance, use the following references to configure how to import your data for further transformation:
%python
# File location
file_location = "/FileStore/**/export_2020_09_24T03_32_18Z.csv"

# Automatically set data type for columns
infer_schema = "true"
# Skip first row as it's a header
first_row_is_header = "true"
# Allow line breaks inside double-quoted fields
multiline_support = "true"

# The applied options are for CSV files. For other file types, these will be ignored.
# Note escape & quote options for RFC 4180 compliant files
df = spark.read.format("csv") \
    .option("inferSchema", infer_schema) \
    .option("header", first_row_is_header) \
    .option("multiLine", multiline_support) \
    .option("quote", "\"") \
    .option("escape", "\"") \
    .option("encoding", "UTF-8").load(file_location)

display(df)
CREATE EXTERNAL TABLE IF NOT EXISTS some_db.datapipeline_export (
  `id` string,
  `instance_url` string,
  `key` string,
  `url` string,
  `project_key` string,
  `project_name` string,
  `project_type` string,
  `project_category` string,
  `issue_type` string,
  `summary` string,
  `description` string,
  `environment` string,
  `creator_id` string,
  `creator_name` string,
  `reporter_id` string,
  `reporter_name` string,
  `assignee_id` string,
  `assignee_name` string,
  `status` string,
  `status_category` string,
  `priority_sequence` string,
  `priority_name` string,
  `resolution` string,
  `watcher_count` string,
  `vote_count` string,
  `created_date` string,
  `resolution_date` string,
  `updated_date` string,
  `due_date` string,
  `estimate` string,
  `original_estimate` string,
  `time_spent` string,
  `parent_id` string,
  `security_level` string,
  `labels` string,
  `components` string,
  `affected_versions` string,
  `fix_versions` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "escapeChar" = "\\",
  'quoteChar' = '"',
  'separatorChar' = ','
)
LOCATION 's3://my-data-pipeline-bucket/test-exports/'
TBLPROPERTIES ('has_encrypted_data'='false');
Troubleshooting issues with data exports