Data pipeline
Considerations
The ability to export data via API is the first milestone towards fully automating data pipelines for Jira. This milestone focuses on providing data extraction capabilities to teams that already have an existing data management platform (like Tableau or PowerBI) and workflow.
With this milestone:
- A data export will affect Jira's performance, so we recommend that you run it during hours of light use.
- Data from Jira Core, Jira Software, and Jira Service Management will be exported. This includes all core issues and their fields, as well as built-in fields within Jira Software (like Sprints and Story Points) and Jira Service Management (like Customer Request Type and SLAs).
- Custom fields created by you are not exported.
Security
The export includes all data, giving you as much information to filter and transform as you see fit. This abundance of data gives you great flexibility in generating meaningful insights into how teams use Jira.
At the same time, this export won't filter out data based on security or confidentiality, so you'll need to process secure, sensitive data yourself after each export. To help with this, the issue schema contains a `security_level` field that you can use. Also make sure the Jira shared home folder has sufficient access restrictions.
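As a sketch of the kind of post-processing this implies, the snippet below drops rows that carry any `security_level` value before the export is shared downstream. The sample data here is invented; only the `security_level` column name comes from the export schema.

```python
import csv
import io

# Invented sample standing in for an exported issues CSV.
sample = io.StringIO(
    "id,summary,security_level\n"
    "10001,Public task,\n"
    "10002,Restricted task,Internal only\n"
)

reader = csv.DictReader(sample)

# Keep only rows with no security level set.
public_rows = [row for row in reader if not row["security_level"]]

print([row["id"] for row in public_rows])
```

In a real pipeline you would apply the same filter while streaming the export file, before loading it into your analytics platform.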
Performance impact
Data export is a resource-intensive process impacting application nodes, the database, and the Lucene index. In our internal testing, we observed a 5% performance degradation over all product functions on a node actively performing an export. We strongly recommend that you:
- Perform the data export during hours of low activity, or on a node with no activity
- Limit the amount of data exported through the `fromDate` parameter; a date further in the past will export more data, resulting in a longer data export
Our test results also showed the following approximate durations for the export per number of issues:
| Number of issues | Approximate export duration (Jira Software installed) | Approximate export duration (Jira Software + Jira Service Management installed) |
|---|---|---|
| 1 million | 15 minutes | 30 minutes to 2 hours |
| 7 million | 2 hours | 3-6 hours |
| 30 million | 9 hours | 12-24 hours |
Test performance vs. production
The performance data presented here is based on our own internal regression testing. The actual duration and impact of a data export on your environment will likely differ depending on your infrastructure, installed applications (that is, Jira Software and Jira Service Management), configuration, and load.
We used Jira Performance Tests to test a data export's performance on a Jira Data Center environment on AWS. This environment had one c5.9xlarge Jira node and one PostgreSQL database. To test user load, we used 24 virtual users across 2 virtual user nodes.
Requirements
This feature is only supported on a Jira Data Center license.
To export Jira's current state data via the API, you'll also need to log in through the API as a Jira system administrator. For more information about supported API authentication methods, see Security overview.
Performing the data export
To export Jira's current state data, use the `/export` REST API endpoint from the `/jira/rest/datapipeline/latest/` base URL:
https://<host>:<port>/jira/rest/datapipeline/latest/export?fromDate=<yyyy-MM-ddTHH:mmTZD>
The `fromDate` parameter limits the amount of data exported: only data on issues created or updated after the `fromDate` value will be exported. If you trigger the export without the `fromDate` parameter, all data from the last 365 days will be exported.
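For example, a client script might compute a `fromDate` 30 days in the past and build the request URL like this. The host, port, and the use of `Z` as the timezone designator are assumptions for illustration; substitute your own values.

```python
from datetime import datetime, timedelta, timezone

# Placeholder base URL; replace host and port with your own instance.
base = "https://jira.example.com:8443/jira/rest/datapipeline/latest/export"

# Export only issues created or updated in the last 30 days,
# formatted as yyyy-MM-ddTHH:mm followed by a timezone designator (here: Z).
from_date = (datetime.now(timezone.utc) - timedelta(days=30)).strftime("%Y-%m-%dT%H:%MZ")

url = f"{base}?fromDate={from_date}"
print(url)
```

The request itself would then be sent with your preferred HTTP client, authenticated as a Jira system administrator.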
The `/export` REST API endpoint has three methods.
Non-default endpoint
The base URL is `/jira/rest/datapipeline/latest/` by default, but it will change if you set a different context path for Jira. Learn more about context paths
Automatic data export cancellations
If a node running a data export is gracefully shut down, Jira will automatically mark the export as `CANCELLED`.
However, if the JVM is not notified after a crash or hardware-level failure, the export process may get locked. In that case, you'll need to manually mark the export as `CANCELLED` by sending a `DELETE` request. Doing so releases the process from any lock, allowing you to perform another data export.
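A minimal sketch of that cleanup call using Python's standard library. The host, credentials, and the exact cancellation path are placeholders (this sketch assumes the `DELETE` goes to the same `/export` endpoint); check the REST reference for your Jira version before relying on it.

```python
import urllib.request

# Placeholder URL; the cancellation path may differ in your Jira version.
cancel = urllib.request.Request(
    "https://jira.example.com:8443/jira/rest/datapipeline/latest/export",
    method="DELETE",
)
# Placeholder credentials for a Jira system administrator.
cancel.add_header("Authorization", "Basic <base64-credentials>")

# urllib.request.urlopen(cancel)  # uncomment to send; requires a live Jira
print(cancel.get_method(), cancel.full_url)
```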
Configuring the data export
You can configure the format of the export data through the following system properties:
| System property | Default value | Description |
|---|---|---|
| `plugin.data.pipeline.embedded.line.break.preserve` | `False` | Specifies whether embedded line breaks should be preserved in the output files. Line breaks can be problematic with tools such as Hadoop, so this property is set to `False` by default. |
| `plugin.data.pipeline.embedded.line.break.escape.char` | `\\n` | Escaping character for embedded line breaks. By default, Jira will print `\n` for every embedded line break. |
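To illustrate what the default settings describe (this is not Jira's own code), replacing each embedded newline with the two literal characters `\n` keeps every record on one physical line, which is what Hadoop-style line-oriented readers expect:

```python
# A field value containing a real embedded line break.
description = "First line\nSecond line"

# Escape it the way the default settings describe: one record, one line.
escaped = description.replace("\n", "\\n")
print(escaped)  # prints: First line\nSecond line
```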
Output files
Each time you perform a data export, Jira will assign a numerical job ID to the task (starting with `1` for your first ever data export). This job ID is used in the file name and location of the files containing your exported data.
The exported data will be saved in three CSV files, all located in `/<jira-shared-home>/data-pipeline/export/<job_id>/`:
- `issues_job<job_id>_<timestamp>.csv` (for issues)
- `issue_fields_job<job_id>_<timestamp>.csv` (for Jira Software and Jira Service Management fields)
- `sla_cycles_job<job_id>_<timestamp>.csv` (for SLA cycle information, if Jira Service Management is installed)
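A small sketch of locating the issue CSVs for a given job ID. The shared-home path and job ID are placeholders; the directory layout follows the description above.

```python
from pathlib import Path

# Placeholder shared home and job ID; substitute your own.
shared_home = Path("/path/to/jira-shared")
job_id = 1

# Layout from the doc: /<jira-shared-home>/data-pipeline/export/<job_id>/
export_dir = shared_home / "data-pipeline" / "export" / str(job_id)

# Match issues_job<job_id>_<timestamp>.csv; sorted so the newest is last.
issue_files = sorted(export_dir.glob(f"issues_job{job_id}_*.csv"))
print(issue_files)
```

The same `glob` pattern works for the `issue_fields_` and `sla_cycles_` files by swapping the filename prefix.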
To load and transform the data in this export, you'll need to understand its schema. For a detailed reference of the schema, see Data Pipeline export schema.
Sample Spark and Hadoop import configurations
If you have an existing Spark or Hadoop instance, use the following references to configure how to import your data for further transformation: