Data pipeline export schema

This page describes the structure and data schema of the Confluence data export files.

To learn how to set up and configure your data pipeline, see Data pipeline.

Output file format and structure

The output files are written in CSV format and are RFC4180 compliant. They have the following characteristics:

  • Each file has a header. This includes files from exports that resulted in no data.
  • Lines are terminated by CRLF characters (\r\n).
  • Fields containing line breaks (CRLF), double quotes, or commas are enclosed in double quotes.
  • A double quote appearing inside a field is escaped by preceding it with another double quote. For example: "aaa","b""bb","ccc".
  • Fields with no data (null values) are represented in the CSV export by two consecutive delimiters (for example, ,,).
  • Embedded line breaks are escaped by default and printed as \n.
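
The rules above can be sketched with Python's standard csv module. This is a minimal illustration, not part of the export tooling: the read_export helper, the sample rows, and the choice to map empty fields to None are assumptions, and the sketch assumes escaped line breaks appear as the two-character sequence \n.

```python
import csv
import io

def read_export(text):
    """Parse export CSV text into a list of dicts (hypothetical helper)."""
    # newline="" leaves CRLF handling to the csv module.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    rows = []
    for row in reader:
        cleaned = {}
        for key, value in row.items():
            if value == "":
                # Two consecutive delimiters: the field has no data.
                cleaned[key] = None
            else:
                # Naive unescape of line breaks printed as backslash + n.
                cleaned[key] = value.replace("\\n", "\n")
        rows.append(cleaned)
    return rows

# Made-up sample data using field names from the pages file.
sample = (
    "page_id,page_title,last_update_description\r\n"
    '1,"Say ""hi"", team",\r\n'
    "2,Notes,first line\\nsecond line\r\n"
)
rows = read_export(sample)
```

Because a null value and an empty string look the same in the file, the sketch treats every empty field as None. The unescape step is deliberately simple and would also rewrite a literal backslash followed by n in the content.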

Fields are available in all schema versions, unless specifically noted below.

Users file

Field | Description

instance_url

Type: URL

Description: Base URL of the current instance.

Example: https://yoursitename.com

user_id

Type: String

Description: ID of the user

Example: ff8080817572401e01757240b3520000

user_name

Type: String

Description: User name of the user.

Example: jsmith

user_fullname

Type: String

Description: Full name of the user.

Example: John Smith

user_email

Type: Email

Description: Email address of the user

Example: jsmith@example.com
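
Other files refer to users by ID (for example creator_id and last_modifier_id), so it can help to index the users file by user_id. The following is a minimal sketch; the index_users helper, the column order, and the sample row are assumptions made for illustration.

```python
import csv
import io

# Illustrative users export built from the example values above.
users_csv = (
    "instance_url,user_id,user_name,user_fullname,user_email\r\n"
    "https://yoursitename.com,ff8080817572401e01757240b3520000,"
    "jsmith,John Smith,jsmith@example.com\r\n"
)

def index_users(text):
    """Return a dict mapping user_id to the full user row (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return {row["user_id"]: row for row in reader}

users = index_users(users_csv)

# Resolve an ID taken from another file, tolerating users missing from the export.
creator = users.get("ff8080817572401e01757240b3520000")
display_name = creator["user_fullname"] if creator else "unknown user"
```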

Spaces file

Field | Description

space_key

Type: String

Description: Unique identifier that forms part of the URL for that space

Example: AMF

instance_url


Type: String

Description: Base URL of the current instance

Example: https://example.com

space_url

Type: URL

Description: The space URL

Example: https://example.com/display/SPACEKEY

homepage_url

Type: URL

Description: The space's home page URL

Example: https://example.com/display/SPACEKEY/Page+name

space_name

Type: String

Description: Title of the space

Example: Design Team Space

space_type

Type: String

Description: Whether the space is a global or personal space

Example: global

space_status

Type: String

Description: Whether the status of the space is current or archived

Example: CURRENT

creator_id

Type: User

Description: ID of the user who created the space

Example: ff8080817572401e01757240b3520000

last_modifier_id

Type: User

Description: ID of the user who last modified the space

Example: ff8080817572401e01757240b3520000

created_date

Type: Time

Description: Space creation timestamp

Example: 2021-02-26T04:14:38Z

updated_date

Type: Time

Description: Last modification timestamp

Example: 2021-02-26T04:14:38Z
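
The Time fields use the format shown in the examples, where the trailing Z denotes UTC. A minimal sketch of converting them to timezone-aware values follows; the parse_export_time helper is hypothetical, and it assumes every timestamp has the same shape as the example (no fractional seconds).

```python
from datetime import datetime, timezone

def parse_export_time(value):
    """Parse a timestamp such as 2021-02-26T04:14:38Z into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

# Illustrative row using the example values from the spaces file.
space = {
    "space_key": "AMF",
    "space_type": "global",
    "space_status": "CURRENT",
    "created_date": "2021-02-26T04:14:38Z",
    "updated_date": "2021-02-26T04:14:38Z",
}

created = parse_export_time(space["created_date"])
# Compare case-insensitively: the exact casing of status values is an assumption.
is_archived = space["space_status"].upper() == "ARCHIVED"
```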

Pages file

Field | Description

page_id

Type: Number

Description: Unique ID of the page

instance_url

Type: String

Description: Base URL of the current instance

Example: https://example.com

space_key

Type: String

Description: Space key of the space the page exists in

page_url


Type: String

Description: URL of the page

Example: https://example.com/display/SPACEKEY/Page+name

page_type


Type: String

Description: Whether the entity is a page or a blog post

Example: page

page_title


Type: String

Description: Title of the page

page_status

Type: String

Description: Status of the page. The only value is current. This value does not indicate whether the page is in a space that has been archived.

page_content


Type: String

Description: Content of the page in Confluence storage format (limited to 10,000 characters)

Example:

<ac:layout><ac:layout-section ac:type="two_equal"><ac:layout-cell>
<p>This is sample content in a layout</p></ac:layout-cell><ac:layout-cell>
<p>With two columns</p></ac:layout-cell></ac:layout-section></ac:layout>

page_parent_id

Type: Number

Description: ID of the current page's direct parent

labels

Type: String

Description: Comma separated list of labels of the page

Example: ["personal", "expense"]

page_version

Type: String

Description: Version number of the latest version of the page

Example: 3

creator_id

Type: User

Description: ID of the user who created the page

Example: ff8080817572401e01757240b3520000

last_modifier_id

Type: User

Description: ID of the user who last updated the page

Example: ff8080817572401e01757240b3520000

created_date


Type: Time

Description: Creation timestamp

Example: 2021-02-26T04:14:38Z

updated_date

Type: Time

Description: Last modification timestamp

Example: 2021-02-26T04:14:38Z

last_update_description


Type: String

Description: Version comment entered when the page was last updated (limited to 2,000 characters)
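
The labels field is described as a comma-separated list, while its example shows a bracketed, quoted list. A defensive sketch that accepts either form is shown below; the parse_labels helper is hypothetical, and reading the bracketed form as JSON is an assumption based only on the example.

```python
import json

def parse_labels(raw):
    """Return the labels of a page as a list of strings (hypothetical helper)."""
    if not raw:
        return []  # null field: the page has no labels
    try:
        # Form shown in the example: ["personal", "expense"]
        parsed = json.loads(raw)
        if isinstance(parsed, list):
            return [str(label) for label in parsed]
    except json.JSONDecodeError:
        pass
    # Fallback: a plain comma-separated list, as the description words it.
    return [part.strip() for part in raw.split(",") if part.strip()]

bracketed = parse_labels('["personal", "expense"]')
plain = parse_labels("personal, expense")
```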

Comments file

Field | Description

comment_id


Type: Number

Description: Unique ID of the comment

instance_url

Type: String

Description: Base URL of the current instance

Example: https://example.com

comment_url


Type: String

Description: Full URL of the comment

page_id

Type: Number

Description: Unique ID of the page which contains the comment

parent_comment_id


Type: Number

Description: If the comment is a reply, this is the ID of the parent comment (empty for top level comments)

comment_content

Type: String

Description: Content of the comment in Confluence storage format (limited to 2,000 characters)

Example:

<p>Sample comment on a page</p>

creator_id

Type: User

Description: ID of the user who created the comment

Example: ff8080817572401e01757240b3520000

last_modifier_id

Type: User

Description: ID of the user who last modified the comment

Example: ff8080817572401e01757240b3520000

created_date


Type: Time

Description: Creation timestamp

Example: 2021-02-26T04:14:38Z

updated_date


Type: Time

Description: Last modification timestamp

Example: 2021-02-26T04:14:38Z
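
Because parent_comment_id is empty for top-level comments and holds the parent's ID for replies, comment threads can be rebuilt from the flat file. The following is a minimal sketch; the group_replies helper and the sample IDs are made up for illustration.

```python
from collections import defaultdict

# Illustrative rows; only the fields needed for threading are shown.
comments = [
    {"comment_id": "101", "page_id": "7", "parent_comment_id": None},
    {"comment_id": "102", "page_id": "7", "parent_comment_id": "101"},
    {"comment_id": "103", "page_id": "7", "parent_comment_id": "101"},
    {"comment_id": "104", "page_id": "7", "parent_comment_id": None},
]

def group_replies(rows):
    """Map each parent comment ID to the IDs of its direct replies."""
    replies = defaultdict(list)
    for row in rows:
        parent = row["parent_comment_id"]
        if parent:  # empty or None means a top-level comment
            replies[parent].append(row["comment_id"])
    return dict(replies)

top_level = [c["comment_id"] for c in comments if not c["parent_comment_id"]]
replies = group_replies(comments)
```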

Analytics events file

Field | Description

instance_url


Type: String

Description: Base URL of the current instance

Example: https://example.com

event_id


Type: Number

Description: Unique ID of the analytics event

event_name

Type: String

Description: Name of the analytics event. Events include page_viewed, page_created, page_updated, blog_viewed, blog_created, blog_updated, comment_created, attachment_viewed, attachment_created.

Example: page_created

created_date

Type: Time

Description: Creation timestamp

Example: 2021-02-26T04:14:38Z

event_author_id


Type: User

Description: ID of the user who performed the action that triggered the event

Example: ff8080817572401e01757240b3520000

event_space_key

Type: String

Description: Space key of the space the event was triggered in or affects (affected object)

Example: SPACEKEY

event_container_id

Type: Number

Description: ID of the containing entity. For pages, this is the page ID. For attachments and comments, it is the ID of the page the attachment or comment appears on.

event_content_id


Type: Number

Description: ID of the entity. For pages, this is the page ID. For attachments, it is the attachment ID. For comments, it is the comment ID.
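
As a rough illustration of how event_name and event_container_id can be combined, the sketch below counts page views per page and flags event names outside the list given above. The sample rows and the views_per_page helper are assumptions, and the list of names may not be exhaustive.

```python
from collections import Counter

# Event names listed in the event_name description; possibly not exhaustive.
KNOWN_EVENTS = {
    "page_viewed", "page_created", "page_updated",
    "blog_viewed", "blog_created", "blog_updated",
    "comment_created", "attachment_viewed", "attachment_created",
}

# Illustrative rows; IDs are made up.
events = [
    {"event_name": "page_viewed", "event_container_id": "7", "event_content_id": "7"},
    {"event_name": "page_viewed", "event_container_id": "7", "event_content_id": "7"},
    {"event_name": "comment_created", "event_container_id": "7", "event_content_id": "101"},
]

def views_per_page(rows):
    """Count page_viewed events per page ID (hypothetical helper)."""
    return Counter(
        row["event_container_id"]
        for row in rows
        if row["event_name"] == "page_viewed"
    )

page_views = views_per_page(events)
unknown = [row["event_name"] for row in events if row["event_name"] not in KNOWN_EVENTS]
```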

Last modified on Oct 6, 2021