Disaster Recovery Guide for Jira

The following steps apply only to Jira 8.22 and earlier versions. If you want to set up a disaster recovery strategy in Jira 9.0 or later, you only need the <shared_home> directory (with the corresponding index snapshot) and the database (or a clone) that was used by Jira.

Jira Data Center is the only Atlassian-supported high availability solution for Jira. However, if you don't choose Jira Data Center, our Experts may be able to help you implement a high availability solution for your environment; contact our Experts team.

This page shows you how Jira Data Center 6.4 can be used to implement and manage a disaster recovery strategy for Jira. It does not cover the broader business practices, such as setting the key objectives (RTO, RPO and RCO) or defining standard operating procedures.

A disaster recovery strategy is a key part of any business continuity plan. It covers the processes that should be followed in the event of a disaster, to ensure that the business can recover and keep operating. For Jira, this means ensuring Jira's availability in the event of your primary site becoming unavailable.

What is the difference between high availability and disaster recovery?

The terms "high availability", "disaster recovery" and "failover" are often confused. For the purposes of this document:

  • "High availability": a strategy to provide a specific level of availability, in Jira's case, access to the application and an acceptable response time. Automated correction and failover (within the same location) are usually part of high availability planning. See High Availability Guide for Jira.
  • "Disaster recovery": a strategy to resume operations in an alternate data center (usually in another geographic location) if the main data center becomes unavailable (i.e. a disaster). Failover (to another location) is a fundamental part of disaster recovery.
  • "Failover": when one machine takes over from another machine because the latter has failed. This could be within the same data center or from one data center to another. Failover is usually part of both high availability and disaster recovery planning.

Overview

Before you begin, note that Jira Data Center 6.4 or higher is required to implement the strategy described in this guide.

The guide describes what is generally referred to as a "cold standby" strategy. That means that the standby Jira instance is not continuously running and that some administrative steps need to be taken to start the standby instance and ensure it is in a suitable state to service the business needs of the organization.

The major components that need to be considered in the disaster recovery plan are:

Jira installation

The standby site should have the exact same version of Jira installed as the production site.

Database

This is the primary source of truth for Jira and contains most of the Jira data (except for attachments, avatars, installed plugins, etc.). The database needs to be replicated and continuously kept up to date to satisfy your RPO (see Definitions).

Attachments

All issue attachments are stored in the Jira Data Center shared home and need to be replicated to the standby instance.

Search index

The search index is not a primary source of truth and can always be recreated from the database. However, for large installations this can be quite time consuming, and the functionality of Jira would be greatly reduced until the index is fully recovered. Jira Data Center 6.4 provides tools for reducing this recovery time to the bare minimum.

If index recovery is enabled, all index snapshots are stored in the Jira Data Center shared home and need to be replicated to the standby instance.

Plugins

User-installed plugins are stored in the Jira Data Center shared home and need to be replicated to the standby instance.

Other data

A few other non-critical items stored in the Jira Data Center shared home, such as user and project avatars, should also be replicated to the standby instance.

Setting up a standby system

Step 1. Install Jira Data Center 6.4 or higher

Install the same version of Jira on the standby system. Configure the system to attach to the standby database.

You also need to configure the instance to be a disaster recovery installation. This enables the automatic index recovery mechanism to kick in when Jira starts.

Add the following to jira-config.properties in the Jira Home directory of the standby instance:

disaster.recovery=true
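
As a minimal sketch of how this setting could be applied from a provisioning script, the Python helper below adds or corrects a `key=value` entry in a properties file. The function name `ensure_property` and the example path are hypothetical and not part of Jira; only the `disaster.recovery=true` entry comes from this guide. It handles simple `key=value` lines only.

```python
from pathlib import Path


def ensure_property(config_path: Path, key: str, value: str) -> bool:
    """Set key=value in a properties file; return True if the file was changed."""
    lines = config_path.read_text().splitlines() if config_path.exists() else []
    wanted = f"{key}={value}"
    found = False
    changed = False
    result = []
    for line in lines:
        stripped = line.strip()
        # Comment lines and lines without "=" are copied through untouched.
        if not stripped.startswith(("#", "!")) and "=" in stripped:
            name = stripped.split("=", 1)[0].strip()
            if name == key:
                found = True
                if stripped != wanted:
                    line = wanted
                    changed = True
        result.append(line)
    if not found:
        result.append(wanted)
        changed = True
    if changed:
        config_path.write_text("\n".join(result) + "\n")
    return changed
```

For example, `ensure_property(Path("/var/jira-home/jira-config.properties"), "disaster.recovery", "true")` would add the line if it is missing; the path shown is an assumption and should be replaced with the Jira Home directory of your standby instance.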

DO NOT start the standby Jira system

Starting Jira would write data to the database, which you do not want to do.

You may like to test the installation by temporarily connecting it to a different database and starting Jira, then making sure it works as expected. Don't forget to update the database configuration to point to the standby database after your testing.

Step 2. Implement a data replication strategy

Replicating data to your standby location is crucial to a cold standby failover strategy. You don't want to fail over to your standby Jira instance and find that it is out of date or that it takes many hours to reindex.

Database

All Jira-supported database suppliers provide their own database replication solutions.

You need to implement a database replication strategy that meets your RTO, RPO and RCO (see Definitions).

Files

Jira can automatically manage the replication of files to a secondary location. These include attachments, avatars, index snapshots and installed plugins. 

To enable Jira's file replication, navigate to the Replication Settings page in your Jira administration console, and enable file replication.
 

You will need to perform a synchronization, by pressing the Synchronize button, when you first enable file replication. We recommend that you do this outside of peak hours — while it will not prevent access to Jira, it is potentially a long running operation.

After the initial synchronization, Jira will automatically keep your secondary copy up to date. This secondary copy is written asynchronously, so the performance of your primary Jira instance won't be affected.

Notes:

  • Changing the file replication settings — If you change any of the file replication settings, you will need to perform another synchronization (via the Synchronize button). We recommend that you do this outside of peak hours.
  • Setting the replication folder — Set the jira.secondary.home property to the desired path in the jira-config.properties file.
    • If you are running Jira in clustered mode, the secondary home must be a path accessible to all nodes.
  • Other file types — Files added through other means, such as files added by plugins, will require another means of replication. In these cases, contact the plugin vendor for recommendations.
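
To spot-check that the secondary copy is keeping up, you could compare the two directory trees. The sketch below is an illustration only: `missing_in_secondary` is a hypothetical helper, not a Jira feature, and because the secondary copy is written asynchronously a few recently added files may legitimately be reported as missing.

```python
from pathlib import Path


def missing_in_secondary(primary: Path, secondary: Path) -> list[str]:
    """Return relative paths of files that exist under primary but not under secondary."""
    missing = []
    for src in sorted(primary.rglob("*")):
        if src.is_file():
            rel = src.relative_to(primary)
            # Only presence is checked here, not size or content.
            if not (secondary / rel).is_file():
                missing.append(rel.as_posix())
    return missing
```

You would call it with the directory you replicate from and the path configured in jira.secondary.home, restricted to the subdirectories you care about (for example, the attachments folder).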

Clustering considerations

If you have a clustered environment, you need to be aware of the following, in addition to the information above:

Standby cluster

If you have a standby cluster, the node ids of the standby nodes must be different from those of the live cluster.

The configuration of the standby cluster does not need to mirror that of the live cluster. It may contain more or fewer nodes, depending on your requirements and budget. Fewer nodes may result in lower throughput, but that may be acceptable in your circumstances.
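
The node id requirement can be checked before a failover with a small script. The sketch below assumes each node's id is stored as a `jira.node.id` entry in a `cluster.properties` file in that node's local home; treat that key and file name as assumptions to verify against your installation. The helper names are hypothetical.

```python
from typing import Optional


def read_node_id(properties_text: str) -> Optional[str]:
    """Extract the jira.node.id value from the text of a cluster.properties file."""
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "jira.node.id":
            return value.strip()
    return None


def clashing_node_ids(live: list[str], standby: list[str]) -> set:
    """Return node ids that appear in both the live and the standby cluster."""
    live_ids = {read_node_id(text) for text in live}
    standby_ids = {read_node_id(text) for text in standby}
    # Files without a node id yield None, which is not a real clash.
    return (live_ids & standby_ids) - {None}
```

An empty result means no standby node reuses a live node id.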

File locations

Where we mention <JIRA_SHARED_HOME> as the location of files that need to be synchronized, this is the shared home for the cluster.

<JIRA_LOCAL_HOME> refers to the node specific home directory.

Starting the standby cluster

It is important to initially start only one node of the cluster, allow it to recover the search index, and check that it is working correctly before starting additional nodes.

Disaster recovery testing

You should exercise extreme care when testing any disaster recovery plan. Simple mistakes may cause your live instance to be corrupted, for example, if testing updates are inserted into your production database. You may detrimentally impact your ability to recover from a real disaster, while testing your disaster recovery plan.

The key is to keep the main data center as isolated as possible from the disaster recovery testing.

Prerequisites

Before you perform any testing, you need to isolate your production data:

Database

  1. Temporarily pause all replication to the standby database.
  2. Replicate the data from the standby database to another database that is isolated and has no communication with the main database.

Attachments, plugins and indexes

You need to ensure that no plugin updates or index backups occur during the test:

  1. Disable index backups.
  2. Instruct sysadmins not to perform any updates in Jira.

Note: attachments should not cause any problems. The health checks in the failover instance will tell you whether the folders have the required write permissions.

Installation folders

  1. Clone your standby installation, separate from both the live and standby instances.
  2. Change the database connection in <JIRA_LOCAL_HOME>/dbconfig.xml to avoid any conflict.

After this, you can resume all replication to the standby instance, including the database.
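
Because pointing the test clone at the wrong database is the main risk here, it can be worth checking dbconfig.xml automatically before starting the clone. The sketch below assumes the JDBC URL sits in a `<url>` element inside `<jdbc-datasource>`; confirm this against your own file. The function names and host names are hypothetical, and the host check is a plain substring match, so it is a guard against obvious mistakes rather than a full JDBC URL parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def jdbc_url(dbconfig_xml: str) -> Optional[str]:
    """Return the JDBC URL from the text of a dbconfig.xml file, or None if absent."""
    root = ET.fromstring(dbconfig_xml)
    url = root.findtext("jdbc-datasource/url")
    return url.strip() if url else None


def uses_forbidden_host(url: str, forbidden_hosts: list[str]) -> bool:
    """True if the URL mentions any host that the test clone must never connect to."""
    return any(host in url for host in forbidden_hosts)
```

A test run would abort if `uses_forbidden_host(jdbc_url(text), ["prod-db.example.com", "standby-db.example.com"])` returns True, where both host names are placeholders for your production and standby database servers.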

Performing the disaster recovery testing

Once you have isolated your production data, follow the steps below to test your disaster recovery plan:

  1. Ensure that the new database is ready, with the latest snapshot and no replication.
  2. Ensure that you have a copy of Jira on a clean server with the proper dbconfig.xml connection.
  3. Ensure that <JIRA_SHARED_HOME> is mapped on the test server in the same way as on the standby instance. It is important to have the latest index snapshot in the <JIRA_SHARED_HOME>/export/indexsnapshots folder.
  4. Disable email.
  5. Start Jira in Disaster Recovery mode, by starting it with the following parameter: disaster.recovery=true.
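
The snapshot check in step 3 can be scripted. The sketch below simply returns the most recently modified file in <JIRA_SHARED_HOME>/export/indexsnapshots so that you can confirm its age before starting the test; `newest_snapshot` is a hypothetical helper, and it makes no assumption about how snapshot files are named.

```python
from pathlib import Path
from typing import Optional


def newest_snapshot(shared_home: Path) -> Optional[Path]:
    """Return the most recently modified file in export/indexsnapshots, or None."""
    folder = shared_home / "export" / "indexsnapshots"
    if not folder.is_dir():
        return None
    files = [p for p in folder.iterdir() if p.is_file()]
    # max() with default=None covers the case of an empty folder.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

A result of None means there is no snapshot to recover from, in which case the test instance would need a full re-index.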

Handling a failover

In the event of your primary site becoming unavailable, you will need to fail over to your standby system. This section describes how to do this, including instructions on how to check the data in your standby system.

Step 1. Fail over to the standby instance

The basic steps to fail over to the standby instance are:

  1. Ensure your live system is shut down and no longer updating the database.
  2. Ensure that the directory <JIRA_SHARED_HOME>/indexarchive does not exist on the standby instance.
  3. Copy the contents of the <JIRA_SHARED_HOME>/export/indexsnapshots to <JIRA_SHARED_HOME>/import/indexsnapshots.
  4. Perform whatever steps are required to activate your standby database.
  5. Start Jira in the standby instance.
  6. Wait for Jira to start and check it is operating as expected.
  7. Update your DNS, HTTP Proxy or other front end devices to route traffic to your standby server.

You should check the log, <JIRA_LOCAL_HOME>/log/atlassian-jira.log after Jira starts for information regarding the recovery state.
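
Steps 2 and 3 above lend themselves to a small script, so that they are not forgotten under pressure. The following is a sketch only: `stage_index_snapshots` is a hypothetical helper, and it refuses to continue rather than deleting the indexarchive directory itself, leaving that decision to the administrator.

```python
import shutil
from pathlib import Path


def stage_index_snapshots(shared_home: Path) -> list[str]:
    """Copy files from export/indexsnapshots to import/indexsnapshots.

    Raises RuntimeError if <shared_home>/indexarchive exists (step 2).
    Returns the names of the files that were copied (step 3).
    """
    if (shared_home / "indexarchive").exists():
        raise RuntimeError("indexarchive exists; remove it before starting Jira")
    src = shared_home / "export" / "indexsnapshots"
    dst = shared_home / "import" / "indexsnapshots"
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            shutil.copy2(item, dst / item.name)  # copy2 also preserves timestamps
            copied.append(item.name)
    return copied
```

Run it against the shared home of the standby instance before starting Jira in step 5.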

Step 2. Check the data in your standby instance

After you have failed over to your standby instance, perform these checks before users start accessing the system and changing data. You will need to be a Jira administrator with the 'Browse Projects' permission for all projects.

Navigate to Administration > System > Atlassian Support Tools > Health Checks, and check the following: 

Database and index consistency

From the Indexing section of the health checks.

  • A successful check will show:
  • An unsuccessful check will show:

Verify that the item count and updated date lie within your organization's RPO.
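
The RPO comparison is simple date arithmetic: the gap between the last update present in the standby data and the moment the primary failed must not exceed the RPO. A minimal sketch, with a hypothetical function name and illustrative values:

```python
from datetime import datetime, timedelta


def within_rpo(last_updated: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """True if the data lost between last_updated and failure_time fits within the RPO."""
    return failure_time - last_updated <= rpo
```

For instance, with an RPO of one hour, an updated date of 09:30 and a failure at 10:15 give a gap of 45 minutes, which is within the objective; a failure at 11:00 would not be.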

Attachments

From the Attachments section of the health checks.

  • A successful check will show:
     
  • An unsuccessful check will show:

If the check does not work, you can manually determine the recovery point, as follows:

  1. In your database, run the following SQL query:

    select issueid, created from fileattachment order by created desc limit 1;
  2. In Jira, navigate to Issues > Search for issues, then run the following advanced (JQL) search:

    id=<issue_id>

    where <issue_id> is the issueid returned by the SQL query in the previous step.  

  3. Open the issue returned by the search and check that the attachments on the issue are visible. If the attachments are not visible, check some slightly older ones; you should be able to determine the most recent attachment that is available as well as which attachments are missing.
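
The manual procedure above amounts to walking the attachment records from newest to oldest until one is found that is actually available. The sketch below shows that logic in isolation. It assumes you have already fetched `(issue_id, created)` rows ordered by `created` descending, and that `is_available` is a check you supply yourself (for example, the result of opening the issue by hand); both names are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple[int, str]  # (issue_id, created timestamp as text)


def attachment_recovery_point(
    rows: Iterable[Row], is_available: Callable[[int], bool]
) -> Tuple[Optional[Row], List[Row]]:
    """Return the newest row whose attachment is available, plus the newer rows that are missing."""
    missing: List[Row] = []
    for row in rows:
        issue_id, _created = row
        if is_available(issue_id):
            return row, missing
        missing.append(row)
    # No attachment was available at all.
    return None, missing
```

The first element of the result is your attachment recovery point; the second lists the attachments that users may need to upload again.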

Returning to the primary instance

In most cases, you will want to return to using your primary instance, after you have resolved the problems that caused the disaster. This is easiest to achieve if you can schedule a reasonably-sized outage window.

You need to:

  • Synchronize your primary database with the state of the secondary.
  • Synchronize the primary attachment directory with the state of the secondary.
  • Recover the index state on the primary server.

Preparation

Attachments and other files
  1. Use rsync or a similar utility to synchronize the majority of attachments to the primary server before starting the switchover process.
  2. Similarly, you should synchronize the installed plugins and logos before you start.

Search index

Enable index snapshots on the standby (running) instance so that you have a recent index snapshot. This should be copied to a location that is accessible from the primary instance.
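
If you drive the attachment synchronization from a script, it helps to build the rsync command in one place and default to a dry run. The sketch below only constructs the argument list; the helper name and the example paths are hypothetical, while `--archive`, `--verbose` and `--dry-run` are standard rsync options.

```python
def build_rsync_command(source_dir: str, destination: str, dry_run: bool = True) -> list[str]:
    """Build an rsync argument list that copies the contents of source_dir to destination."""
    command = ["rsync", "--archive", "--verbose"]
    if dry_run:
        # Report what would be transferred without changing anything.
        command.append("--dry-run")
    # A trailing slash makes rsync copy the directory contents, not the directory itself.
    command.append(source_dir.rstrip("/") + "/")
    command.append(destination)
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Review the dry-run output first, then repeat with `dry_run=False` once you are satisfied with what will be copied.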

Perform the cut over

  1. Shut down Jira on the standby instance.
  2. Ensure the primary database is synchronized correctly and configured as required.
  3. Start Jira on the primary instance.
  4. Log in to Jira and restore the index from the index snapshot. You will need to know the name and location of the snapshot file.
  5. Check that Jira is operating as expected.
  6. Update your DNS, HTTP Proxy or other front end devices to route traffic to your primary server.

Other resources

Atlassian Experts

Jira Data Center is the only Atlassian-supported high availability solution for Jira. However, if you don't choose Jira Data Center, our Experts may be able to help you implement a disaster recovery plan for your environment; contact our Experts team.

Atlassian Answers

Our community and staff are active on Atlassian Answers. Feel free to contribute your best practices, questions and comments.

Troubleshooting

If you encounter problems after failing over to your standby instance, the following FAQs may help:

What do I do if my database is not synchronized correctly?

If the database does not have the data available that it should, then you will need to restore the database from a backup.

Once you have restored the database, the search index will no longer be in sync with the database. You can either do a full re-index (background or foreground) or recover from the latest index snapshot if you have one. The index snapshot can be older or more recent than your database backup; it will synchronize itself as part of the recovery process.

What do I do if my search index is corrupt?

If the search index is corrupt, you can either do a full re-index, background or foreground, or recover from an earlier index snapshot if you have one. 

What do I do if attachments are missing?

You may be able to recover them from backups if you have them, or from the primary site if you still have access to its hard drives. Tools such as rsync may be useful in such circumstances. Missing attachments will not stop Jira from operating normally: they will simply be unavailable, and users may be able to upload them again.

What happens to my application links during failover?

Application links are stored in the database and if the database replica is up to date, then the application links will be preserved.

However, you also need to consider how each end of the link knows the address of the other:

  • If you use host names to address the partners in the link and the backup Jira server has the same hostname, via updates to the DNS or similar, then the links should remain intact and working. 
  • If the application links were built using IP addresses and these are not the same, then the application links will need to be re-established. 
  • Often people will use IP addresses that are valid on the internal company network, while the backup system may be remote and outside the original firewall. In these cases, the application links will need to be re-established.


Definitions

1 - Definitions

RPO (Recovery Point Objective): How up-to-date you require your Jira instance to be after a failure.
RTO (Recovery Time Objective): How quickly you require your standby system to be available after a failure.
RCO (Recovery Cost Objective): How much you are willing to spend on your disaster recovery solution.

Last modified on Jul 4, 2022
