Disaster Recovery Guide for Jira
What is the difference between high availability and disaster recovery?
The terms "high availability", "disaster recovery" and "failover" are often confused. For the purposes of this document:
- "High availability": a strategy to provide a specific level of availability, in Jira's case, access to the application and an acceptable response time. Automated correction and failover (within the same location) are usually part of high availability planning. See High Availability Guide for Jira.
- "Disaster recovery": a strategy to resume operations in an alternate data center (usually in another geographic location) if the main data center becomes unavailable (i.e. a disaster). Failover (to another location) is a fundamental part of disaster recovery.
- "Failover": when one machine takes over from another machine because the latter has failed. This could be within the same data center or from one data center to another. Failover is usually part of both high availability and disaster recovery planning.
Overview
Before you begin, note that Jira Data Center 6.4 or higher is required to implement the strategy described in this guide.
The guide describes what is generally referred to as a "cold standby" strategy. That means that the standby Jira instance is not continuously running and that some administrative steps need to be taken to start the standby instance and ensure it is in a suitable state to service the business needs of the organization.
The major components that need to be considered in the disaster recovery plan are:
Jira installation | The standby site should have the exact same version of Jira installed as the production site. |
---|---|
Database | This is the primary source of truth for Jira and contains most of the Jira data (except for attachments, avatars, installed plugins, etc.). The database needs to be replicated and continuously kept up to date to satisfy your RPO (see Definitions). |
Attachments | All issue attachments are stored in the Jira Data Center shared home and need to be replicated to the standby instance. |
Search Index | The search index is not a primary source of truth and can always be recreated from the database. However, for large installations this can be quite time-consuming, and the functionality of Jira would be greatly reduced until the index is fully recovered. Jira Data Center 6.4 provides tools for reducing this recovery time to the bare minimum. If index recovery is enabled, all index snapshots are stored in the Jira Data Center shared home and need to be replicated to the standby instance. |
Plugins | User installed plugins are stored in the Jira Data Center shared home and need to be replicated to the standby instance. |
Other data | A few other non-critical items stored in the Jira Data Center shared home should also be replicated to the standby instance, such as User and Project avatars. |
Setting up a standby system
Step 1. Install Jira Data Center 6.4 or higher
Install the same version of Jira on the standby system. Configure the system to attach to the standby database.
You also need to configure the instance to be a disaster recovery installation. This enables the automatic index recovery mechanism to kick in when Jira starts.
Add the following to jira-config.properties in the Jira Home directory of the standby instance:
disaster.recovery=true
DO NOT start the standby Jira system
Starting Jira would write data to the database, which you do not want to do.
You may like to test the installation by temporarily connecting it to a different database and starting Jira, then making sure it works as expected. Don't forget to update the database configuration to point to the standby database after your testing.
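The property change above could be scripted so that it is applied the same way every time the standby is provisioned. The following is a minimal sketch, not an Atlassian tool: the function name `ensure_disaster_recovery_flag` is hypothetical, and it assumes simple `key=value` lines in jira-config.properties (it does not handle every form the Java properties format allows).

```python
from pathlib import Path


def ensure_disaster_recovery_flag(jira_home: str) -> bool:
    """Make sure jira-config.properties contains disaster.recovery=true.

    Returns True if the file was changed, False if it was already correct.
    Assumes plain key=value lines (a simplification of the properties format).
    """
    props = Path(jira_home) / "jira-config.properties"
    lines = props.read_text().splitlines() if props.exists() else []

    kept = []
    already_set = False
    for line in lines:
        key, sep, value = line.partition("=")
        if sep and key.strip() == "disaster.recovery":
            if value.strip() == "true" and not already_set:
                already_set = True
                kept.append(line)
            # Drop any other (stale or duplicate) disaster.recovery entries.
            continue
        kept.append(line)

    if already_set and kept == lines:
        return False

    if not already_set:
        kept.append("disaster.recovery=true")
    props.write_text("\n".join(kept) + "\n")
    return True
```

The script only edits the properties file; it does not start Jira, which is consistent with the warning above.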
Step 2. Implement a data replication strategy
Replicating data to your standby location is crucial to a cold standby failover strategy. You don't want to fail over to your standby Jira instance and find that it is out of date or that it takes many hours to reindex.
Database | The Jira-supported database suppliers provide their own database replication solutions. You need to implement a database replication strategy that meets your RTO, RPO and RCO (see Definitions). |
---|---|
Files | Jira can automatically manage the replication of files to a secondary location. These include attachments, avatars, index snapshots and installed plugins. To enable Jira's file replication, navigate to the Replication Settings page in your Jira administration console, and enable file replication. You will need to perform a synchronization, by pressing the Synchronize button, when you first enable file replication. We recommend that you do this outside of peak hours — while it will not prevent access to Jira, it is potentially a long running operation. After the initial synchronization, Jira will automatically keep your secondary copy up to date. This secondary copy is written asynchronously, so the performance of your primary Jira instance won't be affected. Notes:
|
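As a rough way to spot-check that the secondary copy of the files is complete, the two directory trees can be compared. This is a sketch under the assumption that both the primary shared home and its secondary copy are reachable as local paths from the machine running the check. The function name `missing_on_standby` is hypothetical and not part of Jira.

```python
from pathlib import Path


def missing_on_standby(primary_home: str, standby_home: str) -> list[str]:
    """Return relative paths of files found under primary_home but not under standby_home."""
    primary = Path(primary_home)
    standby = Path(standby_home)
    missing = []
    for path in primary.rglob("*"):
        if path.is_file():
            rel = path.relative_to(primary)
            # Only presence is checked, not content or modification time.
            if not (standby / rel).is_file():
                missing.append(rel.as_posix())
    return sorted(missing)
```

Because replication is asynchronous, a few recently written files may legitimately be missing. A list that keeps growing is the signal worth investigating.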
Clustering considerations
If you have a clustered environment, you need to be aware of the following, in addition to the information above:
Standby cluster | If you have a standby cluster, the node IDs of the standby nodes must be different from those of the live cluster. The configuration of the standby cluster does not need to mirror that of the live cluster; it may contain more or fewer nodes, depending on your requirements and budget. Fewer nodes may result in lower throughput, but that may be acceptable depending on your circumstances. |
---|---|
File locations | Where we mention
|
Starting the standby cluster | It is important to initially start only one node of the cluster, allow it to recover the search index and check it is working correctly before starting additional nodes. |
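The requirement that standby node IDs differ from live node IDs can be checked before a failover. The sketch below assumes each node's ID is stored as a `jira.node.id` entry in that node's cluster.properties file, and that you have collected the text of those files yourself. Both helper names are hypothetical.

```python
from typing import Iterable, Optional


def read_node_id(properties_text: str) -> Optional[str]:
    """Extract the jira.node.id value from the text of a cluster.properties file."""
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "jira.node.id":
            return value.strip()
    return None


def clashing_node_ids(live_ids: Iterable[str], standby_ids: Iterable[str]) -> set:
    """Return node IDs that appear in both the live and the standby cluster."""
    return set(live_ids) & set(standby_ids)
```

An empty result from `clashing_node_ids` means no standby node reuses a live node's ID.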
Disaster recovery testing
You should exercise extreme care when testing any disaster recovery plan. Simple mistakes may cause your live instance to be corrupted, for example, if testing updates are inserted into your production database. You may detrimentally impact your ability to recover from a real disaster, while testing your disaster recovery plan.
The key is to keep the main data center as isolated as possible from the disaster recovery testing.
Prerequisites
Before you perform any testing, you need to isolate your production data:
Database |
|
---|---|
Attachments, plugins and indexes | You need to ensure that no plugin updates or index backups occur during the test:
Note: attachments should not cause any problems. The health checks in the failover instance report whether the folders have write permissions. |
Installation folders |
|
After this, you can resume all replication to the standby instance, including the database.
Performing the disaster recovery testing
Once you have isolated your production data, follow the steps below to test your disaster recovery plan:
- Ensure that the new database is ready, with the latest snapshot and no replication.
- Ensure that you have a copy of Jira on a clean server with the proper dbconfig.xml connection.
- Ensure that you have JIRA_SHARED_HOME mapped as it was in the standby instance, but in the test server. It is important to have the latest index snapshot in the <JIRA_SHARED_HOME>/export/indexsnapshots folder.
- Disable email.
- Start Jira in Disaster Recovery mode, by starting it with the following parameter: disaster.recovery=true.
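Some of the file-level items in the checklist above can be verified automatically before Jira is started on the test server. This is a minimal sketch with a hypothetical function name. It assumes dbconfig.xml and jira-config.properties sit in the Jira home directory, and it does not check database isolation or email settings, which still need to be confirmed by hand.

```python
from pathlib import Path


def preflight_problems(jira_home: str, shared_home: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    home = Path(jira_home)

    if not (home / "dbconfig.xml").is_file():
        problems.append("dbconfig.xml not found in Jira home")

    props = home / "jira-config.properties"
    flag_set = props.is_file() and any(
        line.replace(" ", "") == "disaster.recovery=true"
        for line in props.read_text().splitlines()
    )
    if not flag_set:
        problems.append("disaster.recovery=true not set in jira-config.properties")

    snapshots = Path(shared_home) / "export" / "indexsnapshots"
    if not snapshots.is_dir() or not any(p.is_file() for p in snapshots.iterdir()):
        problems.append("no index snapshot in export/indexsnapshots")

    return problems
```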
Handling a failover
In the event of your primary site becoming unavailable, you will need to fail over to your standby system. This section describes how to do this, including instructions on how to check the data in your standby system.
Step 1. Fail over to the standby instance
The basic steps to fail over to the standby instance are:
- Ensure your live system is shut down and no longer updating the database.
- Ensure that the directory <JIRA_SHARED_HOME>/indexarchive does not exist on the standby instance.
- Copy the contents of <JIRA_SHARED_HOME>/export/indexsnapshots to <JIRA_SHARED_HOME>/import/indexsnapshots.
- Perform whatever steps are required to activate your standby database.
- Start Jira in the standby instance.
- Wait for Jira to start and check it is operating as expected.
- Update your DNS, HTTP Proxy or other front end devices to route traffic to your standby server.
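The two file-system steps in the list above (checking for the indexarchive directory and copying the index snapshots) could be scripted roughly as follows. This is a sketch only: the function name `stage_index_snapshots` is hypothetical, it is meant to run on the standby with Jira stopped, and it copies only regular files at the top level of the export directory.

```python
import shutil
from pathlib import Path


def stage_index_snapshots(shared_home: str) -> list[str]:
    """Copy export/indexsnapshots into import/indexsnapshots and return the copied file names."""
    shared = Path(shared_home)

    # The guide requires that this directory does not exist on the standby.
    if (shared / "indexarchive").exists():
        raise RuntimeError("indexarchive exists on the standby; resolve this before failing over")

    source = shared / "export" / "indexsnapshots"
    target = shared / "import" / "indexsnapshots"
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for item in sorted(source.iterdir()):
        if item.is_file():
            shutil.copy2(item, target / item.name)  # copy2 also preserves timestamps
            copied.append(item.name)
    return copied
```

If the export directory is missing, `iterdir()` raises `FileNotFoundError`, which is a useful early warning that file replication has not delivered any snapshots.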
You should check the log, <JIRA_LOCAL_HOME>/log/atlassian-jira.log, after Jira starts for information regarding the recovery state.
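To make the log check quicker, the log can be filtered for lines that mention recovery. The sketch below is an assumption-laden helper, not an Atlassian utility: the function name is hypothetical, and the default keywords are guesses, because this guide does not state the exact wording of the recovery messages. Adjust them to what your log actually contains.

```python
from pathlib import Path


def recovery_log_lines(local_home: str, keywords=("recover", "snapshot")) -> list[str]:
    """Return lines of atlassian-jira.log that contain any keyword (case-insensitive)."""
    log_file = Path(local_home) / "log" / "atlassian-jira.log"
    wanted = [k.lower() for k in keywords]  # keywords are illustrative guesses
    matches = []
    with log_file.open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            lowered = line.lower()
            if any(k in lowered for k in wanted):
                matches.append(line.rstrip("\n"))
    return matches
```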
Step 2. Check the data in your standby instance
After you have failed over to your standby instance, perform these checks before users start accessing the system and changing data. You will need to be a Jira administrator with the 'Browse Projects' permission for all projects.
Navigate to Administration > System > Atlassian Support Tools > Health Checks, and check the following:
Database and index consistency | From the Indexing section of the health checks.
Verify that the item count and updated date lie within your organization's RPO. |
---|---|
Attachments | From the Attachments section of the health checks.
If the check does not work, you can manually determine the recovery point, as follows:
|
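Verifying that an updated date lies within your RPO comes down to simple date arithmetic. The sketch below shows one way to express it. The function name and the example timestamps are hypothetical, and the RPO value is whatever your organization has agreed.

```python
from datetime import datetime, timedelta


def within_rpo(last_update: datetime, outage_start: datetime, rpo: timedelta) -> bool:
    """True if the newest replicated change is no older than the RPO allows."""
    return outage_start - last_update <= rpo


# Example: the newest item on the standby was updated 10 minutes before the outage.
newest = datetime(2024, 1, 1, 11, 50)
outage = datetime(2024, 1, 1, 12, 0)
print(within_rpo(newest, outage, timedelta(minutes=15)))  # True
print(within_rpo(newest, outage, timedelta(minutes=5)))   # False
```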
Returning to the primary instance
In most cases, you will want to return to using your primary instance, after you have resolved the problems that caused the disaster. This is easiest to achieve if you can schedule a reasonably-sized outage window.
You need to:
- Synchronize your primary database with the state of the secondary.
- Synchronize the primary attachment directory with the state of the secondary.
- Recover the index state on the primary server.
Preparation
Attachments and other files |
|
---|---|
Search index | Enable Index snapshots on the standby (running) instance so that you have a recent index snapshot. This should be copied to a location that is accessible from the primary instance. |
Perform the cut over
- Shut down Jira on the standby instance.
- Ensure the database is synchronized correctly and configured as required.
- Start Jira.
- Log in to Jira and restore the index from the index snapshot. You will need to know the name and location of the snapshot file.
- Check that Jira is operating as expected.
- Update your DNS, HTTP Proxy or other front end devices to route traffic to your primary server.
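Because restoring the index requires the name and location of the snapshot file, it can help to look up the most recent file in the directory you copied the snapshots to. This is a minimal sketch: the function name is hypothetical, and it picks the newest file by modification time, which assumes the copy preserved timestamps.

```python
from pathlib import Path
from typing import Optional


def newest_snapshot(snapshot_dir: str) -> Optional[Path]:
    """Return the most recently modified file in snapshot_dir, or None if it holds no files."""
    files = [p for p in Path(snapshot_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```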
Other resources
Atlassian Experts
Jira Data Center is the only Atlassian-supported high availability solution for Jira. However, if you don't choose Jira Data Center, our Experts may be able to help you implement a disaster recovery plan for your environment. Contact our Experts team.
Atlassian Answers
Our community and staff are active on Atlassian Answers. Feel free to contribute your best practices, questions and comments.
Troubleshooting
If you encounter problems after failing over to your standby instance, the following FAQs may help:
Definitions
1 - Definitions
RPO | Recovery Point Objective | How up-to-date you require your Jira instance to be after a failure. |
RTO | Recovery Time Objective | How quickly you require your standby system to be available after a failure. |
RCO | Recovery Cost Objective | How much you are willing to spend on your disaster recovery solution. |