Alternative Disaster Recovery Guide for Jira
Overview
This guide describes what is generally referred to as a "cold standby" strategy. This means that the standby Jira instance is not continuously running, and that some administrative steps are needed to start it and ensure it is in a suitable state to service the business needs of the organization.
The major components that need to be considered in the disaster recovery plan are:
Component | Description |
---|---|
Jira installation | The standby site should have exactly the same version of Jira Data Center installed as the production site. |
Database | This is the primary source of truth for Jira and contains most of the Jira data (except for attachments, avatars, installed plugins, etc.). The database needs to be replicated and continuously kept up to date to satisfy your RPO¹. |
Attachments | All issue attachments are stored in the local file system and need to be replicated to the standby instance. |
Search Index | The search index is not a primary source of truth and can always be recreated from the database. However, for large installations this can be quite time-consuming, and the functionality of Jira would be greatly reduced until the index is fully recovered. Jira provides tools for reducing this recovery time to the bare minimum. |
Plugins | User-installed plugins are stored in the local file system and need to be replicated to the standby instance. |
Other data | There are a few other non-critical items that should also be replicated to the standby instance, such as user and project avatars. |
Setting up a standby system
Step 1. Install Jira Data Center
Install the same version of Jira on the standby system. Configure the system to connect to the standby database.
You also need to configure the instance to be a disaster recovery installation. This enables the automatic index recovery mechanism to kick in when Jira starts.
Add the following to jira-config.properties in the Jira home directory of the standby instance:
disaster.recovery=true
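If you provision standby hosts with scripts, the property can be added idempotently so that rerunning the script never duplicates it. The following is a minimal Python sketch, assuming jira-config.properties uses simple key=value lines (Java properties files also allow ':' separators and line continuations, which this sketch does not handle); the function name enable_disaster_recovery is hypothetical.

```python
from pathlib import Path


def enable_disaster_recovery(jira_home: str) -> Path:
    """Ensure <jira_home>/jira-config.properties sets disaster.recovery=true.

    Hypothetical helper; assumes simple key=value lines in the file.
    Any existing disaster.recovery entry is overwritten; other lines are kept.
    """
    props = Path(jira_home) / "jira-config.properties"
    lines = props.read_text(encoding="utf-8").splitlines() if props.exists() else []
    result, found = [], False
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if key == "disaster.recovery":
            # Replace whatever value was configured before
            result.append("disaster.recovery=true")
            found = True
        else:
            result.append(line)
    if not found:
        result.append("disaster.recovery=true")
    props.write_text("\n".join(result) + "\n", encoding="utf-8")
    return props
```

For example, enable_disaster_recovery("/var/atlassian/application-data/jira") with your own standby home path in place of that example path. The script only edits the file; it does not start Jira.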
DO NOT start the standby Jira system
Starting Jira would write data to the database, which you do not want to do.
You may want to test the installation by temporarily connecting it to a different database, starting Jira, and making sure it works as expected. Don't forget to point the database configuration back to the standby database after your testing.
Step 2. Implement a data replication strategy
Replicating data to your standby location is crucial to a cold standby failover strategy. You don't want to fail over to your standby Jira instance and find that it is out of date or that it takes a few hours to reindex.
Manage data replication via external tools, as described below:
Data | Replication approach |
---|---|
Database | Atlassian does not provide or recommend a particular strategy for replicating the database. All of the supported database vendors (Oracle, PostgreSQL, MySQL, and Microsoft SQL Server) provide their own database replication solutions. You need to implement a database replication strategy that meets your RPO¹ and RCO¹. |
Attachments | There are a number of possibilities for managing attachments for disaster recovery:
|
Search indexes | The steps to put the search index into a state that meets your RTO¹ are:
|
Plugins | Installed plugins are kept in the <yourjirahome> directory. This directory on the standby instance should be kept in sync with the one on the live instance. You need to set up a regular job to do this at the file system level. |
Other data | You should also periodically replicate the content of the If you have non Atlassian plugins, they may write some data to your |
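The file-system-level job for plugins (and likewise for attachments and avatars) can be any tool that produces a one-way mirror, such as a scheduled rsync. As an illustration only, here is a minimal Python sketch of a one-way mirror, assuming the standby directory is reachable as a local or mounted path. mirror_directory is a hypothetical name, and the sketch does not handle an entry that changes from a file to a directory or vice versa.

```python
import filecmp
import shutil
from pathlib import Path


def mirror_directory(src: str, dst: str) -> list[str]:
    """Make dst match src: copy new/changed files, delete entries missing from src.

    Returns the relative paths that were copied or deleted.
    Assumes entries never switch between file and directory.
    """
    src_root, dst_root = Path(src), Path(dst)
    dst_root.mkdir(parents=True, exist_ok=True)
    changed = []
    # Pass 1: copy anything new or whose content differs
    for s in src_root.rglob("*"):
        rel = s.relative_to(src_root)
        d = dst_root / rel
        if s.is_dir():
            d.mkdir(parents=True, exist_ok=True)
        elif not d.exists() or not filecmp.cmp(s, d, shallow=False):
            d.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(s, d)  # copy2 also preserves timestamps
            changed.append(str(rel))
    # Pass 2: delete extras, deepest paths first so directories are empty by then
    extras = sorted(dst_root.rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for d in extras:
        rel = d.relative_to(dst_root)
        if not (src_root / rel).exists():
            if d.is_dir():
                shutil.rmtree(d)
            else:
                d.unlink()
            changed.append(str(rel))
    return changed
```

Deleting extras matters here: a plugin uninstalled on the live instance should also disappear from the standby copy, which a copy-only job would miss.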
Disaster recovery testing
You should exercise extreme care when testing any disaster recovery plan. Simple mistakes may corrupt your live instance, for example, if test updates are inserted into your production database. You may also impair your ability to recover from a real disaster while you are testing your disaster recovery plan.
The key is to keep the main data center as isolated as possible from the disaster recovery testing.
Prerequisites
Before you perform any testing, you need to isolate your production data:
Data | How to isolate it |
---|---|
Database | |
Attachments, plugins and indexes | You need to ensure that no plugin updates or index backups occur during the test. Note that attachments should not cause any problems; the health checks on the failover instance will tell you whether the folders have the required write permissions. |
Installation folders | |
After this, you can resume all replication to the standby instance, including the database.
Performing the disaster recovery testing
Once you have isolated your production data, follow the steps below to test your disaster recovery plan:
- Ensure that the new database is ready, with the latest snapshot and no replication.
- Ensure that you have a copy of Jira on a clean server with the proper dbconfig.xml connection.
- Ensure that you have JIRA_HOME mapped as it was in the standby instance, but on the test server. It is important to have the latest snapshot in the JIRA_HOME/export folder.
- Disable email.
- Start Jira in Disaster Recovery mode by starting it with the following parameter: disaster.recovery=true.
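Before starting Jira on the test server, a quick pre-flight check can confirm the items in the list above that are visible on disk. This is a minimal Python sketch, assuming dbconfig.xml sits directly in JIRA_HOME, that any regular file in JIRA_HOME/export counts as a snapshot, and that disaster.recovery=true is set through jira-config.properties rather than some other mechanism. preflight_problems is a hypothetical name, and the sketch cannot verify that email is disabled, that database replication is off, or that the snapshot is really the latest one.

```python
from pathlib import Path


def preflight_problems(jira_home: str) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    home = Path(jira_home)
    problems = []
    # The test server needs its own database connection configured
    if not (home / "dbconfig.xml").is_file():
        problems.append("dbconfig.xml not found in JIRA_HOME")
    # An index snapshot must be present in JIRA_HOME/export
    export = home / "export"
    snapshots = [p for p in export.iterdir() if p.is_file()] if export.is_dir() else []
    if not snapshots:
        problems.append("no index snapshot found in JIRA_HOME/export")
    # Disaster recovery mode, assumed here to be set in jira-config.properties
    props = home / "jira-config.properties"
    lines = props.read_text(encoding="utf-8").splitlines() if props.is_file() else []
    if "disaster.recovery=true" not in (line.strip() for line in lines):
        problems.append("disaster.recovery=true not set in jira-config.properties")
    return problems
```

Returning a list of problems, rather than stopping at the first one, lets you fix everything in a single pass before the test.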
Handling a failover
In the event of your primary site becoming unavailable, you will need to fail over to your standby system. This section describes how to do this, including instructions on how to check the data in your standby system.
Step 1. Fail over to the standby instance
The basic steps to fail over to the standby instance are:
- Ensure your live system is shut down and no longer updating the database.
- Ensure that the directory <yourjirahome>/old does not exist on the standby instance.
- Perform whatever steps are required to activate your standby database.
- Start Jira in the standby instance.
- Wait for Jira to start and check it is operating as expected.
- Update your DNS, HTTP Proxy or other front end devices to route traffic to your standby server.
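If your failover runbook is scripted, the check on the old directory can be made a hard stop before Jira is started. A minimal Python sketch; assert_no_old_directory is a hypothetical name, and jira_home stands for <yourjirahome> on the standby instance.

```python
from pathlib import Path


def assert_no_old_directory(jira_home: str) -> None:
    """Stop the failover if <jira_home>/old exists on the standby instance."""
    old_dir = Path(jira_home) / "old"
    if old_dir.exists():
        # Deliberately not deleted here: an operator should decide what to do with it
        raise RuntimeError(f"{old_dir} exists; remove it before starting Jira")
```

The sketch raises instead of deleting the directory, so that an unattended script never removes data on its own.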
After Jira starts, check the log, <yourjirahome>/log/atlassian-jira.log, for information regarding the recovery state.
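To pull the relevant lines out of a large log, a simple keyword filter is often enough. This Python sketch assumes the recovery messages contain the word "recover" (case-insensitive). That keyword is an assumption, not a documented log format, so adjust it after looking at your own log. recovery_log_lines is a hypothetical name.

```python
from pathlib import Path


def recovery_log_lines(jira_home: str, keyword: str = "recover") -> list[str]:
    """Return lines of <jira_home>/log/atlassian-jira.log containing keyword."""
    log = Path(jira_home) / "log" / "atlassian-jira.log"
    needle = keyword.lower()
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan
    with log.open(encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh if needle in line.lower()]
```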
Step 2. Check the data in your standby instance
After you have failed over to your standby instance, perform these checks before users start accessing the system and changing data:
Check | Instructions |
---|---|
Latest issue update recorded in the database. | In the database, run the SQL query: SELECT max(updated) from jiraissue; |
Latest issue update recorded in the search index. | In Jira, go to Issues > Search for issues and run the JQL: order by updated desc |
Check the total number of issues | In the database, run the SQL query: SELECT count(*) from jiraissue; |
Check the total number of issues in the search index | In Jira, go to Issues > Search for issues and run a search with an empty query. |
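The two database checks can also be run from a script through any Python DB-API 2.0 driver for your database, with the issue count then compared against the total shown by the search. In this sketch, sqlite3 stands in for your real driver purely so the example is self-contained. database_state and count_problem are hypothetical names, and the index-side total is assumed to be read from the Jira search results by hand.

```python
import sqlite3  # stand-in for your real DB-API 2.0 driver in this sketch


def database_state(conn) -> tuple:
    """Return (latest issue update, total issues) using the queries from the table."""
    cur = conn.cursor()
    cur.execute("SELECT max(updated) FROM jiraissue")
    latest = cur.fetchone()[0]
    cur.execute("SELECT count(*) FROM jiraissue")
    total = cur.fetchone()[0]
    cur.close()
    return latest, total


def count_problem(db_total: int, index_total: int):
    """Return a message if the search index and database disagree, else None."""
    if db_total != index_total:
        return f"index has {index_total} issues but database has {db_total}"
    return None


if __name__ == "__main__":
    # Toy data only; a real check connects to the standby database
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jiraissue (id INTEGER, updated TEXT)")
    conn.executemany(
        "INSERT INTO jiraissue VALUES (?, ?)",
        [(1, "2024-01-01 09:00:00"), (2, "2024-01-02 17:30:00")],
    )
    latest, total = database_state(conn)
    print(latest, total)
    print(count_problem(total, index_total=2))
```

The latest update timestamp is returned for a side-by-side visual comparison rather than compared automatically, because the database and the Jira UI format dates differently.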
Clustering considerations
If you have a clustered environment, you need to be aware of the following, in addition to the information above:
Item | Details |
---|---|
Standby cluster | If you have a standby cluster, the node IDs of the standby nodes must be different from those of the live cluster. The standby cluster does not need to mirror the configuration of the live cluster; it may contain more or fewer nodes, depending on your requirements and budget. Fewer nodes may result in lower throughput, but that may be acceptable depending on your circumstances. |
File locations | Where this guide mentions <yourjirahome> as the location of files that need to be synchronized, read this as the shared home directory of the cluster. |
Starting the standby cluster | It is important to initially start only one node of the cluster, allow it to recover the search index and check it is working correctly before starting additional nodes. |
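A clash between live and standby node IDs is easy to check before the standby cluster is first started. This Python sketch assumes that each node's ID is the jira.node.id property in that node's cluster.properties file, as in a typical Jira Data Center setup, and that you have already gathered the file contents from both clusters. The function names are hypothetical.

```python
def read_node_id(properties_text: str):
    """Return the jira.node.id value from cluster.properties text, or None."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "jira.node.id":
            return value.strip()
    return None


def clashing_node_ids(live_props: list[str], standby_props: list[str]) -> set[str]:
    """Return node IDs used by both clusters; an empty set means no clash."""
    live = {read_node_id(t) for t in live_props} - {None}
    standby = {read_node_id(t) for t in standby_props} - {None}
    return live & standby
```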
Returning to the primary instance
In most cases, you will want to return to using your primary instance after you have resolved the problems that caused the disaster. This is easiest to achieve if you can schedule a reasonably sized outage window.
You need to:
- Synchronize your primary database with the state of the secondary.
- Synchronize the primary attachment directory with the state of the secondary.
- Recover the index state on the primary server.
Preparation
Item | Preparation |
---|---|
Attachments and other files | |
Search index | Enable index snapshots on the standby (running) node so that you have a recent index snapshot. Copy the snapshot to a location that is accessible from the live node. |
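Copying the most recent snapshot to a location the live node can read, and noting its file name for the restore step, can be scripted. A minimal Python sketch, assuming snapshots are regular files in a single directory and that "most recent" means the newest modification time; copy_latest_snapshot and both directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def copy_latest_snapshot(snapshot_dir: str, target_dir: str) -> Path:
    """Copy the newest file in snapshot_dir into target_dir and return the copy's path."""
    snapshots = [p for p in Path(snapshot_dir).iterdir() if p.is_file()]
    if not snapshots:
        raise FileNotFoundError(f"no snapshot files in {snapshot_dir}")
    newest = max(snapshots, key=lambda p: p.stat().st_mtime)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the modification time, so the copy still sorts as newest
    return Path(shutil.copy2(newest, target / newest.name))
```

The returned path gives you the name and location of the snapshot file that the cut-over needs.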
Perform the cut over
- Shut down Jira on the standby node.
- Ensure the database is synchronized correctly and configured as required.
- Start Jira.
- Log in to Jira and restore the index from the index snapshot. You will need to know the name and location of the snapshot file.
- Check that Jira is operating as expected.
- Update your DNS, HTTP Proxy or other front end devices to route traffic to your primary server.
Other resources
Atlassian Experts
Jira Data Center is the only Atlassian-supported disaster recovery solution for Jira. However, if you cannot get Jira Data Center, many of our Experts have been implementing disaster recovery solutions for Jira for years.
To get help implementing a disaster recovery solution for your environment, contact our Experts team.
Atlassian Answers
Our community and staff are active on Atlassian Answers. Feel free to contribute your best practices, questions and comments. Here are some of the answers relevant to this page:
Troubleshooting
If you encounter problems after failing over to your standby instance, the following FAQs may help:
Definitions
1 - Definitions
Term | Name | Meaning |
---|---|---|
RPO | Recovery Point Objective | How up-to-date you require your Jira instance to be after a failure. |
RTO | Recovery Time Objective | How quickly you require your standby system to be available after a failure. |
RCO | Recovery Cost Objective | How much you are willing to spend on your disaster recovery solution. |