Alternative Disaster Recovery Guide for Jira
The guide describes what is generally referred to as a "cold standby" strategy. That means that the standby Jira instance is not continuously running and that some administrative steps need to be taken to start the standby instance and ensure it is in a suitable state to service the business needs of the organization.
The major components that need to be considered in the disaster recovery plan are:
- Jira installation: The standby site should have exactly the same version of Jira Data Center installed as the production site.
- Database: This is the primary source of truth for Jira and contains most of the Jira data (except for attachments, avatars, installed plugins, etc.). The database needs to be replicated and continuously kept up to date to satisfy your RPO (see Definitions at the end of this guide).
- Attachments: All issue attachments are stored in the local file system and need to be replicated to the standby instance.
- Search index: The search index is not a primary source of truth and can always be recreated from the database. However, for large installations this can be quite time consuming, and the functionality of Jira would be greatly reduced until the index is fully recovered. Jira provides tools for reducing this recovery time to the bare minimum.
- Plugins: User-installed plugins are stored in the local file system and need to be replicated to the standby instance.
- Other data: There are a few other non-critical items that should also be replicated to the standby instance, such as user and project avatars.
Setting up a standby system
Step 1. Install Jira Data Center
Install the same version of Jira on the standby system. Configure the system to attach to the standby database.
You also need to configure the instance to be a disaster recovery installation. This enables the automatic index recovery mechanism to kick in when Jira starts.
Add the following to jira-config.properties in the Jira home directory of the standby instance:
disaster.recovery=true
DO NOT start the standby Jira system
Starting Jira would write data to the database, which you do not want to do.
You may like to test the installation by temporarily connecting it to a different database and starting Jira, then making sure it works as expected. Don't forget to update the database configuration to point to the standby database after your testing.
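As a minimal sketch of the configuration step above, the following Python helper makes sure that the text of a properties file contains the disaster recovery flag. It assumes the flag is the disaster.recovery=true setting this guide uses to start Jira in Disaster Recovery mode. The function name and the simplified key=value parsing are illustrative assumptions, not part of any Atlassian tooling.

```python
def ensure_property(text: str, key: str, value: str) -> str:
    """Return properties-file text in which `key` is set to `value`.

    Simplified parsing (an assumption): one key=value pair per line,
    and lines starting with '#' or '!' are comments.
    """
    result = []
    found = False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        has_key = (
            not is_comment
            and "=" in stripped
            and stripped.split("=", 1)[0].strip() == key
        )
        if has_key:
            if not found:
                # Rewrite the first occurrence with the desired value.
                result.append(f"{key}={value}")
                found = True
            # Later duplicates of the same key are dropped.
            continue
        result.append(line)
    if not found:
        # The key was absent, so append it at the end.
        result.append(f"{key}={value}")
    return "\n".join(result) + "\n"
```

You would read jira-config.properties from the standby Jira home directory, pass its content through ensure_property(content, "disaster.recovery", "true"), and write the result back while the standby instance is stopped.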
Step 2. Implement a data replication strategy
Replicating data to your standby location is crucial to a cold standby failover strategy. You don't want to fail over to your standby Jira instance and find that it is out of date or that it takes a few hours to reindex.
Manage data replication via external tools, as described below:
- Database: Atlassian does not provide or recommend a particular strategy for replicating the database. All of the supported database suppliers (Oracle, PostgreSQL, MySQL, and Microsoft SQL Server) provide their own database replication solutions.
- Attachments: There are a number of possibilities for managing attachments for disaster recovery:
- Search index: The steps to put the search index into a state that meets your RTO objective (see Definitions at the end of this guide) are:
- Plugins: Installed plugins are kept in the <yourjirahome directory. This directory on the standby instance should be kept in sync with that on the live instance. You need to set up a regular job to do this at the file system level.
- Other data: You should also periodically replicate the content of the
If you have non-Atlassian plugins, they may write some data to your
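The file-system-level replication described above could be driven by a scheduled job. As a hedged sketch, the Python function below only builds an rsync command line for mirroring one directory to the standby host. The host name and paths in the example are placeholders (assumptions), and rsync is one possible tool rather than a recommendation from this guide.

```python
import shlex
from typing import List


def build_rsync_command(source_dir: str, standby_host: str, target_dir: str,
                        dry_run: bool = False) -> List[str]:
    """Build an rsync argument list that mirrors source_dir to the standby host."""
    # A trailing slash on the source makes rsync copy the directory's
    # contents rather than the directory itself.
    source = source_dir.rstrip("/") + "/"
    target = f"{standby_host}:{target_dir.rstrip('/')}/"
    # --archive preserves permissions and timestamps; --delete removes
    # files on the standby that no longer exist on the live instance.
    command = ["rsync", "--archive", "--delete"]
    if dry_run:
        # --dry-run reports what would change without copying anything.
        command.append("--dry-run")
    command += [source, target]
    return command


if __name__ == "__main__":
    # Placeholder paths and host name, for illustration only.
    cmd = build_rsync_command("/path/to/live/dir", "standby.example.com",
                              "/path/to/standby/dir", dry_run=True)
    print(shlex.join(cmd))
```

Running with dry_run=True first is a cautious way to confirm that the source and target are the right way round before --delete is allowed to remove anything on the standby.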
Disaster recovery testing
You should exercise extreme care when testing any disaster recovery plan. Simple mistakes may corrupt your live instance, for example, if testing updates are inserted into your production database. While testing your disaster recovery plan, you may also damage your ability to recover from a real disaster.
The key is to keep the main data center as isolated as possible from the disaster recovery testing.
Before you perform any testing, you need to isolate your production data:
Attachments, plugins and indexes
You need to ensure that no plugin updates or index backups occur during the test:
Note that attachments should not cause any problems. Health checks in the failover instance will report whether the folders have write permissions.
After this, you can resume all replication to the standby instance, including the database.
Performing the disaster recovery testing
Once you have isolated your production data, follow the steps below to test your disaster recovery plan:
- Ensure that the new database is ready, with the latest snapshot and no replication.
- Ensure that you have a copy of Jira on a clean server with the proper
- Ensure that you have JIRA_HOME mapped as it was in the standby instance, but on the test server. It is important to have the latest snapshot in
- Disable email.
- Start Jira in Disaster Recovery mode, by starting it with the following parameter: disaster.recovery=true.
Handling a failover
In the event of your primary site becoming unavailable, you will need to fail over to your standby system. This section describes how to do this, including instructions on how to check the data in your standby system.
Step 1. Fail over to the standby instance
The basic steps to fail over to the standby instance are:
- Ensure your live system is shut down and no longer updating the database.
- Ensure that the directory <yourjirahome does not exist on the standby instance.
- Perform whatever steps are required to activate your standby database.
- Start Jira in the standby instance.
- Wait for Jira to start and check it is operating as expected.
- Update your DNS, HTTP Proxy or other front end devices to route traffic to your standby server.
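The "wait for Jira to start" step in the list above can be sketched as a simple polling loop. The Python helper below is an illustration under assumptions: probe is a hypothetical callable you supply that returns True once the standby instance answers as expected, and the timeout and interval values are arbitrary defaults.

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool],
                     timeout_s: float = 600.0,
                     interval_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll probe() until it returns True or timeout_s seconds have passed."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection errors are expected while the instance is starting.
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A probe might, for example, issue an HTTP request to the standby server with urllib.request and return True on a successful response. Which URL to poll depends on your deployment. Only update DNS or proxy routing after this returns True and you have checked the instance manually.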
You should check the log, <yourjirahome after Jira starts, for information regarding the recovery state.
Step 2. Check the data in your standby instance
After you have failed over to your standby instance, perform these checks before users start accessing the system and changing data:
- Latest issue update recorded in the database. In the database, run the SQL query:
  SELECT max(updated) FROM jiraissue;
- Latest issue update recorded in the search index. In Jira, go to Issues > Search for issues and run the JQL:
  order by updated desc
- Total number of issues in the database. In the database, run the SQL query:
  SELECT count(*) FROM jiraissue;
- Total number of issues in the search index. In Jira, go to Issues > Search for issues and run a search with an empty query.
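The database side of these checks can be scripted. The sketch below runs the two SQL queries from this section through a Python DB-API connection and compares the results with values you read from the Jira search screen. It uses an in-memory SQLite table as a stand-in so that it is runnable on its own. A real Jira database would need the driver for your database, and the helper names are illustrative assumptions.

```python
import sqlite3


def database_summary(connection) -> dict:
    """Run the two checks from this guide against the jiraissue table."""
    cursor = connection.cursor()
    cursor.execute("SELECT max(updated) FROM jiraissue")
    latest_update = cursor.fetchone()[0]
    cursor.execute("SELECT count(*) FROM jiraissue")
    issue_count = cursor.fetchone()[0]
    return {"latest_update": latest_update, "issue_count": issue_count}


def find_mismatches(db: dict, index: dict) -> list:
    """Return the names of the checks where database and index disagree."""
    return [key for key in ("latest_update", "issue_count")
            if db[key] != index[key]]


def demo_connection():
    """In-memory SQLite stand-in holding only the columns the queries need."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE jiraissue (id INTEGER PRIMARY KEY, updated TEXT)")
    connection.executemany(
        "INSERT INTO jiraissue (id, updated) VALUES (?, ?)",
        [(1, "2024-01-01 10:00:00"), (2, "2024-01-02 09:30:00")])
    return connection
```

The index values have to be entered by hand after running the JQL searches in Jira, and the timestamp format shown there may need normalizing before it can be compared with the database value.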
If you have a clustered environment, you need to be aware of the following, in addition to the information above:
If you have a standby cluster, the node ids of the standby nodes must be different from those of the live cluster.
There is no need for the configuration of the standby cluster to reflect that of the live cluster. It may contain more or fewer nodes, depending upon your requirements and budget. Fewer nodes may result in lower throughput, but that may be acceptable depending upon your circumstances.
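The node id requirement above is easy to verify before a failover. As a small sketch, the following Python function reports any ids that appear in both clusters. How you collect the ids from each node is left to you, and the id strings in the usage note are made up.

```python
def duplicate_node_ids(live_node_ids, standby_node_ids):
    """Return the node ids used by both clusters, sorted for stable output."""
    return sorted(set(live_node_ids) & set(standby_node_ids))
```

For example, duplicate_node_ids(["node1", "node2"], ["dr1", "node2"]) reports ["node2"], which would need to be renamed on the standby cluster. An empty list means the two clusters do not share any ids.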
Where we mention <yourjirahome as the location of files that need to be synchronized, this will be the shared home for the cluster.
Starting the standby cluster
It is important to initially start only one node of the cluster, allow it to recover the search index, and check that it is working correctly before starting additional nodes.
Returning to the primary instance
In most cases, you will want to return to using your primary instance, after you have resolved the problems that caused the disaster. This is easiest to achieve if you can schedule a reasonably-sized outage window.
You need to:
- Synchronize your primary database with the state of the secondary.
- Synchronize the primary attachment directory with the state of the secondary.
- Recover the index state on the primary server.
Attachments and other files
Search index: Enable index snapshots on the standby (running) node so that you have a recent index snapshot. Copy this snapshot to a location that is accessible from the live node.
Perform the cut over
- Shut down Jira on the standby node.
- Ensure the database is synchronized correctly and configured as required.
- Start Jira.
- Log in to Jira and restore the index from the index snapshot. You will need to know the name and location of the snapshot file.
- Check that Jira is operating as expected.
- Update your DNS, HTTP Proxy or other front end devices to route traffic to your primary server.
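The restore step above needs the name and location of the snapshot file. As a hedged sketch, this Python helper picks the most recently modified file in a directory. The directory and the file name pattern are assumptions you must supply, because snapshot naming is not specified in this guide.

```python
from pathlib import Path
from typing import Optional


def latest_snapshot(snapshot_dir: str, pattern: str = "*") -> Optional[Path]:
    """Return the most recently modified file matching pattern, or None."""
    candidates = [p for p in Path(snapshot_dir).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Modification time is used as a proxy for snapshot age.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Modification times can change when files are copied between servers without preserving timestamps, so confirm that the selected file is the snapshot you expect before restoring from it.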
Jira Data Center documentation is the only Atlassian-supported disaster recovery solution for Jira. However, if you cannot get Jira Data Center, many of our Experts have been implementing disaster recovery solutions for Jira for years.
To get help implementing a disaster recovery solution for your environment, contact our Experts team.
If you encounter problems after failing over to your standby instance, the following FAQs may help:
If the database does not have the data available that it should, then you will need to restore the database from a backup.
Once you have restored the database, the search index will no longer be in sync with the database. You can either do a full re-index (background or foreground) or recover from the latest index snapshot if you have one. The index snapshot can be older or more recent than your database backup; it will synchronize itself as part of the recovery process.
If the search index is corrupt, you can either do a full re-index (background or foreground) or recover from an earlier index snapshot if you have one.
You may be able to recover missing attachments from backups if you have them, or from the primary site if you have access to the hard drives. Tools such as rsync may be useful in such circumstances. Missing attachments will not stop Jira from performing normally: they will simply be unavailable, and users may be able to upload them again.
1 - Definitions
- RPO (Recovery Point Objective): How up-to-date you require your Jira instance to be after a failure.
- RTO (Recovery Time Objective): How quickly you require your standby system to be available after a failure.
- RCO (Recovery Cost Objective): How much you are willing to spend on your disaster recovery solution.
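To make the first two definitions concrete, the sketch below expresses them as simple time comparisons in Python. It assumes you record when data was last replicated, when the failure occurred, and when service was restored. The function names and the example figures in the usage note are illustrative only.

```python
from datetime import datetime, timedelta


def meets_rpo(last_replicated_at: datetime, failure_at: datetime,
              rpo: timedelta) -> bool:
    """True if the data lost (time since the last replication) is within the RPO."""
    return failure_at - last_replicated_at <= rpo


def meets_rto(failure_at: datetime, service_restored_at: datetime,
              rto: timedelta) -> bool:
    """True if the outage (failure until service restored) is within the RTO."""
    return service_restored_at - failure_at <= rto
```

For instance, with an RPO of 15 minutes, a failure at 12:00 after a last successful replication at 11:50 is within the objective, while a last replication at 11:30 is not.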