Alternative disaster recovery guide for Jira

This guide shows you how to set up an alternative disaster recovery solution for Jira Data Center 6.4 and later.

This solution is not supported by Atlassian as explained here.

The only Atlassian-supported disaster recovery solution for Jira is the one described in the Disaster Recovery Guide for Jira.

A disaster recovery strategy is a key part of any business continuity plan. It covers the processes that should be followed in the event of a disaster, to ensure that the business can recover and keep operating. For Jira, this means ensuring Jira's availability in the event of your primary site becoming unavailable.

On this page:

Overview
Setting up a standby system
Disaster recovery testing
Handling a failover
Clustering considerations
Returning to the primary instance
Other resources

Overview

The guide describes what is generally referred to as a "cold standby" strategy. That means that the standby Jira instance is not continuously running and that some administrative steps need to be taken to start the standby instance and ensure it is in a suitable state to service the business needs of the organization.

The major components that need to be considered in the disaster recovery plan are:

Jira installation	The standby site should have exactly the same version of Jira Data Center installed as the production site.
Database	This is the primary source of truth for Jira and contains most of the Jira data (except for attachments, avatars, installed plugins, etc.). The database needs to be replicated and continuously kept up to date to satisfy your RPO¹.
Attachments	All issue attachments are stored in the local file system and need to be replicated to the standby instance.
Search Index	The search index is not a primary source of truth and can always be recreated from the database. However, for large installations this can be quite time consuming, and the functionality of Jira would be greatly reduced until the index is fully recovered. Jira provides tools for reducing this recovery time to the bare minimum.
Plugins	User-installed plugins are stored in the local file system and need to be replicated to the standby instance.
Other data	There are a few other non-critical items that should also be replicated to the standby instance, such as user and project avatars.

Setting up a standby system

Step 1. Install Jira Data Center

Install the same version of Jira on the standby system. Configure the system to attach to the standby database.

You also need to configure the instance to be a disaster recovery installation. This enables the automatic index recovery mechanism to kick in when Jira starts.

Add the following to jira-config.properties in the Jira home directory of the standby instance:

disaster.recovery=true

DO NOT start the standby Jira system

Starting Jira would write data to the database, which you do not want to do.

You may like to test the installation by temporarily connecting it to a different database and starting Jira, then making sure it works as expected. Don't forget to update the database configuration to point to the standby database after your testing.
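
Because the standby instance must not be started just to confirm its configuration, it can help to check the disaster.recovery setting from this step by reading the file directly. The following is a minimal sketch, not an Atlassian tool: the function names are hypothetical, and it assumes jira-config.properties uses plain key=value lines.

```python
from pathlib import Path


def read_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_disaster_recovery_enabled(jira_home: Path) -> bool:
    """Return True if jira-config.properties in jira_home sets disaster.recovery=true."""
    config = jira_home / "jira-config.properties"
    if not config.is_file():
        return False
    props = read_properties(config.read_text(encoding="utf-8"))
    return props.get("disaster.recovery", "").lower() == "true"
```

Calling is_disaster_recovery_enabled with the path of the standby Jira home returns False both when the file is missing and when the setting is absent, so a scheduled check can treat either case as a configuration problem.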

Step 2. Implement a data replication strategy

Replicating data to your standby location is crucial to a cold standby failover strategy. You don't want to fail over to your standby Jira instance and find that it is out of date or that it takes a few hours to reindex.

Manage data replication via external tools, as described below:

Database	Atlassian does not provide or recommend a particular strategy for replicating the database. All of the supported database suppliers (Oracle, PostgreSQL, MySQL and Microsoft SQL Server) provide their own database replication solutions:
	Oracle: Oracle Database Data Replication and Integration
	PostgreSQL: PostgreSQL Binary Replication Tutorial
	MySQL: MySQL reference manual - Chapter 17 Replication
	Microsoft SQL Server: SQL Server Replication
	You need to implement a database replication strategy that meets your RPO¹ and RCO¹.
Attachments	There are a number of possibilities for managing attachments for disaster recovery:
	Have Jira replicate the attachments to a secondary location. You can:
		Use a file system mapping: use operating system level tools to map that location to the remote standby location, using NFS, CIFS or some other mechanism,
		Use a plugin with a defined storage, or
		Create your own plugin that implements the SimpleAttachmentStore of Jira.
	Change the attachment location in Jira. To do this:
		Create a symbolic link on your server from the default attachment path ( `<JIRA_HOME>/data/attachments` ) to the path where you want your attachments to be stored.
		Creating a symbolic link on Linux: run the below command on the system's command line. `ln -s /path/to/file /path/to/symlink`
		Creating a symbolic link on Windows: run the below command on the system's command line. `mklink /J C:\path\to\symlink E:\path\to\file`
		Copy the existing attachments to the new path while creating the link.
		Shut down your Jira application before making the path changes.
	Use an attachment store that provides its own DR: Jira provides an entry point for customers to add different types of storage as a primary storage location (Amazon's S3, Google Drive, etc.). This can be either the sole attachments storage or a secondary backup storage.
	Use file system or operating system tools to replicate the attachments: If you are already using a corporate SAN or similar system that provides this functionality, that may be the easiest, most cost-effective and most reliable way to replicate the attachments.
Search indexes	The steps to put the search index into a state that meets your RTO¹ are:
	Enable index recovery on the live instance: This takes a consistent snapshot of the search index periodically. The frequency affects how long it takes to recover the full index on the standby after the failover. Even with a frequency of 24 hours, the amount to be recovered is at most one day's indexing, which is typically < 1% of the index and takes only a very short time to recover. For example, if a full re-index takes 5 hours, the recovery would be expected to take only about 5 minutes.
	Copy the index snapshots to the standby instance: The snapshots are saved to `<yourjirahome>/export/indexsnapshots`. They should be copied to `<yourjirahome>/import/indexsnapshots` on the standby server. Jira does not provide a mechanism to copy these files. You need to set up a regular job to do a file system copy. You should retain at least the last 2 snapshots on the standby server.
	Ensure that the standby server is a disaster recovery installation: See Step 1. Install Jira Data Center above.
Plugins	Installed plugins are kept in the `<yourjirahome>/plugins/installed-plugins` directory. This directory on the standby instance should be kept in sync with that on the live instance. You need to set up a regular job to do this at the file system level.
Other data	You should also periodically replicate the content of the `<yourjirahome>/data/avatars` directory. If you have non-Atlassian plugins, they may write some data to your `<yourjirahome>` directory. You will need to contact your plugin supplier to determine if this data should be replicated to the standby server.
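
The regular job that copies index snapshots could look something like the sketch below. It is an illustration under stated assumptions rather than a supported tool: the function name is hypothetical, both directories are assumed to be reachable as local paths (for example, the standby directory is a mounted share), and snapshots are ordered by file modification time. It keeps the two newest snapshots on the standby, as recommended above, and does nothing when the export directory is empty so that an unavailable source never wipes the standby copies.

```python
import shutil
from pathlib import Path


def copy_recent_snapshots(export_dir: Path, import_dir: Path, keep: int = 2) -> list:
    """Copy the `keep` newest files from export_dir into import_dir, then prune the rest.

    Returns the sorted names of the snapshots that are retained in import_dir.
    """
    import_dir.mkdir(parents=True, exist_ok=True)
    snapshots = sorted(
        (p for p in export_dir.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    newest = snapshots[:keep]
    if not newest:
        # Nothing to copy: leave the standby directory untouched.
        return []
    for src in newest:
        dest = import_dir / src.name
        # Skip files that already exist on the standby with the same size.
        if not dest.exists() or dest.stat().st_size != src.stat().st_size:
            shutil.copy2(src, dest)
    wanted = {p.name for p in newest}
    for old in import_dir.iterdir():
        if old.is_file() and old.name not in wanted:
            old.unlink()
    return sorted(wanted)
```

A scheduler such as cron could call this with `<yourjirahome>/export/indexsnapshots` on the live server as export_dir and `<yourjirahome>/import/indexsnapshots` on the standby as import_dir. The sketch does not detect a snapshot that is still being written, so schedule it away from the snapshot time.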

Disaster recovery testing

You should exercise extreme care when testing any disaster recovery plan. Simple mistakes may cause your live instance to be corrupted, for example, if testing updates are inserted into your production database. You may detrimentally impact your ability to recover from a real disaster, while testing your disaster recovery plan.

The key is to keep the main data center as isolated as possible from the disaster recovery testing.

Prerequisites

Before you perform any testing, you need to isolate your production data:

Database

Temporarily pause all replication to the standby database.
Replicate the data from the standby database to another database that is isolated and with no communication with the main database.

Attachments, plugins and indexes

You need to ensure that no plugin updates or index backups occur during the test:

Disable index backups.
Instruct sysadmins to not perform any updates in Jira.

Note: attachments should not cause any problems. The health checks in the failover instance report whether the folders have write permissions.

Installation folders

Clone your standby installation, separate from both the live and standby instances.
Change the connection to the database in the JIRA_HOME/dbconfig.xml to avoid any conflict.

After this, you can resume all replication to the standby instance, including the database.

Performing the disaster recovery testing

Once you have isolated your production data, follow the steps below to test your disaster recovery plan:

Ensure that the new database is ready, with the latest snapshot and no replication.
Ensure that you have a copy of Jira on a clean server with the proper dbconfig.xml connection.
Ensure that you have JIRA_HOME mapped on the test server as it was in the standby instance. It is important to have the latest snapshot in the JIRA_HOME/export folder.
Disable email.
Start Jira in Disaster Recovery mode, by starting it with the following parameter: disaster.recovery=true.
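
Before starting the test instance, it is worth confirming that its dbconfig.xml does not point at the production or standby database. The sketch below is one way to do that. It assumes that the JDBC connection string is stored in a `<url>` element of dbconfig.xml; the function names and the host names used in the usage note are hypothetical. The host check is a plain substring match, which is crude but works across the different JDBC URL formats.

```python
import xml.etree.ElementTree as ET


def jdbc_url(dbconfig_xml: str) -> str:
    """Return the text of the first <url> element found in a dbconfig.xml document."""
    root = ET.fromstring(dbconfig_xml)
    node = root.find(".//url")
    if node is None or not node.text:
        raise ValueError("no <url> element found in dbconfig.xml")
    return node.text.strip()


def points_at_forbidden_host(dbconfig_xml: str, forbidden_hosts: list) -> bool:
    """Return True if the JDBC URL mentions any host the test instance must not use."""
    url = jdbc_url(dbconfig_xml).lower()
    return any(host.lower() in url for host in forbidden_hosts)
```

For example, reading JIRA_HOME/dbconfig.xml on the test server and calling points_at_forbidden_host with the names of your live and standby database servers should return False before you start Jira.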

Handling a failover

In the event of your primary site becoming unavailable, you will need to fail over to your standby system. This section describes how to do this, including instructions on how to check the data in your standby system.

Step 1. Fail over to the standby instance

The basic steps to fail over to the standby instance are:

Ensure your live system is shut down and no longer updating the database.
Ensure that the directory <yourjirahome>/old does not exist on the standby instance.
Perform whatever steps are required to activate your standby database.
Start Jira in the standby instance.
Wait for Jira to start and check it is operating as expected.
Update your DNS, HTTP Proxy or other front end devices to route traffic to your standby server.

You should check the log, <yourjirahome>/log/atlassian-jira.log, after Jira starts for information regarding the recovery state.
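
Two of the conditions above can be checked from a script before Jira is started on the standby. The following is a minimal sketch with a hypothetical function name. It only verifies that `<yourjirahome>/old` is absent and that at least one index snapshot has been copied to `<yourjirahome>/import/indexsnapshots`; it does not replace checking the log after startup.

```python
from pathlib import Path


def failover_preflight(jira_home: Path) -> list:
    """Return a list of problems found in the standby Jira home; an empty list means none."""
    problems = []
    if (jira_home / "old").exists():
        problems.append("directory 'old' exists in the Jira home and must not be present")
    snapshots_dir = jira_home / "import" / "indexsnapshots"
    if not snapshots_dir.is_dir() or not any(snapshots_dir.iterdir()):
        problems.append("no index snapshots found in import/indexsnapshots")
    return problems
```

Running failover_preflight against the standby Jira home and printing each returned problem gives a quick go/no-go signal before you start Jira.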

Step 2. Check the data in your standby instance

After you have failed over to your standby instance, perform these checks before users start accessing the system and changing data:

Check	Instructions
Latest issue update recorded in the database.	In the database, run the SQL query: SELECT max(updated) from jiraissue;
Latest issue update recorded in the search index.	In Jira, go to Issues > Search for issues and run the JQL: order by updated desc
Check the total number of issues	In the database, run the SQL query: SELECT count(*) from jiraissue;
Check the total number of issues in the search index	In Jira, go to Issues > Search for issues and run a search with an empty query.
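
The two database checks in the table can be scripted and compared with the values you read from the Jira search. The sketch below runs the same queries through a Python DB-API connection. It uses the standard library's sqlite3 purely as a stand-in so that the example is self-contained; a real Jira database needs the driver for your database product. The function names are hypothetical, and the comparison assumes that you have already converted the index values to the same format as the database values.

```python
import sqlite3


def database_checks(conn) -> dict:
    """Run the two checks from the table above on an open DB-API connection."""
    cur = conn.cursor()
    cur.execute("SELECT max(updated) FROM jiraissue")
    latest = cur.fetchone()[0]
    cur.execute("SELECT count(*) FROM jiraissue")
    total = cur.fetchone()[0]
    return {"latest_update": latest, "issue_count": total}


def index_matches_database(db: dict, index_latest, index_count) -> bool:
    """Compare database values with those observed in the Jira issue search."""
    return db["latest_update"] == index_latest and db["issue_count"] == index_count


def demo_connection():
    """Build an in-memory stand-in table with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jiraissue (id INTEGER PRIMARY KEY, updated TEXT)")
    conn.executemany(
        "INSERT INTO jiraissue (id, updated) VALUES (?, ?)",
        [(1, "2024-01-01 09:00:00"), (2, "2024-01-03 10:00:00"), (3, "2024-01-02 08:30:00")],
    )
    return conn
```

If the counts or the latest update times differ, the search index is behind the database and the index recovery may still be in progress or may have failed.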

Clustering considerations

If you have a clustered environment, you need to be aware of the following, in addition to the information above:

Standby cluster

If you have a standby cluster, the node ids of the standby nodes must be different from those of the live cluster.

There is no need for the configuration of the standby cluster to reflect that of the live cluster. It may contain more or fewer nodes, depending upon your requirements and budget. Fewer nodes may result in lower throughput, but that may be acceptable depending upon your circumstances.

File locations

Where this guide mentions <yourjirahome> as the location of files that need to be synchronized, use the shared home for the cluster.

Starting the standby cluster

It is important to initially start only one node of the cluster, allow it to recover the search index and check it is working correctly before starting additional nodes.
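
The requirement that standby node ids differ from live node ids is easy to verify once you have both lists. The following is a small sketch with a hypothetical function name; how you collect the ids (for example, from each node's cluster configuration) is left to you, and the ids in the usage note are made up.

```python
def overlapping_node_ids(live_ids, standby_ids) -> list:
    """Return the node ids that appear in both clusters, sorted; an empty list is the goal."""
    return sorted(set(live_ids) & set(standby_ids))
```

For instance, overlapping_node_ids(["node1", "node2"], ["node2", "dr1"]) reports node2 as a conflict that must be renamed on the standby before the cluster is started.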

Returning to the primary instance

In most cases, you will want to return to using your primary instance, after you have resolved the problems that caused the disaster. This is easiest to achieve if you can schedule a reasonably-sized outage window.

You need to:

Synchronize your primary database with the state of the secondary.
Synchronize the primary attachment directory with the state of the secondary.
Recover the index state on the primary server.

Preparation

Attachments and other files	Use rsync or a similar utility to synchronize the majority of attachments to the primary server before starting the switchover process. Similarly, you should synchronize the installed plugins and logos before you start.
Search index	Enable Index snapshots on the standby (running) node so that you have a recent index snapshot. This should be copied to a location that is accessible from the live node.
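
To see how much remains to be synchronized before the cut over, you can compare the directory trees on the two sites. The sketch below is an illustration only: the function names are hypothetical, both trees are assumed to be reachable as local paths, and files are compared by relative path and size rather than by content, so it is a quick estimate and not a substitute for rsync or a checksum comparison.

```python
from pathlib import Path


def relative_files(root: Path) -> dict:
    """Map each file under root (as a POSIX-style relative path) to its size in bytes."""
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def files_to_sync(source: Path, target: Path) -> list:
    """List files that are missing from target or differ in size from the copy in source."""
    src = relative_files(source)
    dst = relative_files(target) if target.is_dir() else {}
    return sorted(name for name, size in src.items() if dst.get(name) != size)
```

Here source would be a directory on the standby (running) site, such as the attachments directory, and target the matching directory on the primary. A short or empty result suggests that the final synchronization during the outage window will be quick.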

Perform the cut over

Shut down Jira on the standby node.
Ensure the database is synchronized correctly and configured as required.
Start Jira.
Log in to Jira and restore the index from the index snapshot. You will need to know the name and location of the snapshot file.
Check that Jira is operating as expected.
Update your DNS, HTTP Proxy or other front end devices to route traffic to your primary server.

Other resources

Atlassian Experts

The solution described in the Jira Data Center documentation is the only Atlassian-supported disaster recovery solution for Jira. However, if you cannot get Jira Data Center, many of our Experts have been implementing disaster recovery solutions for Jira for years.

To get help implementing a disaster recovery solution for your environment, contact our Experts team.

Atlassian Answers

Our community and staff are active on Atlassian Answers. Feel free to contribute your best practices, questions and comments.

Troubleshooting

If you encounter problems after failing over to your standby instance, the following FAQs may help:

What do I do if my database is not synchronized correctly?

If the database does not have the data available that it should, then you will need to restore the database from a backup.

Once you have restored the database, the search index will no longer be in sync with the database. You can either do a full re-index, background or foreground, or recover from the latest index snapshot if you have one. The index snapshot can be older or more recent than your database backup; it will synchronize itself as part of the recovery process.

What do I do if my search index is corrupt?

If the search index is corrupt, you can either do a full re-index, background or foreground, or recover from an earlier index snapshot if you have one.

What do I do if attachments are missing?

You may be able to recover them from backups if you have them, or recover from the primary site, if you have access to the hard drives. Tools such as rsync may be useful in such circumstances. Missing attachments will not stop Jira performing normally: the missing attachments will just not be available, so users may be able to upload them again.

Definitions

1 - Definitions

RPO	Recovery Point Objective	How up-to-date you require your Jira instance to be after a failure.
RTO	Recovery Time Objective	How quickly you require your standby system to be available after a failure.
RCO	Recovery Cost Objective	How much you are willing to spend on your disaster recovery solution.
