How a Jira Data Center node recovers its index after being offline

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

A node in a Jira Data Center cluster has been offline for some time, so its local Lucene index is out of date when the node is brought back online.

Cause

The Jira node was offline, so it missed the indexing operations that the other nodes performed in the meantime.

Resolution

  • If the outage is short enough that the node can still see the indexing requests made in its absence, it simply replays them to catch up.
    Each indexing operation writes a row to the database table replicatedindexoperation. Every node looks for entries in this table that were inserted by other nodes and applies those changes to its local Lucene index. Each node also records the latest operation it has processed in nodeindexcounter, so that next time it only needs to read newer operations.
  • If a node has been offline long enough that it would have to recover more than 2880 minutes (48 hours) of changes, it can tell that it is a long way out of date by comparing the last index operation recorded in nodeindexcounter with the operations waiting in the replicatedindexoperation table. In this case it asks an active node to send it a full index replica (snapshot). Whichever node picks up the request takes an index snapshot and posts a message back saying where to find it. The requesting node extracts the snapshot, and the replay/catch-up then picks up whatever else happened during the recovery.
  • Starting with Jira 8.19.0, a node can also fetch the index snapshot from the shared home on startup. For this to work, an index snapshot must be available in the `export/indexsnapshots` directory of the shared home. If you don't want shared snapshots to be used, disable the feature by setting the system property com.atlassian.jira.startup.pick.indexsnapshot.from.shared to false. The system property com.atlassian.jira.startup.max.age.of.usable.index.snapshot.in.hours defines the maximum age of a usable snapshot. If it is not set, the default applies: 24 hours in versions below 9.1.1, and 25 hours from 9.1.1 onward.
    NOTE: Jira first tries to get the index from the shared home before requesting it from another node.
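
The recovery paths above can be sketched as a small Python model. This is an illustration only, not Jira's implementation: the class and function names are hypothetical, the tables are modelled as in-memory lists, and nodeindexcounter is reduced to a single integer. The sketch also assumes that the shared-home snapshot is only considered once the node is too far behind to replay, and that the boundary comparisons are inclusive; the article does not state either detail.

```python
from dataclasses import dataclass

MAX_REPLAY_MINUTES = 2880  # threshold quoted in the article (48 hours)


@dataclass
class IndexOperation:
    """Hypothetical stand-in for one row of replicatedindexoperation."""
    op_id: int
    node_id: str    # node that performed the indexing operation
    issue_key: str  # what was (re)indexed


def pending_operations(operations, node_id, last_processed_id):
    """Operations written by other nodes that this node has not applied yet."""
    return [op for op in operations
            if op.node_id != node_id and op.op_id > last_processed_id]


def replay(local_index, operations, node_id, last_processed_id):
    """Apply pending operations; return the new counter value.

    local_index is a plain set standing in for the local Lucene index.
    """
    for op in pending_operations(operations, node_id, last_processed_id):
        local_index.add(op.issue_key)
        last_processed_id = max(last_processed_id, op.op_id)
    return last_processed_id


def choose_recovery(minutes_behind, shared_snapshot_age_hours=None,
                    pick_from_shared=True, max_snapshot_age_hours=24):
    """Pick a recovery path for a node that is minutes_behind the cluster.

    shared_snapshot_age_hours is None when no snapshot exists in the
    shared home. max_snapshot_age_hours defaults to 24 (25 since 9.1.1).
    """
    if minutes_behind <= MAX_REPLAY_MINUTES:
        return "replay"
    if (pick_from_shared and shared_snapshot_age_hours is not None
            and shared_snapshot_age_hours <= max_snapshot_age_hours):
        return "shared-snapshot"
    return "request-snapshot-from-node"
```

For example, `choose_recovery(3000, shared_snapshot_age_hours=5)` selects the shared-home snapshot, while the same call with `pick_from_shared=False` falls back to requesting a snapshot from another node.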

Replicating indexes

When a node first joins the cluster, or if it has been offline for an extended period, it gets a copy of an up-to-date index from another node. The process works as follows:

  1. The node sends a "Backup Index" message to the cluster.
  2. An active node (other than the sender) claims the message, removing it from the message queue, and creates a backup of the index in the shared home.
  3. The node that created the backup then sends an "Index Backed Up" message to the node that requested the backup.
  4. The requesting node replaces its current index with the backed-up index.
  5. The requesting node reapplies any changes that have occurred since the backup was requested.
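
The five steps above can be walked through with a toy Python simulation. It is a minimal sketch under loud assumptions: the `Node` class, the `replicate_index` function, the in-memory queue, and the dictionary standing in for the shared home are all hypothetical, and only the two message names come from the article. Real cluster messaging and Lucene index files are not modelled.

```python
from collections import deque


class Node:
    """Hypothetical cluster node; index maps issue key to indexed value."""

    def __init__(self, name, index):
        self.name = name
        self.index = dict(index)


def replicate_index(requester, cluster, shared_home, changes_since_request):
    """Walk through the five replication steps; return the responder's name."""
    queue = deque()

    # 1. The requesting node sends a "Backup Index" message to the cluster.
    queue.append(("Backup Index", requester.name))

    # 2. An active node other than the sender claims the message (removing
    #    it from the queue) and writes a backup to the shared home.
    _message, sender = queue.popleft()
    responder = next(node for node in cluster if node.name != sender)
    shared_home["backup"] = dict(responder.index)

    # 3. The responder tells the requester where the backup is.
    reply = ("Index Backed Up", responder.name, "backup")

    # 4. The requester replaces its current index with the backup.
    requester.index = dict(shared_home[reply[2]])

    # 5. The requester reapplies changes made since the backup was requested.
    for issue_key, value in changes_since_request:
        requester.index[issue_key] = value

    return responder.name
```

Calling `replicate_index(stale, [stale, fresh], {}, [("A-3", "y")])` leaves `stale.index` equal to the contents of `fresh.index` plus the `A-3` entry that arrived while the backup was being made.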


Last modified on Aug 24, 2022
