Jira Data Center search indexing
To provide fast searching, Jira creates an index of the text entered into issue fields. This index is stored either on the file system (when using Lucene) or in an OpenSearch cluster, and it is updated whenever issue text is added or modified. The indexes are called Lucene or OpenSearch indexes after the third-party framework that provides them. This page explains how indexes are managed and kept in sync in Jira Data Center.
Where are the indexes stored?
Lucene
The indexes are stored in a number of directories in the local Jira home directory under caches. Each node in the cluster has its own set of indexes.
`-- indexes
    |-- changes
    |-- comments
    |-- entities
    |   |-- portalpage
    |   `-- searchrequest
    `-- issues
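To see how much disk space each of these index directories uses on a node, a small script can walk the tree. This is a minimal sketch: the function name index_sizes is hypothetical, and the path to the indexes directory is passed in by the caller rather than assumed.

```python
import os
from pathlib import Path


def index_sizes(indexes_dir: Path) -> dict[str, int]:
    """Return the total file size in bytes of each top-level index directory."""
    sizes: dict[str, int] = {}
    for child in sorted(indexes_dir.iterdir()):
        if not child.is_dir():
            continue  # only directories such as issues or comments are indexes
        total = 0
        # Walk nested directories too (for example entities/portalpage).
        for root, _dirs, files in os.walk(child):
            for name in files:
                total += (Path(root) / name).stat().st_size
        sizes[child.name] = total
    return sizes
```

Calling index_sizes with the indexes directory of a node returns a mapping such as issues to a byte count, which can help when comparing nodes.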
OpenSearch
In OpenSearch, all Jira documents are stored in a single issue index. The index has an alias, issues, so that a re-index can rebuild the data in a staging index with minimal impact on the production index. The production index follows the naming convention issue-atlas-<timestamp>.
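The alias approach can be illustrated with the request body that the OpenSearch _aliases API accepts for moving an alias from one index to another in a single call. This is only a sketch of the general technique: whether Jira issues exactly this request is an assumption, the helper name alias_swap_body is hypothetical, and the index names in the usage example are placeholders because the timestamp format is not specified here.

```python
import json


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Build an _aliases request body that repoints `alias` in one atomic call."""
    return {
        "actions": [
            # Both actions are applied together, so searches through the alias
            # never see a moment where it points at no index.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Placeholder index names; a real name would carry an actual timestamp.
    body = alias_swap_body(
        "issues", "issue-atlas-<old-timestamp>", "issue-atlas-<new-timestamp>"
    )
    print(json.dumps(body, indent=2))
```

Because searches go through the alias rather than the index name, the staging index can be built in the background and swapped in once it is complete.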
Synchronizing the indexes
Jira keeps all the copies of the indexes up to date automatically. The synchronization is not fully synchronous but aims for eventual consistency, which means that there is some delay before the index changes are seen on other nodes in the cluster.
Lucene
The indexes are synchronized continuously: each node polls for changes every 5 seconds. But where are these changes recorded?
Indexes and database
- Database table: replicatedindexoperation
  Each index operation writes a row to this database table. All nodes then look for entries that were written by other nodes in the cluster. After finding such changes, the nodes apply them to their local Lucene index.
- Database table: nodeindexcounter
  To avoid the nodes checking for all possible operations all the time, they always record the latest processed operation in this table, so that during the next check, they only need to read new operations.
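The interplay of the two tables can be sketched as a small in-memory simulation. The Operation and Node classes and all of their field names are hypothetical and do not reflect the real table columns; the list of operations stands in for replicatedindexoperation, and the counters dictionary stands in for nodeindexcounter.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    op_id: int      # increasing id of the operation (hypothetical field)
    node_id: str    # node that performed the change
    issue_key: str
    action: str     # "update" or "delete"


@dataclass
class Node:
    node_id: str
    local_index: dict[str, str] = field(default_factory=dict)
    # Last processed operation id per source node.
    counters: dict[str, int] = field(default_factory=dict)

    def poll(self, operations: list[Operation]) -> int:
        """Apply unseen operations written by other nodes; return how many."""
        applied = 0
        for op in sorted(operations, key=lambda o: o.op_id):
            if op.node_id == self.node_id:
                continue  # own changes are already in the local index
            if op.op_id <= self.counters.get(op.node_id, 0):
                continue  # already processed during an earlier poll
            if op.action == "delete":
                self.local_index.pop(op.issue_key, None)
            else:
                self.local_index[op.issue_key] = f"indexed@{op.op_id}"
            self.counters[op.node_id] = op.op_id
            applied += 1
        return applied
```

Calling poll a second time with the same operations applies nothing, which is the point of recording the latest processed operation.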
Document Based Replication
In Jira 8.12, we introduced Document Based Replication, which also serves as an index synchronization mechanism for Jira Data Center.
OpenSearch
Issues are indexed within documents in OpenSearch. All changes are sent directly to OpenSearch from the Jira node where the change occurred. A retry mechanism adds fault tolerance for any intermittent failures when communicating with OpenSearch.
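A retry mechanism of this kind is commonly built as a loop with a growing delay between attempts. The sketch below shows that general pattern only; the function name with_retries, the number of attempts, and the delays are assumptions, not the values Jira uses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    send: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send`, retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return send()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the failure to the caller
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")
```

The sleep function is a parameter so that the behavior can be checked without actually waiting.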
By default, Jira runs a scheduled job every minute across the Jira cluster to verify that all recent updates are indexed. Jira tracks document versions in the replicatedindexoperation table and enforces them in OpenSearch by using its built-in document versioning.
This allows Jira to verify that the expected documents and versions exist in OpenSearch. For deletes and archives, Jira checks that the documents have been removed from the index.
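The kind of check described here can be sketched as a comparison between the versions Jira expects and the versions found in the index. The function name find_discrepancies and the data shapes are hypothetical: expected maps a document ID to its expected version, with None marking a document that was deleted or archived, and indexed maps a document ID to the version currently in the index.

```python
from typing import Optional


def find_discrepancies(
    expected: dict[str, Optional[int]], indexed: dict[str, int]
) -> dict[str, str]:
    """Report documents whose indexed state does not match what is expected."""
    problems: dict[str, str] = {}
    for doc_id, version in expected.items():
        actual = indexed.get(doc_id)
        if version is None:
            # Deleted or archived: the document must be gone from the index.
            if actual is not None:
                problems[doc_id] = "should be removed"
        elif actual is None:
            problems[doc_id] = "missing"
        elif actual < version:
            problems[doc_id] = f"stale: has {actual}, expected {version}"
    return problems
```

An empty result means every recent operation is reflected in the index; anything else identifies documents that need to be indexed again.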
Indexes and database
- Database table: replicatedindexoperation
  Each index operation writes a row to this database table, including the document version. The scheduled job uses these operations to confirm that they've been applied to the OpenSearch cluster.
- Database table: nodeindexcounter
  To avoid checking all possible operations every time, the latest processed operation for each node is recorded in this table. During the next check, only new operations are read.
Retention period for both Lucene and OpenSearch
As indexing changes accumulate, the database tables might grow very large. To avoid that, we've introduced a service that runs on each node and removes entries that are older than a set period of time. The default retention period is set to 2800 minutes (about 2 days), which works well with indexing, but you can customize it in Jira.
- In Jira, go to Administration > System.
- In the Advanced section, select Services.
- Edit the com.atlassian.jira.service.services.index.ReplicatedIndexCleaningService service, and enter a new retention period.
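To get a feel for what a given retention period means, the cutoff can be worked out by subtracting the period from the current time. The sketch below is an illustration only: the function name rows_to_delete and the (id, created) row shape are hypothetical and not the service's real implementation.

```python
from datetime import datetime, timedelta


def rows_to_delete(
    rows: list[tuple[int, datetime]], retention_minutes: int, now: datetime
) -> list[int]:
    """Return the ids of rows created before the retention cutoff."""
    cutoff = now - timedelta(minutes=retention_minutes)
    return [row_id for row_id, created in rows if created < cutoff]
```

With a retention period of 2800 minutes, a row written three days ago is removed, while a row written yesterday is kept.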
Replicating the indexes for both Lucene and OpenSearch
When a node joins the cluster for the first time, or if it has been offline for an extended period of time, it receives a copy of an up-to-date index from another active node instead of applying all the accumulated changes from the database, which is simpler and more efficient. The indexes are replicated in the following way:
- A node sends a "backup index request" to the cluster.
- One of the active nodes receives the request, removes it from the message queue, and creates a backup of the index in the shared home directory.
- The node that created the backup sends an "index backed up" message to the node that requested the backup.
- The requesting node replaces its current index with the backup.
- The requesting node also applies any changes that have occurred since the backup was created.
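The steps above can be sketched as a small in-memory simulation. Everything here is hypothetical: the ClusterNode class, the restore_from_backup function, and the use of a plain dictionary as the index. The cluster messages and the shared home directory are reduced to a simple copy.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterNode:
    name: str
    index: dict[str, int] = field(default_factory=dict)  # issue key -> version


def restore_from_backup(
    requester: ClusterNode,
    donor: ClusterNode,
    changes_since_backup: list[tuple[str, int]],
) -> None:
    """Replace the requester's index with a donor backup, then catch up."""
    # The donor handles the request and writes a backup (stands in for the
    # copy placed in the shared home directory).
    backup = dict(donor.index)
    # The requester replaces its current index with the backup.
    requester.index = dict(backup)
    # The requester applies the changes made after the backup was created.
    for issue_key, version in changes_since_backup:
        requester.index[issue_key] = version
```

The final loop matters: without it, the requesting node would miss every change made while the backup was being created and copied.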
9.1 changes
Note that in Jira 9.1, this mechanism is turned off and replaced by index snapshots.
Checking the health of the indexes
Health checks
Jira Data Center provides a health check that helps you make sure the indexes are replicated without any issues. The knowledge base article describing it also contains some troubleshooting information, as well as links to specific issues related to indexes. For more info, see HealthCheck: Cluster Index Replication.
APIs for both Lucene and OpenSearch
You can also use the API to check the condition of the index on a particular node. For more info, see Get index summary.
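As a rough sketch, the index summary could be fetched and checked with a short script. Several details here are assumptions that should be verified against the API reference for your Jira version: the endpoint path /rest/api/2/index/summary, authentication with a personal access token sent as a Bearer header, and the response fields issueIndex, indexReadable, countInDatabase, and countInIndex. All function names are hypothetical.

```python
import json
import urllib.request


def index_summary_url(base_url: str) -> str:
    """Build the summary URL for one node (endpoint path is an assumption)."""
    return base_url.rstrip("/") + "/rest/api/2/index/summary"


def fetch_index_summary(base_url: str, token: str) -> dict:
    """Fetch the summary from a node; Bearer token auth is an assumption."""
    request = urllib.request.Request(
        index_summary_url(base_url),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def index_looks_consistent(summary: dict) -> bool:
    """True if the index is readable and its issue count matches the database."""
    issue_index = summary.get("issueIndex", {})
    in_database = issue_index.get("countInDatabase")
    return (
        issue_index.get("indexReadable") is True
        and in_database is not None
        and in_database == issue_index.get("countInIndex")
    )
```

To check a particular node, call the node's own address rather than the load balancer, so that the response describes that node's index.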
Q&A
Here are some basic questions that we often get about the indexes.
Question | Answer |
---|---|
I'm running low on disk space on the server that stores the shared home directory. Can I delete the indexes directory? | The deleted indexes will be replaced soon, so for the purpose of getting some extra space, there's no point in that. You can delete this directory only if the indexes on one node are corrupted, and you need a fresh copy. |
One of the nodes in the cluster has a corrupted index. How do I fix it? | The easiest way to fix this is to copy the index from one node to another. To do this: |
Can I copy the indexes from a running Jira instance? | No, copying the indexes is unreliable, because they're being constantly updated. Such a copy would be inconsistent. Jira makes a consistent copy of the index from a snapshot in time when a new instance is added to the cluster and then applies all changes that have occurred since then. |
Which re-indexing option should I use? | If you have a multi-node Jira Data Center instance, there is no reason for you to use the Background re-index option. Rather, you should always use the Lock JIRA and rebuild index option. For more details on why, see Search indexing. |