JIRA Software Data Center LexoRank Indexing Lag

This Knowledge Base article was written specifically for JIRA Software Data Center. It does not apply to Server or Cloud installations.

Problem

In JIRA Software Data Center, LexoRank Rebalance operations can cause performance degradation if cluster nodes have insufficient I/O or CPU capacity. This effect is magnified in comparison to a Server installation because each issue edit must be indexed on every node in the cluster.

  • Administrators may see an Instance Health warning or error. Example: "Index replication for cluster node 'node3' is behind by 2,991 seconds."
  • JIRA users may report that issue changes made by one user are not visible in searches or dashboards for another user.
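
For monitoring, the lag value can be pulled out of a warning like the one quoted above. The following is a minimal sketch, assuming the message wording matches that example exactly; the pattern and the function name `parse_replication_lag` are illustrative and are not part of JIRA.

```python
import re

# Pattern modeled on the example warning quoted in this article.
# Assumption: the wording in your JIRA version matches this example.
LAG_PATTERN = re.compile(
    r"Index replication for cluster node '(?P<node>[^']+)' "
    r"is behind by (?P<seconds>[\d,]+) seconds"
)


def parse_replication_lag(message):
    """Return (node_name, lag_seconds), or None if the message does not match."""
    match = LAG_PATTERN.search(message)
    if match is None:
        return None
    # Strip thousands separators such as "2,991" before converting.
    lag_seconds = int(match.group("seconds").replace(",", ""))
    return match.group("node"), lag_seconds


example = "Index replication for cluster node 'node3' is behind by 2,991 seconds."
print(parse_replication_lag(example))
```

A helper like this could feed an external alerting script, but the threshold at which to alert depends on your environment.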

The public bug for tracking this issue is JRA-63842. Note that some customers have experienced index consistency lag due to other causes, but this article is useful for detecting and addressing the primary environmental drivers.

If you are currently planning and testing for an upgrade to JIRA Software 7.2+, we recommend that you continue your testing. Hold off on deploying to Production until you have tested and resolved environmental bottlenecks. Add a watch to this page and to the bug report above to receive notice of a future maintenance release, which will contain additional instrumentation and/or mitigation.

Diagnosis

Environment

This problem most commonly affects JIRA Software Data Center customers with:

  • More than 500k issues.
  • More than 750 custom fields.
  • Calculated fields.

If you observe Instance Health warnings and user reports of inconsistency between nodes primarily while a LexoRank Rebalance is running, you may be affected by this problem.

If the index drift recovers very slowly after the LexoRank Rebalance is done, the cause is more likely to be a general problem with your issue indexing speed.

Either way, the Resolution section below has detailed information about checking and fixing the common bottlenecks for indexing speed.

Cause

  • LexoRank Rebalancing requires an update to the Rank field followed by issue reindexing for every issue in the system.
  • The LexoRank Rebalance job generates a large number of issue updates in a short time, and the resulting reindexing must be repeated on every node in the Data Center cluster.
  • Reindexing an issue requires reading all fields, custom fields, and comments from the database for storage in the Lucene issue index.
  • This causes read and write amplification as all nodes must read the data for every issue and record their reindexing status in the DB.
  • This additional load on the disks, DB, CPU, and network causes resource saturation that affects all other operations.
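
The amplification described above can be illustrated with simple arithmetic. This is a rough sketch, assuming one Rank update per issue and one reindex of each issue on each node; the function name and the example figures are hypothetical, and real load also depends on field count and hardware.

```python
def estimate_rebalance_load(issue_count, node_count):
    """Rough operation counts for one full LexoRank rebalance.

    Assumptions (illustrative only): one Rank update per issue, and each
    issue is reindexed once on every node in the cluster.
    """
    if issue_count < 0 or node_count < 1:
        raise ValueError("issue_count must be >= 0 and node_count >= 1")
    return {
        "rank_updates": issue_count,
        "reindex_operations": issue_count * node_count,
    }


# Example: 500,000 issues on a 4-node cluster.
load = estimate_rebalance_load(500_000, 4)
print(load["rank_updates"])        # 500000
print(load["reindex_operations"])  # 2000000
```

Each of those reindex operations also reads all fields, custom fields, and comments for the issue, which is why the database and disks feel the load.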

Workarounds

  • Reduce number of nodes in the cluster. Testing reveals diminishing returns in performance in clusters larger than 4 nodes.
  • Avoid closing a sprint with more than 200 issues in an unresolved status. This requires a new Rank value for all of these issues and can trigger a LexoRank rebalance.
  • If you are planning to import hundreds of issues, delay this until you have tested and resolved performance bottlenecks. A Rank must be generated for every issue and this can trigger a LexoRank rebalance.
  • Leave only one node in the load balancer to prevent serving stale data from other nodes. This negates the availability value of Data Center, so it is considered a last resort. It also requires that each node is capable of handling the full concurrent user traffic for your organization, as is the best practice for an HA cluster.

Resolution

The JIRA Enterprise development team prepared multiple improvements to mitigate the problem.

LexoRank improvements in JIRA Software 7.2.8

In JIRA Software 7.2.8, a number of improvements to both the LexoRank algorithm and LexoRank balancing were implemented to mitigate the problem. These changes fall into two major categories. Firstly, we have reduced how often LexoRank balancing needs to be triggered. Secondly, we have reduced the impact of LexoRank balancing when it runs in a JIRA cluster.

Reducing need for LexoRank balancing

The main cause of LexoRank balancing is reranking a large group of issues together. In JIRA Software 7.2.8, group ranking of multiple issues is much more efficient and will no longer trigger LexoRank balancing (JSW-15710). In previous versions, it was enough to bulk move ~210 issues from sprint to backlog to trigger LexoRank balancing. With JIRA Software 7.2.8, during a single bulk move we distribute issues equally between two values, so there is no significant LexoRank length change. That said, it does not mean that balancing will never be triggered. We implemented some tools to make sure balancing can be managed by JIRA administrators in case it is still causing problems.

Reducing impact of LexoRank balancing on JIRA Data Center cluster

Since JIRA Software 7.2.8, LexoRank balancing waits until indexes on all live nodes in the cluster are up to date (JSW-15703). This should prevent LexoRank balancing from causing index replication lag. Note, however, that LexoRank balancing will take more time than before if your cluster has slow indexing times. Two factors determine balancing speed: the speed of reranking, which is mostly bound by the speed of accessing the LexoRank table in the DB (this was greatly improved in JIRA Software 7.2.0 and should no longer be a bottleneck), and the speed of indexing. LexoRank balancing speed is bound by whichever of the two is slower. Long-running LexoRank balancing operations should not be a problem, as they do not prevent JIRA from being used and are designed to work in the background for as long as needed. In previous versions, if a LexoRank value reached its maximum length, Agile would block all ranking operations on boards until balancing finished. This is no longer the case in 7.2.8, so even if one LexoRank value reaches maximum length, all Agile boards and backlogs will continue to be fully usable except for that one value (JSW-15712).
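
Because balancing speed is bound by the slower of reranking and indexing, a rough duration estimate follows directly. The sketch below is illustrative only: the function name and the throughput figures are hypothetical, and you would need to measure real rates in your own environment.

```python
def estimate_balancing_hours(issue_count, rerank_per_sec, index_per_sec):
    """Estimate balancing duration in hours.

    The effective rate is the slower of the two stages, as described above.
    All rates are issues per second and must be measured by you.
    """
    if rerank_per_sec <= 0 or index_per_sec <= 0:
        raise ValueError("rates must be positive")
    effective_rate = min(rerank_per_sec, index_per_sec)
    return issue_count / effective_rate / 3600


# Hypothetical figures: 500,000 issues, reranking at 1,000 issues/s,
# indexing at 50 issues/s. Indexing is the bottleneck here.
hours = estimate_balancing_hours(500_000, 1000, 50)
print(f"about {hours:.1f} hours")
```

With these made-up numbers the result is roughly 2.8 hours, and speeding up reranking further would not change it; only faster indexing would.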

In case these improvements are not enough to mitigate LexoRank balancing's negative impact on the cluster performance, the ability to suspend LexoRank balancing entirely has been added in JIRA Software 7.2.8. This can be used by JIRA administrators and is described in the knowledge base article. 

Mitigation of Environment Bottlenecks

Disk Speed

  • Perform a Disk Speed Test.

  • Mitigation: If the storage used for your Local Home does not test in the "excellent" range, provision faster storage with an improved performance profile.

Database Speed

  • Perform a Database Speed Test. This test is not as robust as our disk speed test but it is an easy "early warning" method.
    • The command is complex, and you must fill in all values for your environment. These can be found in <JIRA_HOME_DIR>/dbconfig.xml. Add the downloaded jar location, your JDBC driver jar location, your DB username and DB password.
    • We don't have clearly defined "excellent, good, bad" ranges, but in an enterprise production environment we look for mean values well below 20ms for all metrics, and below 10ms is ideal.
  • Mitigation: Check that your DB configuration is correct as per the JIRA documentation. Seek assistance from your database administrators to measure database performance with greater accuracy. DB latency for a DC cluster should be < 1ms.
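
The guidance above on mean values can be applied mechanically once you have the test results. This is a minimal sketch, assuming you have copied the mean latencies into a dictionary by hand; the metric names are hypothetical, and the "borderline" label for values between 10ms and 20ms is our own interpretation of "well below 20ms".

```python
def classify_db_latency(mean_ms_by_metric, ideal_ms=10.0, limit_ms=20.0):
    """Label each metric's mean latency against the rough guidance above.

    "ideal"       : below ideal_ms (10ms by default)
    "borderline"  : below limit_ms but not ideal (an interpretation, not an official range)
    "investigate" : at or above limit_ms (20ms by default)
    """
    labels = {}
    for metric, mean_ms in mean_ms_by_metric.items():
        if mean_ms < ideal_ms:
            labels[metric] = "ideal"
        elif mean_ms < limit_ms:
            labels[metric] = "borderline"
        else:
            labels[metric] = "investigate"
    return labels


# Hypothetical metric names and values, for illustration only.
results = {"connect": 4.2, "select": 12.5, "update": 31.0}
print(classify_db_latency(results))
```

Any metric labelled "investigate" is a reasonable starting point for a conversation with your database administrators.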

Calculated Custom Fields

  • Many third party add-ons enable creation of fields that calculate their value from JQL, SQL, or other queries. These can dramatically increase indexing time for each issue because the value must be calculated as part of reindexing.
  • Audit your custom fields and the configured calculation for any of this type.
  • Mitigation: Review calculated custom fields with your teams. Limit the custom field context for calculated fields to the specific projects where they are needed. Review the calculations to determine if they can be optimized. Determine if the value of an in-product calculated field outweighs the performance penalty. If not, consider external reporting solutions to replace calculated fields. 

We have created a custom build of the Lucidity Add-On which adds a module to measure the performance impact of indexing custom fields. If you believe calculated fields are driving poor indexing speed, please raise a ticket at http://support.atlassian.com so we can guide you through installing the custom build of the add-on and collecting data.

Large Numbers of Custom Fields

  • When an issue is indexed, all of its custom field values must be retrieved. Custom field values are stored across multiple tables, which results in complex joins. Installations with > 700 custom fields are likely to experience issue index speed degradation.
  • Mitigation: Review custom fields with your teams. Limit the custom field context to the specific projects where that field is needed. Check for duplicate JIRA Software / JIRA Agile fields and resolve those. Identify duplicates and low-value fields and remove them.

Last modified on Nov 2, 2018