Troubleshooting performance with JIRA Stats


Intro

JIRA has a number of stats logs (metrics) that can be used to troubleshoot different performance problems. The goal of this page is to give an overview of the metrics that can be used to measure Jira performance or to get a sense of the performance of the environment (disk/network/DB).

Please check the related KB article for more details about the types of logs: JIRA stats logs


Note

Note about the Healthy State column: the values are based on data from our internal Jira instances and are provided here as a general reference.


Applicable to versions:  8.13+ 

DC Specific

LOCALQ, DBR, and INDEX-REPLAY logging are present only in the Data Center version.

Disk performance

Why this is an important metric to check:

  • slow I/O will affect user requests related to Lucene searches (JQL) and updates (issue create/update)

  • slow I/O will affect node-to-node cache communication (slow updates of the LocalQ files)

  • slow I/O will affect issue reindexing time: Full, Background, and Project reindex

Metrics to check:

| Stats | Metric | Description | Healthy state |
| --- | --- | --- | --- |
| [LOCALQ] [VIA-INVALIDATION] | timeToAddMillis | The time to store a cache replication message in the local store before sending it to the other node; write performance of the LocalQ, mostly disk. | ~1 ms |
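As a rough way to read this metric out of the logs, the sketch below scans log lines for LOCALQ stats and averages the reported timeToAddMillis values. The exact stats-line layout differs between Jira versions, so the sample lines and the regular expression here are assumptions to adapt to your own logs.

```python
import re
from statistics import mean

# Assumed stats-line shape (the real layout varies between Jira versions):
#   ... [LOCALQ] [VIA-INVALIDATION] ... "timeToAddMillis":{"avg":1.2,...} ...
METRIC_RE = re.compile(r'"timeToAddMillis"\s*:\s*\{[^}]*?"avg"\s*:\s*([\d.]+)')

def average_time_to_add(log_lines):
    """Average the reported timeToAddMillis averages across LOCALQ stats lines."""
    values = [
        float(m.group(1))
        for line in log_lines
        if "[LOCALQ]" in line
        for m in METRIC_RE.finditer(line)
    ]
    return mean(values) if values else None

# Hypothetical sample lines -- adapt the regex to your actual log format.
sample = [
    '... [LOCALQ] [VIA-INVALIDATION] stats: {"timeToAddMillis":{"avg":0.9,"max":4}}',
    '... [LOCALQ] [VIA-INVALIDATION] stats: {"timeToAddMillis":{"avg":1.1,"max":6}}',
]
print(average_time_to_add(sample))  # sustained averages well above ~1 ms point at slow disk
```

The same pattern works for any of the metrics on this page by swapping the metric name in the regular expression.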

DB performance

Why this is an important metric to check:

  • slow I/O will affect user requests related to issue create/update, viewing a single issue, and generating reports

  • slow I/O will affect cache population

  • slow I/O will affect issue reindexing time: Full, Background, and Project reindex

Metrics to check:

| Stats | Metric | Description | Healthy state |
| --- | --- | --- | --- |
| [VERSIONING] | getIssueVersionMillis | Time to read a single row in the version table; we expect this to be a fast operation. Includes processing + network + DB read time. | ~5 ms |
| [VERSIONING] | incrementIssueVersionMillis | Time to update (increment) a single row in the version table; we expect this to be a fast operation. Includes processing + network + DB update time. | ~10 ms |
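A minimal helper for comparing observed averages against the baselines in the table above. The baselines are general reference values, not hard limits (see the note about the Healthy State column), and the 2x tolerance factor below is an arbitrary illustration, not an Atlassian threshold.

```python
# Healthy-state baselines taken from the table above (general reference
# values from Atlassian's internal instances, not hard limits).
HEALTHY_MS = {
    "getIssueVersionMillis": 5,
    "incrementIssueVersionMillis": 10,
}

def flag_slow_metrics(observed_avgs_ms, factor=2.0):
    """Return the metrics whose observed average exceeds its baseline by `factor`."""
    return {
        name: avg
        for name, avg in observed_avgs_ms.items()
        if name in HEALTHY_MS and avg > HEALTHY_MS[name] * factor
    }

# Example: a DB read running ~10x slower than the baseline gets flagged.
print(flag_slow_metrics({"getIssueVersionMillis": 48.0,
                         "incrementIssueVersionMillis": 9.0}))
```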

Internode network latency

Why this is an important metric to check:

  • slow network I/O will affect cache replication (cluster consistency)

  • slow network I/O will affect cluster index replication (DBR)

Metrics to check:

| Stats | Metric | Description | Healthy state |
| --- | --- | --- | --- |
| [LOCALQ] [VIA-INVALIDATION] | timeToSendMillis | The time to deliver the cache replication message from the current node to the destination node. | ~10 ms |
| [LOCALQ] [VIA-INVALIDATION] | queueSize | Size of the cache replication queue. | 0 |
| [DBR] [RECEIVER] | receiveDBRMessageDelayedInMillis | Time difference between generating the message and receiving it. Note that each timestamp is local to its node, so this includes time drift between the nodes. Includes serialization/de-serialization + time spent in the LocalQ + time to send the message (RTT). | ~100 ms |
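Since a healthy queueSize is 0, a queue that stays nonzero across several snapshots suggests the node cannot keep up with cache replication (slow network or a slow peer). This hypothetical check flags such a backlog; the window of three consecutive snapshots is an assumption, not a documented threshold.

```python
def replication_backlog(queue_sizes, window=3):
    """Flag a LocalQ backlog: a healthy queue drains to 0 between snapshots,
    so `window` consecutive nonzero queueSize readings suggest the node is
    falling behind on cache replication."""
    run = 0  # length of the current streak of nonzero readings
    for size in queue_sizes:
        run = run + 1 if size > 0 else 0
        if run >= window:
            return True
    return False

print(replication_backlog([0, 5, 0, 2, 0]))  # transient spikes: no backlog
print(replication_backlog([3, 10, 25, 60]))  # growing queue: backlog
```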


Index reads/writes (Lucene)

Why this is an important metric to check:

  • slow operations will affect user requests related to issue create/update/transition

  • slow operations will affect issue reindexing time: Full, Background, and Project reindex

Metrics to check:

| Stats | Metric | Description | Healthy state |
| --- | --- | --- | --- |
| See Disk performance | | | |
| See DB performance | | | |
| [index-writer-stats] | updateDocumentsWithVersionMillis | Time spent on all steps required to conditionally add/update the index in Lucene (search + updating Lucene). | avg < 50 ms |
| [DBR] [RECEIVER] | processDBRMessageUpdateWithRelatedIndexInMillis | Time for the complex object to wait in the Lucene queue and be written to the index; includes waiting in the queue + conditionally updating Lucene. | avg < 150 ms |
| [LOCALQ] [VIA-COPY] | timeToSendMillis | Time to deliver and apply the cache replication message from the current node to the destination node. Includes serialization/de-serialization + time to send the message (RTT) + ack from the receiver + updating Lucene. | avg < 50 ms |
| [indexing-stats] | flushIntervalMillis | Average interval between index flushes since the last snapshot. Mostly relevant for Foreground reindex; short intervals may indicate the need to increase the Lucene buffer. | avg > 1 sec (during foreground indexing) |
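The flushIntervalMillis guidance can be expressed as a simple check: during foreground reindexing, flushes arriving more often than about once per second hint that the Lucene buffer may be too small. The 1-second threshold comes from the healthy-state value above; the function name is illustrative.

```python
def lucene_buffer_may_be_too_small(flush_interval_avg_ms, threshold_ms=1000):
    """Per the healthy state above (avg > 1 sec during foreground indexing),
    a shorter average flush interval suggests increasing the Lucene buffer."""
    return flush_interval_avg_ms < threshold_ms

print(lucene_buffer_may_be_too_small(250))   # frequent flushes: consider a bigger buffer
print(lucene_buffer_may_be_too_small(2500))  # healthy interval
```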


Indexing performance

Why this is an important metric to check:

  • slow operations will affect user requests related to issue create/update/transition

  • slow operations will affect issue reindexing time: Full, Background, and Project reindex

  • slow operations will affect cluster consistency

Metrics to check:

| Stats | Metric | Description | Healthy state |
| --- | --- | --- | --- |
| See Disk performance | | | |
| See DB performance | | | |
| See Index reads/writes (Lucene) | | | |
| [indexing-stats] | addIndex.avg | Average time to load the data from the Custom Field provider. | Depends on the custom field's functionality. Note that this has a direct impact on indexing time, so it affects every request that creates/updates issues or comments, as well as full reindex time. |
| [INDEX-REPLAY] [STATS] | timeInMillis | Time to process a batch of index operations; the replay process runs every 5 seconds. | ~5 sec |
| [INDEX-REPLAY] [STATS] | updateIndexInMillis | Time to perform all required index operations (document creation + Lucene) in a specific batch, after checking DBR. | ~5 sec |
| [INDEX-REPLAY] [STATS] | DBR effectiveness | DBR effectiveness = (1 - filterOutAlreadyIndexedAfterCounter.ISSUE.sum / filterOutAlreadyIndexedBeforeCounter.ISSUE.sum) | > 90% |
| [INDEX-REPLAY] [STATS] | numberOfRemoteOperations | Number of processed index operations from other nodes; helps to check traffic distribution and cluster load. | |
| [INDEX-REPLAY] [STATS] | numberOfLocalOperations | Number of processed index operations from the current node; helps to check traffic distribution and cluster load. | |
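The DBR effectiveness formula above can be computed directly from the two counter sums; this small sketch applies it and guards against the case where no operations were counted.

```python
def dbr_effectiveness(before_sum, after_sum):
    """DBR effectiveness = 1 - filterOutAlreadyIndexedAfterCounter.ISSUE.sum
                             / filterOutAlreadyIndexedBeforeCounter.ISSUE.sum
    Values above ~90% are healthy; returns None when no operations were counted."""
    if before_sum == 0:
        return None
    return 1 - after_sum / before_sum

# Example: 1000 operations before filtering, of which only 50 still need
# indexing because DBR already applied the rest -> 95% effectiveness.
eff = dbr_effectiveness(1000, 50)
print(f"{eff:.0%}")
```

A value well below 90% means the replay process is redoing index work that DBR should already have delivered, which is worth investigating together with the internode latency metrics above.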




Last modified on Feb 1, 2021
