Slow Reindexing due to java.io.UnixFileSystem.delete Operations

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

Jira indexing is progressing very slowly, with most IssueIndexer threads spending their time in the java.io.UnixFileSystem.delete0 native method. A reindex that should take a couple of hours can take more than a day to complete.

The following stack appears in thread dumps, with little to no progress across successive dumps:

"IssueIndexer:thread-XX" #420370 prio=5 os_prio=0 tid=0x00007fa2a8145800 nid=0xf7b6 runnable [0x00007fa0a65e4000]
   java.lang.Thread.State: RUNNABLE
	at java.io.UnixFileSystem.delete0(Native Method)
	at java.io.UnixFileSystem.delete(UnixFileSystem.java:265)
	at java.io.File.delete(File.java:1041)
	at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:295)
	at org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:542)
	at org.apache.lucene.index.IndexFileDeleter.deleteNewFiles(IndexFileDeleter.java:531)
	at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:605)
	- locked <0x000000050d2a7e60> (a org.apache.lucene.index.DocumentsWriter)
	at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3543)
	- locked <0x000000050d24b398> (a org.apache.lucene.index.IndexWriter)
	at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3508)
	at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3371)
	at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3444)
	- locked <0x000000050d24b688> (a java.lang.Object)
	at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3426)
	at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3410)
	at com.atlassian.jira.index.WriterWrapper.commit(WriterWrapper.java:115)
	at com.atlassian.jira.index.DefaultIndexEngine$WriterReference.commit(DefaultIndexEngine.java:239)
	at com.atlassian.jira.index.DefaultIndexEngine$FlushPolicy$2.commit(DefaultIndexEngine.java:69)
	at com.atlassian.jira.index.DefaultIndexEngine$FlushPolicy.perform(DefaultIndexEngine.java:124)
	at com.atlassian.jira.index.DefaultIndexEngine.write(DefaultIndexEngine.java:184)
	at com.atlassian.jira.index.DefaultIndex.perform(DefaultIndex.java:29)
	at com.atlassian.jira.issue.index.DefaultIssueIndexer$IndexIssuesOperation.perform(DefaultIssueIndexer.java:408)
	at com.atlassian.jira.issue.index.DefaultIssueIndexer$IndexIssuesOperation.perform(DefaultIssueIndexer.java:393)
	at com.atlassian.jira.issue.index.DefaultIssueIndexer.lambda$null$2(DefaultIssueIndexer.java:332)
	at com.atlassian.jira.issue.index.DefaultIssueIndexer$$Lambda$3191/1326474058.get(Unknown Source)
	at com.atlassian.jira.index.SimpleIndexingStrategy.get(SimpleIndexingStrategy.java:7)
	at com.atlassian.jira.index.SimpleIndexingStrategy.get(SimpleIndexingStrategy.java:5)
	at com.atlassian.jira.index.MultiThreadedIndexingStrategy$1.call(MultiThreadedIndexingStrategy.java:33)
	at com.atlassian.jira.index.MultiThreadedIndexingStrategy$1.call(MultiThreadedIndexingStrategy.java:31)
	at com.atlassian.jira.util.concurrent.BoundedExecutor$2.call(BoundedExecutor.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Diagnosis

Environment

  • This can happen on any operating system, but it is more prevalent on AWS, where EBS volumes are provisioned with a fixed number of IOPS that can throttle disk operations

Diagnostic Steps

  • Run the Test disk access speed for a Java application diagnostic test a few times while the reindexing is in progress. The delete operations will be very slow compared to the suggested timings in that guide.

    (info) Some runs may perform well, so it's essential to run the test at least three times: delete I/O performance tends to be inconsistent, and a single good run can be misleading.

  • If you want to test this outside of Java, you can use the following (or a similar) script to time delete operations at the OS level. It creates 50,000 files in a directory of your choice and then times their deletion. On an affected system, deleting the 50,000 files takes far longer than it should.

    (warning) Consult your System Administrator before using this option, both to confirm the syntax and to avoid unintended performance impact during the test. The script can be executed from the /var/tmp/ directory, since the issue tends to affect all directories on the same EBS volume if this is an AWS IOPS issue.

    cd /var/tmp/
    for i in $(seq 1 50000); do echo testing > "$i.txt"; done
    time rm -f -- *.txt
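
    If a per-file number is easier to compare than one bulk timing, the same idea can be expressed as a small sketch that reports an average delete latency. This is not the Atlassian diagnostic tool; the /var/tmp/delete-test path and the file count are placeholders, and the script assumes GNU date (for the %N nanoseconds format, i.e. Linux). Point TESTDIR at a directory on the same volume as the Jira index before running it:

```shell
#!/bin/sh
# Sketch: measure average per-file delete latency on a volume.
# TESTDIR and COUNT are placeholders, not part of the original diagnostic.
TESTDIR=/var/tmp/delete-test
COUNT=5000

# Create the test files.
mkdir -p "$TESTDIR"
for i in $(seq 1 "$COUNT"); do echo testing > "$TESTDIR/$i.txt"; done

# Time a single bulk delete (one process, so fork overhead is not counted).
start=$(date +%s%N)
rm -f -- "$TESTDIR"/*.txt
end=$(date +%s%N)

echo "deleted $COUNT files in $(( (end - start) / 1000000 )) ms"
echo "average per-file delete: $(( (end - start) / COUNT / 1000 )) us"

# Clean up the test directory.
rmdir "$TESTDIR"
```

    On healthy local storage the per-file average is typically well under a millisecond; an IOPS-throttled volume will show a much higher and much more variable figure.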


Cause

This is caused by slow file delete operations at the OS level. It is seen most often on AWS, where EBS volumes are provisioned with a fixed number of IOPS; once that limit is reached, disk operations are throttled and the index file deletes become a bottleneck.

Workaround

Work with your AWS administrator to increase the IOPS provisioned for the EBS volume Jira is using. There is no universal baseline, since every environment and index is different, so we recommend increasing the IOPS progressively and re-running the diagnostic steps above until file delete performance improves.
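
As a concrete illustration, adjusting provisioned IOPS can be scripted with the AWS CLI. This is a sketch only: the volume ID and the 6000 IOPS figure below are placeholders, and directly setting IOPS applies to volume types that support provisioned IOPS (io1, io2, gp3); gp2 volumes would first need a type change. Your AWS administrator should confirm the right target for your environment:

```shell
# Placeholder volume ID; use the EBS volume backing the Jira home/index.
VOLUME_ID=vol-0abcdef1234567890

# Check the volume type and currently provisioned IOPS.
aws ec2 describe-volumes --volume-ids "$VOLUME_ID" \
    --query 'Volumes[0].{Type:VolumeType,Iops:Iops}'

# Raise IOPS in a modest step (6000 is an arbitrary example value).
aws ec2 modify-volume --volume-id "$VOLUME_ID" --iops 6000

# Watch the modification until its state reports "completed", then
# re-run the delete timing test from the diagnostic steps above.
aws ec2 describe-volumes-modifications --volume-ids "$VOLUME_ID"
```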


Product: Jira
Last modified on Dec 2, 2019
