Slow Reindexing due to java.io.UnixFileSystem.delete Operations
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the server and data center platforms.
Jira indexing is progressing very slowly with most IssueIndexer threads running java.io.UnixFileSystem.delete0 methods. A reindex that should take a couple of hours can take more than a day to complete.
The following appears in the thread dumps with little to no progression over a series of thread dumps:
"IssueIndexer:thread-XX" #420370 prio=5 os_prio=0 tid=0x00007fa2a8145800 nid=0xf7b6 runnable [0x00007fa0a65e4000] java.lang.Thread.State: RUNNABLE at java.io.UnixFileSystem.delete0(Native Method) at java.io.UnixFileSystem.delete(UnixFileSystem.java:265) at java.io.File.delete(File.java:1041) at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:295) at org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:542) at org.apache.lucene.index.IndexFileDeleter.deleteNewFiles(IndexFileDeleter.java:531) at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:605) - locked <0x000000050d2a7e60> (a org.apache.lucene.index.DocumentsWriter) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3543) - locked <0x000000050d24b398> (a org.apache.lucene.index.IndexWriter) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3508) at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3371) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3444) - locked <0x000000050d24b688> (a java.lang.Object) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3426) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3410) at com.atlassian.jira.index.WriterWrapper.commit(WriterWrapper.java:115) at com.atlassian.jira.index.DefaultIndexEngine$WriterReference.commit(DefaultIndexEngine.java:239) at com.atlassian.jira.index.DefaultIndexEngine$FlushPolicy$2.commit(DefaultIndexEngine.java:69) at com.atlassian.jira.index.DefaultIndexEngine$FlushPolicy.perform(DefaultIndexEngine.java:124) at com.atlassian.jira.index.DefaultIndexEngine.write(DefaultIndexEngine.java:184) at com.atlassian.jira.index.DefaultIndex.perform(DefaultIndex.java:29) at com.atlassian.jira.issue.index.DefaultIssueIndexer$IndexIssuesOperation.perform(DefaultIssueIndexer.java:408) at com.atlassian.jira.issue.index.DefaultIssueIndexer$IndexIssuesOperation.perform(DefaultIssueIndexer.java:393) at com.atlassian.jira.issue.index.DefaultIssueIndexer.lambda$null$2(DefaultIssueIndexer.java:332) at com.atlassian.jira.issue.index.DefaultIssueIndexer$$Lambda$3191/1326474058.get(Unknown Source) at com.atlassian.jira.index.SimpleIndexingStrategy.get(SimpleIndexingStrategy.java:7) at com.atlassian.jira.index.SimpleIndexingStrategy.get(SimpleIndexingStrategy.java:5) at com.atlassian.jira.index.MultiThreadedIndexingStrategy$1.call(MultiThreadedIndexingStrategy.java:33) at com.atlassian.jira.index.MultiThreadedIndexingStrategy$1.call(MultiThreadedIndexingStrategy.java:31) at com.atlassian.jira.util.concurrent.BoundedExecutor$2.call(BoundedExecutor.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
This can happen in any Operating System but has been shown to be more predominant in AWS where IOPS can be controlled
Run the Test disk access speed for a Java application diagnostic test a few times while the reindexing is happening. The delete operations will be very slow when compared to the suggested timings on the guide:
Some of the tests will perform good, so it's essential to run the test at least 3 times since the delete I/O operations tend to be inconsistent and can result in false positives.
If you want to test this outside of Java itself, you can use the following (or similar) script to test delete timings from the OS itself. This will create 50k files in the directory of your choice, then we can time deleting them. Notice how the delete operations take a long time to delete 50k files when it should take a fraction of that time.
Consult with your System Administrator first before using this option in order to ensure proper syntax and avoid unintended performance issues that may happen during the test. Note that this can be executed from the /var/tmp/ directory since the issue tends to affect all directories from the same EBS if this is an AWS IOPS issue.
cd /var/tmp/ for i in $(seq 1 50000); do echo testing >> $i.txt; done
time rm -f *txt
This is caused by slow file delete operations at the OS level. We tend to see this more with AWS since the EBS options give admins control over IOPS, which can bottleneck disk operations.
Work with your AWS administrator to increase the IOPS of the EBS JIRA is using. We do not have a baseline since every environment and index is different, so we recommend to increase it progressively and test using the diagnostic steps above until you have a better file delete performance.