Confluence reindexing becomes stuck

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem - Does this apply to you?

You may notice that newly added content is not being picked up by Confluence, and that the number of items in the queue keeps increasing. Performing a reindex from scratch resolves the problem, but the resolution is not permanent.

Additionally, you may have looked into problems associated with the index timestamp being incorrect and found that they do not apply to you.

Diagnosis

Tracking down what is responsible for the stuck state (a page, an attachment, or some other problem) can be difficult, and sometimes does not produce useful results immediately. First, we must enable debug logging for the classes involved in reindexing. This gives us debugging information as Confluence performs a reindex, allowing us to see how far it gets:

Shut down Confluence, and edit <confluence_install_dir>/WEB-INF/classes/log4j.properties:

  1. Add the following lines at the bottom:

    log4j.logger.com.atlassian.confluence.search.lucene=DEBUG
    log4j.logger.com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor=DEBUG
  2. Start Confluence again and perform a rebuild of the indexes from scratch. This ensures Confluence begins to reindex everything, and the debug logging will give us a record of what Confluence is doing during the reindexing.
  3. When the number of items in the queue appears to rise without being flushed correctly, check the most recent Confluence logs. They will contain debug entries showing when the queue was last flushed, as well as which attachments were reindexed.
  4. Thread dumps can also be taken externally from the instance while it is in a stuck state, to determine what the indexer threads are currently doing and why they are stuck.

    • This is particularly helpful if an attachment has been very difficult to read, or if the indexer is waiting on an external resource, such as a database or a file lock. Atlassian Support can help you diagnose any thread dumps gathered.

Example Debug Logs

Here's a page as it's being reindexed:

2015-08-21 15:08:46,984 DEBUG [Indexer: 1] [confluence.search.lucene.ReindexWorkBatch] indexCollection Index page: Welcome to Confluence v.1 (98310) [com.atlassian.confluence.pages.Page]

Here's an attachment as it's being reindexed:

2015-08-21 15:08:47,426 DEBUG [Indexer: 3] [confluence.search.lucene.ReindexWorkBatch] indexCollection Index Attachment: test.txt v.1 (1245337) admin [com.atlassian.confluence.pages.Attachment]
2015-08-21 15:08:47,428 DEBUG [Indexer: 3] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Starting to index attachment: test.txt
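
On a large instance the debug output can be too long to read by eye. The following is a minimal sketch, not an Atlassian tool, that assumes the log lines follow exactly the format of the examples above and reports the last item each indexer thread logged. The function name `last_item_per_indexer` is hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed line format, taken from the example debug logs above:
# <timestamp> DEBUG [Indexer: N] [<logger>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) DEBUG "
    r"\[(?P<thread>Indexer: \d+)\] \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def last_item_per_indexer(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each indexer thread name to the (timestamp, message) of its last DEBUG line."""
    last: Dict[str, Tuple[str, str]] = {}
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            # Later lines overwrite earlier ones, so only the newest entry survives.
            last[match["thread"]] = (match["ts"], match["msg"])
    return last
```

Passing an open log file to the function (for example `last_item_per_indexer(open(path))`, where `path` points at your Confluence log) returns one entry per indexer thread. A thread whose last entry is much older than the others, or that stopped on "Starting to index attachment", is a reasonable first suspect.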

Possible causes and Resolutions

The causes vary, but a few are common:

Database Disruption

Intermittent database disruptions might cause problems during the reindex, but resolve quickly enough that Confluence as a whole is not impacted. Adding a validation query should resolve any intermittent connection problems to the database.

Complex, poorly generated, or corrupted attachments

Some attachments cause problems when being read. You can temporarily disable indexing of attachments to give the reindex a chance to complete. The debug logging will help to determine which file Confluence was attempting to index just before the process became stuck. When it comes to troublesome attachments, there are a few options:

  1. Disable indexing of attachments for that type of attachment. You will not be able to search inside the contents of those attachments.
  2. Remove the troublesome attachment.
  3. Download the attachment, and delete it from Confluence. Add it to a zip file, and reattach the zipped file to Confluence - this will effectively exclude that file from reindexing.
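
The zipping step in the last option can be done with any archive tool. As a minimal sketch in Python, assuming the attachment has already been downloaded to a local path (the helper name `zip_attachment` is hypothetical):

```python
import zipfile
from pathlib import Path


def zip_attachment(src: Path) -> Path:
    """Wrap a single downloaded attachment in a zip archive placed next to it."""
    dest = src.with_name(src.name + ".zip")
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name, not the full local directory path.
        archive.write(src, arcname=src.name)
    return dest
```

The resulting `.zip` file is what you reattach to the page in place of the original.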

Some third-party programs that generate Microsoft Office or PDF documents may not do as good a job as the first-party applications. In some cases, it's possible to open the document in the most recent version of an application, and use that application to perform a "Save As", which can often fix up any missing metadata.

For example, a Word document generated from a third-party reporting tool might cause fewer problems in Confluence if you open the file, save it as a new file in Word, and re-upload the file to Confluence. This process isn't guaranteed, and may not work for all files, but it has been known to work for some customers.

Environmental Problems

Through analysis of the thread dumps, you might find that the indexer threads are waiting on an external resource, such as a lock on a file or a database connection. This may also occur in other scheduled jobs; see the article "Scheduled jobs may stall and fail to process if one job becomes stuck" for more information.
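
To narrow a thread dump down to the indexer threads, something like the following could be used. It is a rough sketch that assumes a standard HotSpot `jstack`-style text dump and assumes the indexer threads carry the same `Indexer: N` names seen in the debug logs; the function name `indexer_threads` is hypothetical.

```python
import re
from typing import Dict

# A thread block in a jstack-style dump starts with the quoted thread name.
HEADER_RE = re.compile(r'^"(?P<name>[^"]+)"')
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State: (?P<state>\S+)")


def indexer_threads(dump: str, prefix: str = "Indexer") -> Dict[str, dict]:
    """Return the state and stack frames of every thread whose name starts with prefix."""
    threads: Dict[str, dict] = {}
    current = None
    for line in dump.splitlines():
        header = HEADER_RE.match(line)
        if header:
            name = header["name"]
            current = {"state": None, "frames": []} if name.startswith(prefix) else None
            if current is not None:
                threads[name] = current
            continue
        if current is None:
            continue
        if not line.strip():
            current = None  # a blank line ends the current thread block
            continue
        state = STATE_RE.match(line)
        if state:
            current["state"] = state["state"]
        elif line.strip().startswith("at "):
            current["frames"].append(line.strip()[3:])
    return threads
```

An indexer thread that shows the same top frames across several dumps taken a few seconds apart, for instance inside a JDBC driver or a file I/O call, suggests it is blocked on that resource rather than making progress.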


Last modified on Nov 25, 2022
