Searching and Indexing Troubleshooting


The Search Index

The search index controls RSS feeds, searching, Recently Updated, the People Directory, Space Activity, and other components in Confluence.

Recommendations

The most common troubleshooting procedure is to rebuild the index or, if necessary, rebuild it from scratch. Many bugs in the indexer have been fixed over time, so upgrading Confluence often helps.

Common Indexing Problems

Typically, indexing fails for one of a small number of common reasons; the sections below describe how to diagnose them.

Index Logging Options

Purpose

When automatic indexing is not occurring, or you are seeing errors related to indexing, you may need to turn on debug logging for the indexer.

Solution

Confluence 5.9 and below

Enabling debugging for indexing temporarily

From Administration > Logging and Profiling, add the following packages and set them to DEBUG:

com.atlassian.confluence.search.lucene
com.atlassian.bonnie

Enabling debugging for indexing permanently

  1. Edit the <CONFLUENCE_INSTALL>/confluence/WEB-INF/classes/log4j.properties file and add:

    log4j.logger.com.atlassian.confluence.search.lucene=DEBUG
    log4j.logger.com.atlassian.bonnie=DEBUG
  2. Restart Confluence

Logging should appear in the <CONFLUENCE_HOME>/logs/atlassian-confluence.log file, like the following:

2009-10-01 08:50:07,633 DEBUG [http-8080-6] [search.lucene.queue.DatabaseIndexTaskQueue] enqueue Enqueuing task: IndexQueueEntry{id=0, handle='com.atlassian.confluence.pages.Page-91258884', type=Unknown, creationDate=Thu Oct 01 08:50:07 EDT 2009}
2009-10-01 08:50:07,635 DEBUG [http-8080-6] [search.lucene.queue.DatabaseIndexTaskQueue] enqueue Enqueuing task: IndexQueueEntry{id=0, handle='com.atlassian.confluence.pages.Page-91258884', type=Unknown, creationDate=Thu Oct 01 08:50:07 EDT 2009}
2009-10-01 08:50:08,023 DEBUG [DefaultQuartzScheduler_Worker-7] [search.lucene.queue.DatabaseIndexTaskQueue] getUnflushedEntries Fetching index entries added since: Thu Oct 01 08:50:06 EDT 2009
2009-10-01 08:50:08,027 DEBUG [DefaultQuartzScheduler_Worker-7] [search.lucene.queue.DatabaseIndexTaskQueue] getUnflushedEntries Fetched 2 entries from datbase.
2009-10-01 08:50:08,027 DEBUG [DefaultQuartzScheduler_Worker-7] [search.lucene.queue.DatabaseIndexTaskQueue] getUnflushedEntries Having excluded entries that have previously been flushed, 2 entries remain.

Confluence 5.10 and above

Enabling debugging for indexing temporarily

From Administration > Logging and Profiling, add the following packages and set them to DEBUG:

com.atlassian.confluence.internal.index.AbstractBatchIndexer
com.atlassian.confluence.search.lucene
com.atlassian.bonnie.search.extractor

For very verbose logging, set the full package to DEBUG instead of just AbstractBatchIndexer (this can cause disk-space issues on instances with a lot of data):

com.atlassian.confluence.internal.index

Enabling debugging for indexing permanently

  1. Edit the <CONFLUENCE_INSTALL>/confluence/WEB-INF/classes/log4j.properties file and add:

    1. For simple progress logging:

      log4j.logger.com.atlassian.confluence.internal.index.AbstractBatchIndexer=DEBUG
      log4j.logger.com.atlassian.confluence.search.lucene=DEBUG
    2. For very verbose logging, add the full package instead of just AbstractBatchIndexer (this can cause disk-space issues on instances with a lot of data):

      log4j.logger.com.atlassian.confluence.internal.index=DEBUG
  2. Restart Confluence

Logging should appear in the <CONFLUENCE_HOME>/logs/atlassian-confluence.log file, like the following:

2017-04-06 13:23:56,853 DEBUG [Caesium-1-1] [confluence.search.lucene.LuceneIndexManager] flushQueue Flush requested
2017-04-06 13:23:56,923 DEBUG [Caesium-1-1] [confluence.search.lucene.PluggableSearcherInitialisation] initialise Warming up searcher..
2017-04-06 13:23:56,923 DEBUG [Caesium-1-1] [confluence.search.lucene.DefaultSearcherInitialisation] initialise Warming up searcher..
2017-04-06 13:23:56,927 DEBUG [Caesium-1-1] [confluence.search.lucene.LuceneIndexManager] flushQueue Flushed 4 items in 72 milliseconds




About Confluence and Lucene

Confluence maintains a Lucene search index of all the text in an installation. Along with incremental indexing and twice daily optimization of the index, Confluence supports user-controlled full reindexing of the search indexes.

Reindexing is driven by the DefaultConfluenceIndexManager, which runs an IndexingTask on its own thread named "confluence-interactive-reindexing-thread". This task drives the MultiThreadedIndexRebuilder, which launches up to 10 DefaultObjectQueueWorker instances, each running on its own thread named "Indexer: <n>". These worker threads compete for work from an ObjectQueue, which the MultiThreadedIndexRebuilder populates with every searchable object in the database. The ObjectQueue loads objects lazily: it is given HibernateHandles, and the corresponding objects are only fully loaded when they are popped from the queue. Since the ObjectQueue instance is shared, all modifications to its contents are synchronized.
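The thread-and-queue arrangement described above can be sketched as follows. This is an illustrative model, not Confluence source: the class and method names are invented, and the database is simulated with an in-memory map. It shows the key points, though: workers compete for handles from a synchronized shared queue, and each object is only fully loaded at the moment it is popped.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch (not Confluence source) of the worker/queue design.
class WorkerPoolSketch {
    // Stand-in for the database: handle -> fully loaded object.
    static final Map<String, String> DATABASE = Map.of(
            "Page-1", "page one content",
            "Page-2", "page two content",
            "Page-3", "page three content");

    private final Queue<String> handles = new ArrayDeque<>();

    WorkerPoolSketch(Iterable<String> allHandles) {
        allHandles.forEach(handles::add);
    }

    // Access to the shared queue is synchronized, as with ObjectQueue;
    // the object is only loaded ("lazily") when its handle is popped.
    private synchronized String popAndLoad() {
        String handle = handles.poll();
        return handle == null ? null : DATABASE.get(handle);
    }

    // Launch workers that compete for queue entries; returns items indexed.
    int indexAll(int workerCount) {
        AtomicInteger indexed = new AtomicInteger();
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                String object;
                while ((object = popAndLoad()) != null) {
                    indexed.incrementAndGet(); // "index" the loaded object
                }
            }, "Indexer: " + i);
            workers[i].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException ignored) { }
        }
        return indexed.get();
    }
}
```

Because every pop is synchronized, each handle is loaded and indexed exactly once no matter how many worker threads run.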

How is a Reindex Triggered?

Short answer: ViewIndexQueueAction, RestoreAction and RestoreLocalFileAction are the only parts of the UI I can find that trigger a reindex. It looks like reindexing is NOT supported as part of the SOAP API, nor is there any way to trigger it during an import from SnipSnap. There is also no way to control it via the current REST API.

Details

There are only a few classes that manage an IndexManager: ViewIndexQueueAction and ImportLongRunningTask. ViewIndexQueueAction is the main action for viewing the search index details (via Confluence Admin > Administration > Content Indexing), and it also handles clicks on the Rebuild button shown on that screen. ImportLongRunningTask is a task that (among other things) will trigger a reindex on all cluster nodes if the given ImportContext.isRebuildIndex is true.

AbstractImportAction will set up this sort of context if its buildIndex property is true. ImportSnipSnapAction, RestoreAction, RestoreLocalFileAction and RestorePageAction are the interesting actions that extend AbstractImportAction. (There are other less interesting ones, such as AbstractFileRestoreAction, SetupRestoreAction and SetupRestoreFileAction, but these are irrelevant for day-to-day operation of Confluence.)

The rest of this is from cross referencing those actions with setup inside the xwork.xml:

The restore-include.vm used to render the form that leads to either RestoreAction or RestoreLocalFileAction executing has a checkbox for buildIndex that defaults to enabled. This means that unless the user is very alert, they will most likely trigger a full index rebuild any time they restore a file. These are /admin actions, so only an administrator can trigger them.

The restorepage.vm that leads to RestorePageAction actually doing a restore thankfully does not include a checkbox for buildIndex. If it did, restoring a page could lead to extremely poor performance of the whole site, especially since the user only needs create page (a.k.a. EDITSPACE) permissions.

The import-snipsnap.vm that leads to ImportSnipSnapAction actually doing an import strangely does not include a checkbox for buildIndex. This is surprising, since reindexing is probably the very next thing the user will have to do, but perhaps an understandable oversight for such a little-used feature.

What is INDEXQUEUEENTRIES table used for?

When a user adds a page, edits a page, makes a comment, changes a restriction or does anything that requires an item to be indexed, the entry will go into the INDEXQUEUEENTRIES table to be indexed later.
If you go to Administration > Content Indexing > Queue Contents, you can see the index entries that still need to be indexed. Note, however, that this queue is NOT a direct reflection of the INDEXQUEUEENTRIES table, which will contain many more entries than you see in the administration console.

Once a minute, the index queue is flushed. This is controlled by the indexQueueFlushTrigger job in schedulingSubsystemContext.xml:

    <bean id="indexQueueFlushTrigger" class="org.springframework.scheduling.quartz.CronTriggerBean">
        <property name="jobDetail">
            <ref bean="indexQueueFlushJob"/>
        </property>
        <property name="cronExpression">
            <value>0 * * * * ?</value>
        </property>
    </bean>

When this job runs, it looks at the time the last item was indexed, fetches every entry in INDEXQUEUEENTRIES newer than that time, and indexes them.

Once it has indexed them, it updates the saved timestamp to the time of the last indexed item.
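The enqueue/flush cycle described above can be sketched as follows. This is an illustrative model, not Confluence source (the class and field names are invented): entries accumulate in the table as content changes, each flush picks up only the entries newer than the saved timestamp, and flushed entries are left in the table rather than deleted.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Confluence source) of the minute-by-minute flush.
class FlushSketch {
    static class Entry {
        final String handle;
        final long creationTime;
        Entry(String handle, long creationTime) {
            this.handle = handle;
            this.creationTime = creationTime;
        }
    }

    private final List<Entry> queueTable = new ArrayList<>(); // INDEXQUEUEENTRIES
    private long lastFlushTime = 0;

    // A page edit, new comment, restriction change, etc. adds a row.
    void enqueue(String handle, long time) {
        queueTable.add(new Entry(handle, time));
    }

    // Runs once a minute (via indexQueueFlushTrigger in the real system):
    // index everything newer than the saved timestamp, then advance it.
    List<String> flush() {
        List<String> indexed = new ArrayList<>();
        long newest = lastFlushTime;
        for (Entry e : queueTable) {
            if (e.creationTime > lastFlushTime) {
                indexed.add(e.handle);                 // hand off to the indexer
                newest = Math.max(newest, e.creationTime);
            }
        }
        lastFlushTime = newest;                        // entries stay in the table
        return indexed;
    }
}
```

Note that a second flush with no new entries indexes nothing, even though the already-flushed rows are still sitting in the table.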

Does a Reindex Look in This Table?

No. A full reindex looks at all the content tables in the database (CONTENT, ATTACHMENTS, etc.) and does not consult INDEXQUEUEENTRIES.
Thus, if needed, this table can be purged and a full reindex performed, and you will still end up with a complete, correct index.

What Entries are Purged from INDEXQUEUEENTRIES?

The clean-up job runs once a day at 2 am (see schedulingSubsystemContext.xml):

    <bean id="databaseIndexQueueCleanTrigger" class="org.springframework.scheduling.quartz.CronTriggerBean">
        <property name="jobDetail">
            <ref bean="indexQueueCleanJob"/>
        </property>
        <property name="cronExpression">
            <value>0 0 2 * * ?</value>
        </property>
    </bean>

This job cleans out all entries that are more than 2 days old. Note that it deletes entries based purely on age, not on whether they have been indexed: entries that have already been indexed but are less than 2 days old remain in the table.
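A minimal sketch of that purge rule (illustrative, not Confluence source; the class name is invented): the only criterion is age, so a just-indexed recent entry survives while a stale one is removed.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch (not Confluence source) of the nightly clean-up:
// entries are removed purely by age, never because they were indexed.
class PurgeSketch {
    static final long TWO_DAYS_MS = 2L * 24 * 60 * 60 * 1000;

    // Given entry creation times, return the ones that survive the purge.
    static List<Long> purge(List<Long> creationTimes, long now) {
        return creationTimes.stream()
                .filter(t -> now - t <= TWO_DAYS_MS) // keep if 2 days old or less
                .collect(Collectors.toList());
    }
}
```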

Why Do We Use This Table?

The main reason is cluster installations. Some jobs in a cluster run on only one node at a time. For example, the cluster safety job runs on only one node: if it runs on node A, node B will not run it; the next time, node B may run it instead, and node A will not.

There are exceptions, and indexing is one of them. Each node in the cluster maintains its own index, so each node has to run the indexing job once a minute. When one node runs the indexing job, it cannot delete the entries it has processed, because the other nodes will need to index those same entries later.
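This is why deletion would break a cluster. In the sketch below (illustrative, not Confluence source; the names are invented), two nodes each keep their own last-flushed timestamp over the same shared table. Node B can still index entries that node A has already processed, which would be impossible if node A had deleted them.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Confluence source): each cluster node keeps its
// own "last flushed" timestamp over the one shared queue table, so a node
// must not delete entries after indexing them -- other nodes still need them.
class ClusterFlushSketch {
    final List<Long> sharedTable = new ArrayList<>(); // one table, all nodes

    class Node {
        private long lastFlushTime = 0;

        // Each node runs this once a minute against the shared table.
        int flush() {
            int count = 0;
            long newest = lastFlushTime;
            for (long t : sharedTable) {
                if (t > lastFlushTime) {
                    count++;                          // index into this node's index
                    newest = Math.max(newest, t);
                }
            }
            lastFlushTime = newest;                   // entries stay in the table
            return count;
        }
    }
}
```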

Last modified on Feb 18, 2019
