Searching and Indexing Troubleshooting

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Confluence Data Center and Server indexing is used by search, the dashboard, some macros, user mentions, and other places in Confluence where information about your content, or references to content, is required. The Confluence Data Center and Server index runs on the Apache Lucene engine and is made up of:

  • a content index, which contains content such as the text of pages, blog posts, and comments
  • a change index, which contains data about each change, such as when a page was last edited.


Index Troubleshooting Flowchart

Use the following flowchart as an aid in determining the cause of a problem when dealing with unknown indexing issues.

[Flowchart: Indexing troubleshooting workflow]


Terminology

  • "Index Rebuild" or "re-indexing" refers to action of rebuilding index from General Configuration > Content Indexing using existing index files
  • "Index Rebuild from scratch" refers to action of deleting existing index files from file system and building new index using new files
  • "Index Building" or "indexing" refers to day to day operations of confluence of building and maintaining index of existing and new content



User Mentions, profiles and Indexing

How Users are indexed

Confluence maintains a search index powered by the Lucene engine. The index is stored locally inside the <confluence_home>/index directory on each Confluence node and is kept up to date by Confluence journaling jobs running on each Data Center node and Server instance.

When users are synchronized from an external directory or created locally, their profile information, specifically their first and last name, is added to the indexing queue. Email addresses and usernames are not indexed; they are pulled directly from the database on request by application code.

Common issues and bugs

There are a few known issues that result in users not getting indexed after being added. Most can be addressed by one of the actions outlined in the Unable to @ mention certain users in Confluence KB:

  • Go to <your-confluence-url>/confluence/display/~<missing-user> to view the user's profile

  • Go to the <your-confluence-url>/admin/force-upgrade.action URL and run the 'UserIndexingUpgradeTask' task

  • Go to <your-confluence-url>/confluence/admin/users/edituser.action?username=<missing-user> and update their About Me or any other info

  • Go to General Configuration > Content Indexing and rebuild the index from there, as very new users might simply not have been indexed yet
  • Attempt to rebuild the index from scratch if possible; most issues can be solved this way

The browser's local storage or cache also plays a role in user mentions, as it stores people you mention frequently for faster results, so testing in different browsers is important.

Useful KB articles

Bugs

Some of the current bugs that may affect the ability to mention users:


Further Troubleshooting

Troubleshooting

When troubleshooting user mention and indexing problems, it's always a good idea to gather user info from the database to compare searchable and non-searchable users:

-- User records for both a working and a non-working user
select cu.*
from CWD_USER cu
where cu.EMAIL_ADDRESS in ('non-working-user-email', 'working-user-email')
;

-- User attributes for the same users
select cua.*
from CWD_USER cu
join CWD_USER_ATTRIBUTE cua on cua.USER_ID=cu.id
where cu.EMAIL_ADDRESS in ('non-working-user-email', 'working-user-email')
;

-- Confluence user key mapping for the same users
select um.*
from CWD_USER cu
join USER_MAPPING um on um.LOWER_USERNAME=cu.LOWER_USER_NAME
where cu.EMAIL_ADDRESS in ('non-working-user-email', 'working-user-email')
;

It's also worth checking whether a USERINFO record exists for the user who needs to be @ mentioned:

-- Confirm that a USERINFO content record exists for the user to be @ mentioned
SELECT user_mapping.username,
       content.contenttype
FROM content
INNER JOIN user_mapping ON content.username = user_mapping.user_key
WHERE contenttype = 'USERINFO'
  AND lower_username = 'username-here'
;

Bug CONFSERVER-58937 was identified to affect LDAP users: logging in with a new user account before the LDAP directory sync would result in the profile being created with identical profile creation and edit dates, and in such users not getting indexed.

Bug CONFSERVER-38569 was identified to affect LDAP and Crowd users: recreating the user directory while a prior USERINFO record was present resulted in the user no longer getting indexed.
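
Both bugs revolve around the USERINFO record, so it can help to compare its creation and last-modified dates for the affected user. A minimal sketch, assuming the standard content date columns (replace 'username-here' with the user in question):

-- Sketch: creation vs. last-modified date of a user's USERINFO record.
-- Identical dates can point towards CONFSERVER-58937.
SELECT user_mapping.username,
       content.creationdate,
       content.lastmoddate
FROM content
INNER JOIN user_mapping ON content.username = user_mapping.user_key
WHERE content.contenttype = 'USERINFO'
  AND user_mapping.lower_username = 'username-here'
;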

Debug logging:

From Administration > Logging and Profiling, add the following packages and set them to DEBUG:

com.atlassian.confluence.internal.index.AbstractBatchIndexer
com.atlassian.confluence.search.lucene
com.atlassian.bonnie.search.extractor

Refer to the How to turn on debugging for indexing KB for details.

Starting with Confluence 7.11, indexing-related log entries and debug logging go into a dedicated atlassian-confluence-index.log file.
Prior to Confluence 7.11, these entries are logged in atlassian-confluence.log.

Indexing Jobs and DB

The indexing queue is where all new and updated content is added to be indexed; it can be seen from Administration > Content Indexing > Queue Contents.

It is likely that the Queue Contents list will appear empty, as the Flush Edge Index Queue job runs every 30 seconds to process those requests, so by the time you check, entries might have already been processed.

All records to be indexed are also stored in the JOURNALENTRY DB table, and each Confluence node or Server instance keeps a record of the last processed entry inside the <confluence-home>/journal/main_index file. Checking the last records in the JOURNALENTRY DB table and comparing them to the <confluence-home>/journal/main_index file is a good way to determine which content the index job has already processed.

Besides the Flush Edge Index Queue job, there is also a Clean Journal Entries job that runs once a day and cleans entries older than 2 days out of the JOURNALENTRY DB table. So, if the index job was not running properly for a few days on a node or Server instance, it's possible that the last indexed entry recorded in the main_index file is no longer stored in the JOURNALENTRY DB table; if that is the case, the index needs to be rebuilt. A comparison like the sketch below can help confirm this.
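
A rough sketch of such a comparison, assuming typical JOURNALENTRY column and journal names (these may differ between Confluence versions, so verify them against your schema):

-- Sketch: newest and oldest entries still present for the content index journal
SELECT MIN(entryid) AS oldest_entry,
       MAX(entryid) AS newest_entry,
       COUNT(*)     AS pending_entries
FROM JOURNALENTRY
WHERE journalname = 'main_index'
;

Compare these values with the number stored in the <confluence-home>/journal/main_index file: if the file records an entry id lower than the oldest entry still present in the table, entries were cleaned up before this node processed them, and the index should be rebuilt.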


Content indexing, search and macros

How content gets indexed

Confluence indexes all the content that is added to it; this includes text within macros as well as attachments (if the text can be extracted).

When indexing text, the selected indexing language plays a vital role. The selected indexing language should be representative of the content stored in Confluence, and it matters for accurate search for a couple of reasons:

  • Words stored in the index get "stemmed" (reduced to the linguistic root of the word being searched, i.e. wait is a stem of waited and waits; meet is a stem of meeting, met, and meets) based on the selected language.
  • Certain words, referred to as "stop words", will not be indexed. If your indexing language is set to English, then "a", "the", and "it" won't get indexed, as such words are considered noise, and you wouldn't get effective search results by searching for "the" anyway.

Why not use DB index instead?

Both Confluence and Jira contain a lot of text, and database search has historically not been the best at searching text; although different database vendors have made improvements, those remain vendor and version specific. Lucene offers the additional features mentioned above and allows the search experience to remain consistent regardless of the database vendor.

Common issues and bugs

Most indexing or search issues can be solved by simply rebuilding the index. Be it a standalone Server instance or Data Center, all rebuilding processes are outlined in the guides below; you can also monitor rebuilding progress from the REST API if you do not have Web UI access for some reason.

Why do I need to rebuild from scratch?

When the index is rebuilt from the Confluence Administration menu, the process removes content information from the Lucene DB, then fetches it from Confluence and creates new records/documents inside the Lucene DB. However, if the Lucene DB is corrupt, a rebuild from Confluence Administration will fail to run properly. The rebuild-from-scratch process is designed to remove the indexing DB files, forcing Lucene to create a new DB and populate it with records fetched from Confluence. There is no safe way to identify or rectify Lucene DB corruption; rebuilding it is the only way.

Most errors during an index rebuild from scratch can be ignored, as the process will continue, skipping any issues it encounters. However, should errors not go away or need to be investigated, feel free to check out some of these KBs:

Useful KB articles

Attachment indexing

Attachment extraction and indexing occur for certain text file types, and Confluence will only attempt it for files smaller than 100 MB. For more information, as well as changing the default extracted file size, refer to the Configuring Attachment Size guide.
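
If you want to see which attachments fall above that threshold, a query along these lines can help; note that the FILESIZE property and the CONTENTPROPERTIES column names used here are assumptions based on a typical Confluence schema and may differ in your version:

-- Sketch: current attachments larger than 100 MB (104857600 bytes) that text extraction will skip
-- Property and column names are assumptions; verify them against your schema
SELECT c.title, cp.longval AS size_bytes
FROM CONTENT c
JOIN CONTENTPROPERTIES cp ON cp.contentid = c.contentid
WHERE c.contenttype = 'ATTACHMENT'
  AND c.prevver IS NULL
  AND cp.propertyname = 'FILESIZE'
  AND cp.longval > 104857600
ORDER BY cp.longval DESC
;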

You can also disable this feature altogether.

Bugs

Some of the current bugs that may affect general indexing and search:



Further Troubleshooting

Rebuilding, Logging and Internals

The index and Confluence search depend on content permissions: when you search Confluence for a specific keyword, the entire Confluence index is searched and restrictions are applied to the results based on content permissions. Problems with the ancestor tables will therefore have an impact on the index and index rebuilds, so do check for them when facing rebuild problems, for example with a query like the sketch below.
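
As a rough sketch of such a check (table and column names are based on a typical Confluence schema, and the exact criteria for a healthy ancestor table may vary, so treat this as a starting point rather than a definitive test), you can look for current pages that have a parent but no rows in the ancestor table:

-- Sketch: current pages with a parent but no CONFANCESTORS rows
-- Table and column names are assumptions; verify them against your schema
SELECT c.contentid, c.title
FROM CONTENT c
WHERE c.contenttype = 'PAGE'
  AND c.prevver IS NULL
  AND c.parentid IS NOT NULL
  AND NOT EXISTS (SELECT 1
                  FROM CONFANCESTORS ca
                  WHERE ca.descendentid = c.contentid)
;

If such pages show up, fixing the ancestor table first is advisable before retrying the index rebuild.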

Although most errors and warnings related to indexing normally get logged, enabling debug logging and looking at what was logged during a search, index build, or rebuild is sometimes very helpful to the investigation.


The index rebuilding task launches up to 6 threads named "Indexer: <n>". Any errors during this task will be logged by one of those threads.

For example, from the Indexer stops with ERROR java.io.IOException: No space left on device KB:

ERROR [Indexer: 1] [internal.index.lucene.LuceneBatchIndexer] doIndex Exception indexing document with id: 12345678
java.io.IOException: No space left on device

Index building is carried out by the Flush Edge Index Queue job, which is handled by the Confluence job scheduler, so any related errors or warnings occurring during those jobs will typically be logged under a "scheduler_Worker-<n>" thread, and you would need to look into the message itself to determine the nature of the warning/error, as seen in this example from the same KB:

ERROR [scheduler_Worker-10] [org.quartz.core.ErrorLogger] schedulerError Job (DEFAULT.IndexQueueFlusher threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: com.atlassian.bonnie.LuceneException: com.atlassian.confluence.api.service.exceptions.ServiceException: Failed to process entries]
WARN [scheduler_Worker-10] [search.lucene.tasks.AddDocumentIndexTask] perform Error getting document from searchable
java.io.IOException: No space left on device
WARN [scheduler_Worker-1] [search.lucene.tasks.AddDocumentIndexTask] perform Error getting document from searchable
java.io.FileNotFoundException: /opt/confluence/confluence-home/index/_z8ey.fdx (No space left on device)               

Viewing Index

In some rare situations you might need to check what is stored in the Lucene index itself, for example, as part of debugging, to verify that a certain piece of content actually got indexed. You can refer to this guide on ways to do it.
(warning) Index issues or corruption can't be fixed from within the index, and rebuilding a corrupt index is the best and safest option.


Performance and Indexing

Performance problems and resource management

The typical Confluence configuration that we ship is optimized for the index to operate without impacting Confluence performance.

You can change the default configuration to allow indexing to take up more resources, but such changes can cause Confluence to suffer from performance problems or encounter Out of Memory exceptions.

A Confluence instance can also face OOM events during normal indexing operations if not enough resources are allocated to Confluence; in such scenarios resource allocation is the culprit, not the indexing operations.

If Confluence has enough resources allocated to it (memory/CPU/disk), shifting more of them to the indexing task can reduce index rebuild times, but before proceeding you need to consider whether rebuild times are actually unreasonable. Building an index for a lot of pages and attachments is expected to take a lot of time, regardless of the resources allocated to indexing tasks.

Common issues and bugs

Confluence Performance Problems

If Confluence is facing performance problems during index rebuild operations, below are some of the things you should check:

Attachments

Confluence will attempt to extract text from attachments and index it for searchability. Depending on what you want to accomplish, there are several KBs that cover this feature and potential issues with it.

Optimizing Index with System properties

If you wish for the index rebuild task to complete faster, or if day-to-day index building is not keeping up with the rate of newly generated content, the parameters below might help.
(info) Try to focus on one parameter at a time to keep track of changes and control system load.

  • Set the -Dreindex.thread.count system property (starting at 6) and increase it in steady increments (of 2), monitoring your disk, CPU, and memory loads during re-indexing operations.
    This parameter determines how many threads Lucene can open for the index rebuild task, and is useful for trying to improve index rebuild completion time.
  • Set the -Dreindex.attachments.thread.count system property (starting at 6) and increase it in steady increments (of 2), monitoring your disk, CPU, and memory loads during re-indexing operations.
    This parameter determines how many threads Lucene can open for attachment index rebuilding, and is useful for trying to improve index rebuild completion time on systems with a lot of attachments.

  • Set the -Dindex.queue.batch.size system property (starting at 1500) and increase it in steady increments (of 500), monitoring your disk, CPU, and memory loads during normal operations.
    This parameter determines how many items are processed by the Flush Edge Index Queue job each run, and is useful if your system is generating a lot of content and Lucene can't keep up, resulting in the queue building up steadily. It can be set temporarily during migrations or anticipated mass content generation activities.

Bugs

Some of the current bugs related to indexing and performance:



