How to manually rebuild content index from scratch on Confluence Data Center without any downtime
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
This KB article describes how to rebuild the content index in Confluence Data Center without any downtime when using Lucene, which is the default search platform.
If you want to check other options to rebuild the content index, go to How to Rebuild the Content Indexes From Scratch.
If you're running Confluence with OpenSearch, you don't need to follow these steps: you can rebuild your content indexes normally without causing any downtime. This is because Confluence will perform indexing with a blue-green approach (that is, it will use a separate new index).
If you have any questions about this procedure, or if you encounter any problems while running the reindex, raise a ticket with Atlassian Support.
Environment
This procedure is valid if you are running Confluence Data Center 6.x or later, using Lucene as the search platform.
Solution: Confluence 7.7 and later
From Confluence 7.7 rebuilding the index and propagating the new index to all nodes in your cluster can be done directly through the Confluence UI.
This method requires no downtime, and you can also choose to remove the node performing the reindex from your load balancer, to further minimise any performance impact on users. Confluence will continue to use the existing index until the new index has been successfully rebuilt and propagated to each node in the cluster.
See Content Index Administration for more information.
Solution: Confluence 6.0 to 7.6
The sections below describe the procedure to rebuild the content index in Confluence Data Center without downtime.
While the reindex runs on one specific node, the other nodes in the Confluence cluster stay up and keep serving user requests.
In this example we will consider a Data Center deployment with 2 nodes: node1 and node2.
The same procedure applies to a deployment with 3, 4 or more nodes. In that case, repeat the steps described for node2 on every other node.
If you are running Confluence Data Center with a single node, you won't be able to rebuild the content index from scratch without downtime. In that case, refer to How to manually rebuild content index from scratch on Confluence Data Center with downtime.
Throughout the document we may refer to the Confluence content index rebuild process simply as reindex.
Preparation phase
- Choose the node where the reindex process will run.
  - Only one node should run the reindex process.
  - This node should not serve user requests, while all other nodes in the cluster keep running and stay available to users.
  - In this example we will choose node1 as the node to run the reindex; we may refer to it as the reindex node.
- On node1, configure log4j to send indexing-related log messages to a separate file.
  - The additional information in a separate log file helps to identify whether the indexing process is still running or has completed; it is also helpful if anything goes wrong and you need to contact the support team.
  - Follow the instructions in Configuring log4j in Confluence to send specific entries to a different log file to send messages from the target classes to atlassian-confluence-indexing.log.
  - The following classes should be added at INFO level as part of the above procedure.
      com.atlassian.confluence.search.lucene
      com.atlassian.confluence.internal.index
      com.atlassian.confluence.impl.index
      com.atlassian.bonnie.search.extractor
  - This change only takes effect when Confluence is restarted. You don't need to restart at this point, since the node will be restarted in a later step.
- On node1, configure the JVM system property below so that the node doesn't try to request the index files from a running node.
      -Dconfluence.cluster.index.recovery.num.attempts=0
  - This change only takes effect when the Confluence node is restarted. You don't need to restart at this point, since the node will be restarted in a later step.
- Add any other temporary configuration you would like to node1.
  - See How to Rebuild the Content Indexes From Scratch for best practices and additional parameters.
- Remove node1 from the load balancer target group.
  - Confluence may be slow while the reindex is running, so users would get a degraded experience if the reindex node remained available to them.
  - Normal user requests may compete for resources with the reindex process, so it is best to send these requests to other nodes.
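The two configuration changes from the preparation steps above can be sketched in shell. This is a minimal illustration, not the official procedure: it assumes log4j 1.x properties syntax (`log4j.logger.<package>=<LEVEL>, <appender>`), and the function names and the appender name `indexlog` are hypothetical. The appender itself still has to be defined as described in the linked log4j article.

```shell
# Packages listed in the preparation step above.
INDEX_LOG_PACKAGES="com.atlassian.confluence.search.lucene
com.atlassian.confluence.internal.index
com.atlassian.confluence.impl.index
com.atlassian.bonnie.search.extractor"

# Print one logger line per package at INFO level, routed to the given appender.
# The appender name is an assumption; use the one you defined in log4j.properties.
print_index_logger_lines() {
  appender="$1"
  for pkg in $INDEX_LOG_PACKAGES; do
    printf 'log4j.logger.%s=INFO, %s\n' "$pkg" "$appender"
  done
}

# Return the given JVM options string with the index recovery flag appended,
# unless the flag is already present.
add_recovery_flag() {
  opts="$1"
  flag='-Dconfluence.cluster.index.recovery.num.attempts=0'
  case " $opts " in
    *" $flag "*) printf '%s\n' "$opts" ;;
    *) printf '%s\n' "${opts:+$opts }$flag" ;;
  esac
}

# Example usage (review the output before pasting it into any file):
#   print_index_logger_lines indexlog
#   CATALINA_OPTS="$(add_recovery_flag "$CATALINA_OPTS")"
```

Printing the lines instead of editing files in place keeps the change easy to review and easy to revert once the reindex is done.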
Cleaning the current index
- Stop Confluence only on node1, following your standard procedure.
- Access the database and run the following SQL to delete old entries from the journalentry table.
      DELETE FROM journalentry WHERE creationdate < CURRENT_TIMESTAMP - interval '2 hour';
  - This will delete entries that are older than 2 hours. Nodes that are still online will continue to run the incremental reindex when new content is created.
- Take a safety backup of index/.
- On node1, delete all files from the following folders.
      <Confluence-home>/index/
      <Confluence-home>/journal/
      <Confluence-shared-home>/index-snapshots/
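The backup and cleanup steps above could be scripted roughly as follows. This is a sketch under assumptions: the function name and the backup file location are hypothetical, Confluence must already be stopped on the node, and `find -mindepth 1 -delete` is assumed to be available (GNU and BSD `find` both provide it). The folders themselves are kept; only their contents are removed.

```shell
# Back up index/ and then empty the index, journal and index-snapshots folders.
#   $1 = local home, $2 = shared home, $3 = path of the backup tar file to create
backup_and_clear_index() {
  local_home="$1"
  shared_home="$2"
  backup_file="$3"

  # Safety backup of index/ first; stop if it fails so nothing is deleted.
  tar -cf "$backup_file" -C "$local_home" ./index || return 1

  for dir in "$local_home/index" "$local_home/journal" "$shared_home/index-snapshots"; do
    # Skip folders that do not exist on this installation.
    [ -d "$dir" ] || continue
    # Remove everything inside the folder but keep the folder itself.
    find "$dir" -mindepth 1 -delete || return 1
  done
}

# Example usage (paths are placeholders):
#   backup_and_clear_index /path/to/local-home /path/to/shared-home /path/to/index-backup.tar
```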
Rebuilding the index
- Start Confluence on node1.
  - This will automatically trigger the reindex on this node during the startup process.
- Track the reindex process through the atlassian-confluence-indexing.log file.
  The following entries indicate that the reindex process has started.
      2020-06-09 17:32:05,553 WARN [Catalina-utility-1] [confluence.impl.index.DefaultIndexRecoveryService] triggerIndexRecovererModuleDescriptors Index recovery is required for main index, starting now
      2020-06-09 17:32:05,557 DEBUG [Catalina-utility-1] [confluence.impl.index.DefaultIndexRecoveryService] recoverIndex Cannot recover index because this is the only node in the cluster
      2020-06-09 17:32:05,558 WARN [Catalina-utility-1] [confluence.impl.index.DefaultIndexRecoveryService] triggerIndexRecovererModuleDescriptors Could not recover main index, the system will attempt to do a full re-index
      2020-06-09 17:32:05,724 DEBUG [lucene-interactive-reindexing-thread] [confluence.internal.index.AbstractReIndexer] reIndex Index for ReIndexOption CONTENT_ONLY
  While the reindex is running you may see entries similar to the below.
      2020-06-09 17:32:14,911 DEBUG [Indexer: 4] [internal.index.lucene.LuceneContentExtractor] lambda$extract$0 Adding fields to document for Space{key='SPC550'} using BackwardsCompatibleExtractor wrapping com.atlassian.confluence.search.lucene.extractor.AttachmentOwnerContentTypeExtractor@7103c5b0 (confluence.extractors.core:attachmentOwnerContentTypeExtractor)
      2020-06-09 17:32:14,911 DEBUG [Indexer: 8] [internal.index.lucene.LuceneBatchIndexer] doIndex Index Space{key='SPC632'} [com.atlassian.confluence.spaces.Space]
      2020-06-09 17:32:14,910 DEBUG [Indexer: 6] [internal.index.lucene.LuceneBatchIndexer] doIndex Index Space{key='SPC530'} [com.atlassian.confluence.spaces.Space]
      2020-06-09 17:32:14,909 DEBUG [Indexer: 7] [internal.index.attachment.DefaultAttachmentExtractedTextManager] isAdapted Adapt attachment content extractor com.atlassian.confluence.extra.officeconnector.index.excel.ExcelXMLTextExtractor@6aa2ced for reuse extracted text
  The following entries indicate that the reindex process completed successfully.
      2020-06-09 17:32:31,387 DEBUG [Indexer: 1] [internal.index.lucene.LuceneContentExtractor] lambda$extract$0 Adding fields to document for userinfo: user001 v.1 (1572865) using BackwardsCompatibleExtractor wrapping com.atlassian.confluence.search.lucene.extractor.HtmlEntityFilterExtractor@4033e565 (confluence.extractors.core:htmlEntitiesFilterExtractor)
      2020-06-09 17:32:31,387 DEBUG [Indexer: 1] [internal.index.lucene.LuceneContentExtractor] lambda$extract$0 Adding fields to document for userinfo: user001 v.1 (1572865) using BackwardsCompatibleExtractor wrapping com.atlassian.confluence.search.lucene.extractor.TitleExtractor@48deb96f (confluence.extractors.core:titleExtractor)
      2020-06-09 17:32:31,388 DEBUG [Indexer: 1] [confluence.internal.index.AbstractBatchIndexer] lambda$accept$0 Re-index progress: 100% complete. 3276 items have been reindexed
      2020-06-09 17:32:31,432 DEBUG [Indexer: 1] [confluence.internal.index.AbstractBatchIndexer] accept BatchIndexer batch complete
      2020-06-09 17:32:31,491 DEBUG [lucene-interactive-reindexing-thread] [confluence.search.lucene.PluggableSearcherInitialisation] initialise Warming up searcher..
      2020-06-09 17:32:31,491 DEBUG [lucene-interactive-reindexing-thread] [confluence.search.lucene.DefaultSearcherInitialisation] initialise Warming up searcher..
- Do not proceed to the next step until the logs confirm that the reindex has completed.
- Create a sample page to confirm new items are being added to the content index.
- Create a sample page with unique content and make sure it is searchable.
- You may need to wait up to 30 seconds so the new page is indexed.
- This step is important to ensure the new index is healthy and to prevent a bug reported in CONFSERVER-57681.
- Access the Confluence UI and go to Cog icon > General configuration > Content Indexing > Queue Contents and confirm there are no items waiting to be processed.
  - If there are items in the index queue, give it a couple of minutes to complete.
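Checking the indexing log for the start and completion entries can be sketched with `grep`. This is only an illustration: the function name is hypothetical, the marker strings are copied from the sample log entries in this section and may differ in other Confluence versions, and the check assumes the logging level in use actually captures those entries (the samples are logged at WARN and DEBUG).

```shell
# Report a rough reindex state based on marker strings in the indexing log.
#   $1 = path to atlassian-confluence-indexing.log
reindex_status() {
  log_file="$1"
  if grep -q 'Re-index progress: 100% complete' "$log_file" 2>/dev/null; then
    # Completion marker seen in the sample log entries.
    echo "complete"
  elif grep -q 'the system will attempt to do a full re-index' "$log_file" 2>/dev/null; then
    # Start marker seen, but no completion marker yet.
    echo "running"
  else
    # Neither marker found, or the log file does not exist yet.
    echo "not-started"
  fi
}

# Example usage (path is a placeholder):
#   reindex_status /path/to/logs/atlassian-confluence-indexing.log
```

A result of "complete" is only a hint; creating and searching for a sample page, as described above, remains the actual health check.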
Additional steps if Questions for Confluence is installed
Follow the steps in this Section only if you have Questions for Confluence (QfC) installed.
If you don't, then proceed to the next Section.
Copy the index file to a shared location
- Stop Confluence on node1.
- On node1, compress the index and the journal folders and save them in the shared-home.
  Saving these files in the shared home makes them available to the other nodes in the cluster.
      cd <Confluence Home Dir>
      tar -cvf <Confluence-shared-home>/node1-index.tar ./index
      tar -cvf <Confluence-shared-home>/node1-journal.tar ./journal
- On node1, remove the additional logging configuration made in log4j.properties.
  - These were added as part of the Preparation phase.
  - Since unnecessary debugging can negatively impact the application, it is strongly recommended that you don't keep that configuration during normal operation of Confluence.
- Remove any additional JVM property added as part of the preparation phase.
- Start Confluence on node1 and confirm it is working fine.
- At this point node1 can be made available to users. Therefore, add node1 back to the load balancer target group.
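The archive step in this section can be wrapped in a small function that also verifies each archive is readable before node1 is restarted. This is a sketch under assumptions: the function name is hypothetical, the archive names follow the commands above, and `tar -C` is used in place of `cd` (both GNU and BSD `tar` support it).

```shell
# Archive index/ and journal/ from the local home into the shared home,
# then list each archive to confirm it can be read back.
#   $1 = local home, $2 = shared home
archive_index_to_shared_home() {
  local_home="$1"
  shared_home="$2"
  for name in index journal; do
    archive="$shared_home/node1-$name.tar"
    # Paths are stored relative to the local home (./index, ./journal).
    tar -cf "$archive" -C "$local_home" "./$name" || return 1
    # Listing fails if the archive is truncated or unreadable.
    tar -tf "$archive" >/dev/null || return 1
  done
}

# Example usage (paths are placeholders):
#   archive_index_to_shared_home /path/to/local-home /path/to/shared-home
```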
Copy the index files from the shared home to the remaining nodes of the cluster
- Stop Confluence on node2, following your standard procedure.
- On node2, delete the index and journal folders in the local home.
      cd <Confluence-home>
      rm -rf index/
      rm -rf journal/
- On node2, uncompress the index and journal folders from the shared home to the local home.
      cd <Confluence-home>
      tar -xvf <Confluence-shared-home>/node1-index.tar
      tar -xvf <Confluence-shared-home>/node1-journal.tar
- Confirm the index files were properly placed in the node2 local home.
  The structure should be similar to the below.
      $ ls -R atlassian-confluence-local-home/index atlassian-confluence-local-home/journal
      atlassian-confluence-local-home/index:
      _9.cfe _a.cfe _b.cfe _c.cfe _d.cfe _e.cfe _f.cfe _g.cfe _h.cfe _j.cfe _j_1.del segments_9
      _9.cfs _a.cfs _b.cfs _c.cfs _d.cfs _e.cfs _f.cfs _g.cfs _h.cfs _j.cfs edge
      _9.si _a.si _b.si _c.si _d.si _e.si _f.si _g.si _h.si _j.si segments.gen

      atlassian-confluence-local-home/index/edge:
      _0.cfe _0.cfs _0.si _1.cfe _1.cfs _1.si segments.gen segments_4

      atlassian-confluence-local-home/journal:
      edge_index main_index
- Start Confluence on node2 and confirm it is working fine.
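The check that the index files landed in the right place on node2 can be approximated in shell before starting the node. This is a minimal sketch: the function name is hypothetical, and it only tests for a `segments_*` file under index/ and for the journal/ folder, based on the sample listing in this section. It does not validate the index contents.

```shell
# Return success if the local home has a journal/ folder and at least one
# segments_* file under index/.
#   $1 = local home
index_files_in_place() {
  local_home="$1"
  [ -d "$local_home/journal" ] || return 1
  for f in "$local_home"/index/segments_*; do
    # If the glob matches nothing, $f is the literal pattern and -f fails.
    [ -f "$f" ] && return 0
  done
  return 1
}

# Example usage (path is a placeholder):
#   index_files_in_place /path/to/local-home && echo "index files found"
```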
Cleanup tasks
- Delete the compressed index and journal files from the shared home folder.
      rm -f <Confluence-shared-home>/node1-index.tar
      rm -f <Confluence-shared-home>/node1-journal.tar
Restore Popular Content
(Optional) If desired, restore the relevant directories from the safety backup taken earlier in this procedure. See Popular content missing after reindexing from scratch for instructions.
Confirming content index is working on all nodes
These steps help to confirm new content is being properly indexed by all Confluence nodes.
- Access the Confluence UI on node1 and create a new sample page.
- Go to Cog icon > General configuration > Content Indexing > Queue Contents and confirm all objects in the index queue were processed.
  - If it is still processing, wait for a minute or so for it to complete.
- Do a search on node1 for the newly created document to confirm it was indexed.
- Go to node2 and access Cog icon > General configuration > Content Indexing > Queue Contents to confirm there are no objects in the indexing queue.
- Do a search on node2 for the newly created document to confirm it was indexed.
- Do the same validation on any remaining node of the Confluence cluster.
See also
How to Rebuild the Content Indexes From Scratch