Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

For Office documents (*.xlsx, *.pptx, *.docx) that exceed certain thresholds, Confluence can't extract text and make it available in searches. Some of the attachment size limitations for indexing are outlined here :

Configuring Attachment Size

As per the knowledge base article linked above, the process of extracting attachment content for indexing is memory intensive and can cause out of memory errors when large files are uploaded. The size limit here is a safeguard built into Confluence to prevent this happening.

Diagnosis

Place the below classes in DEBUG

com.atlassian.confluence.internal.index.attachment
com.atlassian.confluence.internal.index
com.atlassian.confluence.search.lucene
com.atlassian.bonnie.search.extractor

After reproducing the issue, we see below entries where Confluence is complaining about Document being too big for text extraction which explains why the contents of these files are not searchable.

2021-03-06 18:25:56,021 DEBUG [attachment-text-extraction-worker-1] [internal.index.attachment.DefaultAttachmentExtractedTextManager] getContent Can't read extracted text of attachment 884741
2021-03-06 18:25:56,055 WARN [attachment-text-extraction-worker-1] [confluence.impl.hibernate.ConfluenceHibernateTransactionManager] doRollback Performing rollback. Transactions:
  ->[com.atlassian.confluence.internal.index.attachment.AttachmentTextExtractionFunction.apply]: PROPAGATION_REQUIRES_NEW,ISOLATION_DEFAULT (Session #674982162)
 -- referer: http://localhost:8090/pages/resumedraft.action?draftId=884737&draftShareId=ca5fced8-89dd-4fff-8660-4aa3d2903ce3& | url: /rest/documentConversion/latest/conversion/thumbnail/results | traceId: e9c077ee7d66b117 | userName: admin
2021-03-06 18:25:56,056 DEBUG [Caesium-1-1] [search.lucene.extractor.AttachmentExtractedTextExtractor] addFields Error when extracting text for 884741
java.util.concurrent.CompletionException: java.lang.RuntimeException: com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of PowerPoint document: Document too big for text extraction, bailing out
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of PowerPoint document: Document too big for text extraction, bailing out
    at com.atlassian.confluence.extra.officeconnector.index.AbstractAttachmentExtractor.extract(AbstractAttachmentExtractor.java:33)
    at com.atlassian.confluence.internal.index.attachment.DelegatingAttachmentTextExtractor.lambda$extract$1(DelegatingAttachmentTextExtractor.java:35)
    at java.util.Optional.flatMap(Optional.java:241)
    at com.atlassian.confluence.internal.index.attachment.DelegatingAttachmentTextExtractor.extract(DelegatingAttachmentTextExtractor.java:35)
    at com.atlassian.confluence.internal.index.attachment.AttachmentTextExtractionFunction.apply(AttachmentTextExtractionFunction.java:70)
    at com.atlassian.confluence.internal.index.attachment.AttachmentTextExtractionFunction.apply(AttachmentTextExtractionFunction.java:22)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:295)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98)

We also have a known issue : CONFSERVER-58824 - Getting issue details... STATUS

Solution

You can override the thresholds by setting a Java sysprop and restarting your instance.

Note: We have used the value 6mb as an example, be sure to test this in your lower instance prior to rolling this onto Production. Also, please be mindful when increasing the size here, as shared earlier in this comment, text extraction is a powerful operation and resource intensive.

Extension

System prop key

Example

.docx

officeconnector.textextract.word.docxmaxsize

-Dofficeconnector.textextract.word.docxmaxsize=6052413

.xlsx

officeconnector.excel.extractor.maxlength

-Dofficeconnector.excel.extractor.maxlength=6052413

.pptx

officeconnector.powerpoint.extractor.maxlength

-Dofficeconnector.powerpoint.extractor.maxlength=6052413

Description	Unable to search content in PPT files when ppt file uses design templates or contains images
Product	Confluence

Confluence Support

Get started

Knowledge base

Products

Jira Software

Jira Service Management

Jira Work Management

Confluence

Bitbucket

Resources

Documentation

Community

System Status

Suggestions and bugs

Marketplace

Billing and licensing

Content of larger Office files not searchable

Still need help?

Problem

Diagnosis

Solution

Page

Viewport

Confluence

Content of larger Office files not searchable

Related content

Still need help?

Problem

Diagnosis

Solution

Related content