This documentation relates to an early version of Confluence.

How do I disable indexing of attachments?

Confluence 2.0 to 2.5 Documentation

Users sometimes experience problems when Confluence indexes large Microsoft Excel or Microsoft PowerPoint documents (there should be a JIRA issue link here?). Reindexing may also log "Unknown Ptg" warning messages, which are harmless. There is already a request to suppress these warnings, which the POI library emits when unreadable documents are re-indexed.

The error is usually not serious, but it can sometimes cause problems when large attachments are involved. In that case, you may want to disable indexing for a particular type of document.

To do this, edit the attachment-extractors.xml file and comment out the extractor for that file type.

Once a content extractor is disabled, files of that type become unsearchable.

The example below shows the pdfContentExtractor commented out, which causes PDF attachments not to be indexed.

<atlassian-plugin name='Attachment Extractors' key='confluence.extractors.attachments'>
    <plugin-info>
        <description>This library extracts searchable text from various attachment types.</description>
        <vendor name="Atlassian Software Systems" url="http://www.atlassian.com"/>
        <version>1.4</version>
    </plugin-info>

    <!--
    <extractor name="PDF Content Extractor" key="pdfContentExtractor" class="com.atlassian.bonnie.search.extractor.PdfContentExtractor" priority="1100">
        <description>Indexes contents of PDF files</description>
    </extractor>
    -->
    <extractor name="MS Word Content Extractor" key="msWordContentExtractor" class="com.atlassian.bonnie.search.extractor.MsWordContentExtractor" priority="1100">
        <description>Indexes contents of Microsoft Word files</description>
    </extractor>

    <extractor name="MS Excel Content Extractor" key="msExcelContentExtractor" class="com.atlassian.bonnie.search.extractor.MsExcelContentExtractor" priority="1100">
        <description>Indexes contents of Microsoft Excel files</description>
    </extractor>

    <extractor name="MS PowerPoint Content Extractor" key="msPowerpointContentExtractor" class="com.atlassian.bonnie.search.extractor.MsPowerpointContentExtractor" priority="1100">
        <description>Indexes contents of Microsoft PowerPoint files</description>
    </extractor>
</atlassian-plugin>
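
Because the problems described above involve large Excel and PowerPoint files, the same technique can be applied to those two extractors instead. The following is a minimal sketch, assuming your attachment-extractors.xml matches the listing above; it reuses the names, keys, and classes from that listing and only moves the comment markers. Each extractor is wrapped in its own comment, since XML comments cannot be nested.

```xml
<atlassian-plugin name='Attachment Extractors' key='confluence.extractors.attachments'>
    <plugin-info>
        <description>This library extracts searchable text from various attachment types.</description>
        <vendor name="Atlassian Software Systems" url="http://www.atlassian.com"/>
        <version>1.4</version>
    </plugin-info>

    <extractor name="PDF Content Extractor" key="pdfContentExtractor" class="com.atlassian.bonnie.search.extractor.PdfContentExtractor" priority="1100">
        <description>Indexes contents of PDF files</description>
    </extractor>

    <extractor name="MS Word Content Extractor" key="msWordContentExtractor" class="com.atlassian.bonnie.search.extractor.MsWordContentExtractor" priority="1100">
        <description>Indexes contents of Microsoft Word files</description>
    </extractor>

    <!-- Disabled: Excel attachments will no longer be indexed
    <extractor name="MS Excel Content Extractor" key="msExcelContentExtractor" class="com.atlassian.bonnie.search.extractor.MsExcelContentExtractor" priority="1100">
        <description>Indexes contents of Microsoft Excel files</description>
    </extractor>
    -->

    <!-- Disabled: PowerPoint attachments will no longer be indexed
    <extractor name="MS PowerPoint Content Extractor" key="msPowerpointContentExtractor" class="com.atlassian.bonnie.search.extractor.MsPowerpointContentExtractor" priority="1100">
        <description>Indexes contents of Microsoft PowerPoint files</description>
    </extractor>
    -->
</atlassian-plugin>
```

With this configuration, PDF and Word attachments are still indexed, while Excel and PowerPoint attachments become unsearchable. To re-enable an extractor, remove the comment markers around it.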
