This documentation relates to the latest version of Confluence.
If you are using an earlier version, please go to the documentation home page and select the relevant previous version.

How do I disable indexing of attachments

Confluence 2.8 Documentation


Indexing large Microsoft Excel or PowerPoint documents can sometimes cause problems, and reindexing may log Unknown Ptg warning messages. These warnings are harmless. There is already a request to suppress these warnings, which come from the POI library when unreadable documents are re-indexed.

The error is usually not serious, but it can sometimes cause problems when large attachments are used, so you may want to disable indexing of a particular type of document.

To do this, edit confluence\WEB-INF\classes\plugins\attachment-extractors.xml and comment out the extractor for the relevant file type. From Confluence 2.6, attachment-extractors.xml is packaged inside the Confluence jar file (for example, confluence-2.6.0.jar); see the instructions for Editing files within .jar archives if you're unfamiliar with the process.
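As a rough illustration of editing a file that lives inside a jar, the sketch below writes a new archive that matches the original except for one replaced entry. It uses only Python's standard zipfile module. The function name replace_jar_entry and the entry path passed to it are hypothetical; check where attachment-extractors.xml actually sits inside your jar, and work on a backup copy with Confluence stopped.

```python
import zipfile


def replace_jar_entry(src_jar, dst_jar, entry_name, new_bytes):
    """Write dst_jar as a copy of src_jar with one entry's content replaced.

    entry_name is the path of the file inside the archive (an assumption
    you must verify against your own jar).
    """
    with zipfile.ZipFile(src_jar) as src:
        if entry_name not in src.namelist():
            raise KeyError("entry not found in jar: %s" % entry_name)
        with zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                # Copy every entry unchanged except the one being replaced.
                if info.filename == entry_name:
                    data = new_bytes
                else:
                    data = src.read(info.filename)
                dst.writestr(info, data)
```

The original jar is never modified, so it doubles as the backup; the new jar would then take its place in WEB-INF/lib.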

The example below shows the pdfContentExtractor disabled, which prevents PDF attachments from being indexed.

Once the ContentExtractor for a file type is disabled, all files of that type become unsearchable.

<atlassian-plugin name='Attachment Extractors' key='confluence.extractors.attachments'>
    <plugin-info>
        <description>This library extracts searchable text from various attachment types.</description>
        <vendor name="Atlassian Software Systems" url="http://www.atlassian.com"/>
        <version>1.4</version>
    </plugin-info>

    <!--
    <extractor name="PDF Content Extractor" key="pdfContentExtractor" class="com.atlassian.bonnie.search.extractor.PdfContentExtractor" priority="1100">
        <description>Indexes contents of PDF files</description>
    </extractor>
    -->
    <extractor name="MS Word Content Extractor" key="msWordContentExtractor" class="com.atlassian.bonnie.search.extractor.MsWordContentExtractor" priority="1100">
        <description>Indexes contents of Microsoft Word files</description>
    </extractor>

    <extractor name="MS Excel Content Extractor" key="msExcelContentExtractor" class="com.atlassian.bonnie.search.extractor.MsExcelContentExtractor" priority="1100">
        <description>Indexes contents of Microsoft Excel files</description>
    </extractor>

    <extractor name="MS PowerPoint Content Extractor" key="msPowerpointContentExtractor" class="com.atlassian.bonnie.search.extractor.MsPowerpointContentExtractor" priority="1100">
        <description>Indexes contents of Microsoft PowerPoint files</description>
    </extractor>
</atlassian-plugin>
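
The same edit can be scripted. The following is a minimal sketch, not an Atlassian tool: the helper name disable_extractor is hypothetical, and it assumes each extractor element has a double-quoted key attribute and no nested extractor elements, as in the example above. It wraps the matching element in an XML comment and leaves the text unchanged if that element is already commented out.

```python
import re

# Matches any XML comment, including multi-line ones.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def disable_extractor(xml_text, key):
    """Return xml_text with the <extractor> whose key matches commented out."""
    pattern = re.compile(
        r'<extractor\b[^>]*\bkey="%s"[^>]*>.*?</extractor>' % re.escape(key),
        re.DOTALL,
    )
    match = pattern.search(xml_text)
    if match is None:
        raise ValueError("no extractor with key %r" % key)
    # If the element already sits inside a comment, do nothing:
    # XML comments cannot be nested.
    for comment in COMMENT.finditer(xml_text):
        if comment.start() < match.start() and match.end() <= comment.end():
            return xml_text
    return (
        xml_text[:match.start()]
        + "<!--\n" + match.group(0) + "\n-->"
        + xml_text[match.end():]
    )
```

For instance, calling disable_extractor(text, "msExcelContentExtractor") on the file contents would disable Excel indexing while leaving the other extractors active.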

  1. Feb 19

    Anonymous says:

    I'm sorry, but I really object to having to edit within the jar files provided by Atlassian.  It's a very unfriendly requirement for admin'ing the system.

    I've seen some details in other similar how-tos on how an XML file can be provided in the file system that will override what is packed in the jar files. Please provide details for each of these reports on where a normal XML file can be placed in the file system to override these files in the jars.

    1. Feb 20

      Choy Li Tham says:

      Hi,

      Another workaround that I can think of is to perform the steps below and see if it helps:

      1. Stop Confluence.
      2. Make a backup copy of the confluence-2.x.x.jar file.
      3. Extract the copied confluence-2.x.x.jar file.
      4. Locate the *.xml file in the extracted copy.
      5. Edit the file.
      6. Manually create the corresponding directory path. An example would be:
        <confluence_install>/confluence/WEB-INF/classes/com/atlassian/confluence/core
      7. Place the edited file in the directory above.
      8. Restart Confluence.
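
The extraction and placement steps in this reply can be sketched in Python with the standard zipfile and pathlib modules. This is only an illustration under assumptions: the function name place_override and its parameters are hypothetical, the file is located by name because its path inside the jar is not stated here, and whether Confluence picks up the copy under WEB-INF/classes is the workaround proposed above rather than something this code verifies.

```python
import zipfile
from pathlib import Path


def place_override(jar_path, file_name, classes_dir):
    """Copy one file out of the jar into classes_dir, keeping its jar path.

    The jar is only read, never modified. Returns the path written, which
    is the copy to edit before restarting Confluence.
    """
    with zipfile.ZipFile(jar_path) as jar:
        # Find the entry by file name, whatever directory it lives in.
        matches = [
            name for name in jar.namelist()
            if name == file_name or name.endswith("/" + file_name)
        ]
        if len(matches) != 1:
            raise ValueError(
                "expected exactly one %s in the jar, found %d"
                % (file_name, len(matches))
            )
        target = Path(classes_dir) / matches[0]
        # Create the directory path that mirrors the location in the jar.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(jar.read(matches[0]))
    return target
```

Here classes_dir would be the WEB-INF/classes directory of the installation, and the returned path is the file to edit with Confluence stopped.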

      Hope this information helps.

      Regards,
      Choy Li

      1. Mar 05

        Anonymous says:

        Atlassian,

        I strongly agree with the objection to having to un-jar files just to edit them. Please put any files that have even the remote chance of being edited outside of the jar.

        Thanks,

        A concerned developer.

      2. Mar 05

        Anonymous says:

        Choy,

        Is it possible to un-jar the entire confluence-2.x.x.jar file and keep it un-jar'ed in the <confluence_install> dir?

        If so, what are the exact directions (including any full file paths and any config changes) to do this?

        Thanks in advance.

        1. Mar 06

          Choy Li Tham says:

          Hi,

          To my understanding, if you prefer not to re-jar the confluence-2.x.x.jar file in the WEB-INF/lib/ directory, you will need to manually create the corresponding directory path under <confluence_install> and then place the edited file in the directory you created. For example:

          <confluence_install>/confluence/WEB-INF/classes/com/atlassian/confluence/core
          

          Regards,
          Choy Li
