Users sometimes experience problems indexing large Microsoft Excel or Microsoft PowerPoint documents, and reindexing them may produce "Unknown Ptg" warning messages. These warnings are harmless, and there is already a request to Suppress these warnings from the re-indexing of unreadable documents by the POI library. Although the underlying error is usually not serious, it can cause problems when large attachments are involved, so you may want to disable indexing for a particular document type.

To do this, edit confluence\WEB-INF\classes\plugins\attachment-extractors.xml and comment out the extractor for the relevant file type. From Confluence 2.6, attachment-extractors.xml is packaged inside confluence-2.6.0.jar; we have instructions for Editing files within .jar archives if you're unfamiliar with the process. The example below shows the pdfContentExtractor disabled, which prevents PDF attachments from being indexed.
<atlassian-plugin name='Attachment Extractors' key='confluence.extractors.attachments'>
    <plugin-info>
        <description>This library extracts searchable text from various attachment types.</description>
        <vendor name="Atlassian Software Systems" url="http://www.atlassian.com"/>
        <version>1.4</version>
    </plugin-info>

    <!--
    <extractor name="PDF Content Extractor" key="pdfContentExtractor" class="com.atlassian.bonnie.search.extractor.PdfContentExtractor" priority="1100">
        <description>Indexes contents of PDF files</description>
    </extractor>
    -->

    <extractor name="MS Word Content Extractor" key="msWordContentExtractor" class="com.atlassian.bonnie.search.extractor.MsWordContentExtractor" priority="1100">
        <description>Indexes contents of Microsoft Word files</description>
    </extractor>

    <extractor name="MS Excel Content Extractor" key="msExcelContentExtractor" class="com.atlassian.bonnie.search.extractor.MsExcelContentExtractor" priority="1100">
        <description>Indexes contents of Microsoft Excel files</description>
    </extractor>

    <extractor name="MS PowerPoint Content Extractor" key="msPowerpointContentExtractor" class="com.atlassian.bonnie.search.extractor.MsPowerpointContentExtractor" priority="1100">
        <description>Indexes contents of Microsoft PowerPoint files</description>
    </extractor>
</atlassian-plugin>
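The problem described above concerns Excel and PowerPoint documents, while the example disables PDF indexing. As a minimal sketch, assuming the same attachment-extractors.xml layout shown above, the Excel and PowerPoint extractors could be disabled in the same way by wrapping their <extractor> elements in XML comments. The other extractors and the rest of the file stay unchanged.

```xml
<!-- Excel extractor disabled: Excel attachments are not indexed -->
<!--
<extractor name="MS Excel Content Extractor" key="msExcelContentExtractor" class="com.atlassian.bonnie.search.extractor.MsExcelContentExtractor" priority="1100">
    <description>Indexes contents of Microsoft Excel files</description>
</extractor>
-->

<!-- PowerPoint extractor disabled: PowerPoint attachments are not indexed -->
<!--
<extractor name="MS PowerPoint Content Extractor" key="msPowerpointContentExtractor" class="com.atlassian.bonnie.search.extractor.MsPowerpointContentExtractor" priority="1100">
    <description>Indexes contents of Microsoft PowerPoint files</description>
</extractor>
-->
```

XML comments cannot contain the sequence "--" and cannot be nested, so this approach works only as long as the commented-out text contains no such sequence (true for the entries shown) and does not already sit inside another comment.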

Comments (5)
Feb 19
Anonymous says:
I'm sorry, but I really object to having to edit within the jar files provided by Atlassian. It's a very unfriendly requirement for admin'ing the system.
I've seen some details in other similar how-tos on how an XML file can be provided in the file system that will override what is packed in the jar files. Please provide details for each of these reports on where a normal XML file can be placed in the file system to override these files in the jars.
Feb 20
Choy Li Tham says:
Hi,
Another workaround that I can think of is to perform the steps below and see if it helps:
I hope this information helps.
Regards,
Choy Li
Mar 05
Anonymous says:
Atlassian,
I strongly agree with the objection to having to un-jar files just to edit them. Please put any files that have even a remote chance of being edited outside of the jar.
Thanks,
A concerned developer.
Mar 05
Anonymous says:
Choy,
Is it possible to un-jar the entire confluence-2.x.x.jar file and keep it un-jar'ed in the <confluence_install> dir?
If so, what are the exact directions (including any full file paths and any config changes) for doing this?
Thanks in advance.
Mar 06
Choy Li Tham says:
Hi,
To my understanding, if you prefer not to re-jar the confluence-2.x.x.jar file in the WEB-INF/lib/ directory, you will need to manually create the corresponding directory path in <confluence_install> and then place the edited file in the directory you created. For example:
Regards,
Choy Li