com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document
Symptoms
Warnings like the following appear in the Confluence log when a CSV file attachment is indexed or reindexed:
WARN [Indexer: 1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: meta_mailinfo_sec01.csv v.1 (59509056) g6922)
com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0
at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103)
at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
...
Cause
In Confluence versions before 3.5.11, an uploaded CSV file was given the Content-Type application/vnd.ms-excel, so the ExcelTextExtractor was used to index it. Because the file is not actually an Excel document, the ExcelTextExtractor cannot read it and logs a warning.
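The two hex values in the warning can be decoded to confirm this. The sketch below assumes the extractor hands the file to a parser that, like Apache POI, reads the first 8 bytes of the file as a little-endian 64-bit integer; the stack trace does not show which parser is involved, so treat that as an assumption. Under it, the expected value is the OLE2 compound document signature that legacy binary .xls files begin with, while the value actually read is plain ASCII text, presumably the start of the CSV header row. The helper names are hypothetical.

```python
import struct

# Expected value from the warning: the OLE2 compound document signature
# (bytes D0 CF 11 E0 A1 B1 1A E1) that legacy binary .xls files start with.
EXPECTED = 0xE11AB1A1E011CFD0
# Value actually read from the attachment, copied from the warning.
READ = 0x6D453B74726F6853


def signature_bytes(value: int) -> bytes:
    """Return the 8 file bytes that, read as a little-endian unsigned
    64-bit integer, give `value` (assumed to match the parser's check)."""
    return struct.pack("<Q", value)


def looks_like_ole2(first_bytes: bytes) -> bool:
    """True if the data starts with the OLE2 signature (legacy .xls, .doc, ...)."""
    return first_bytes[:8] == signature_bytes(EXPECTED)


if __name__ == "__main__":
    print(signature_bytes(EXPECTED).hex(" "))    # d0 cf 11 e0 a1 b1 1a e1
    print(signature_bytes(READ).decode("ascii"))  # Short;Em
    print(looks_like_ole2(signature_bytes(READ)))  # False
```

The attachment therefore begins with the text "Short;Em", not with a binary Excel header, which is why the Excel extractor rejects it.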
This issue was fixed in Confluence 3.5.11, so CSV files uploaded in that version or later no longer trigger these warnings. However, CSV files uploaded before the upgrade still carry the incorrect Content-Type, so the warnings continue to appear whenever Confluence reindexes them.
Resolution
Atlassian Support Offerings
The following approach involves SQL queries and is beyond the scope of Atlassian Support Offerings. Please note that Atlassian does not support direct database INSERT, UPDATE or DELETE queries, as they can easily lead to data integrity problems. Atlassian will not be held liable for any errors or other unexpected events resulting from the use of the following SQL queries.
Back up your database
Always back up your data before making any modifications to the database.
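Use the SQL script below to correct the Content-Type of old CSV files, so that the ExcelTextExtractor is no longer used for them when Confluence reindexes. The script may need slight adjustment for the SQL dialect of your DBMS; for example, MySQL may reject it because the nested SELECT reads from CONTENTPROPERTIES, the same table being updated.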
UPDATE CONTENTPROPERTIES
SET STRINGVAL = 'text/csv'
WHERE PROPERTYNAME = 'MEDIA_TYPE'
  AND PROPERTYID IN (
    SELECT PROPERTYID FROM CONTENTPROPERTIES
    WHERE CONTENTID IN (
      SELECT CONTENTID FROM CONTENT
      WHERE CONTENTTYPE = 'ATTACHMENT'
        AND (TITLE LIKE '%.csv' OR TITLE LIKE '%.CSV')
    )
  );
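Before running the UPDATE on a production database, it can help to preview which rows it will change. The sketch below is a dry run using Python's built-in sqlite3 module against an in-memory stand-in for CONTENT and CONTENTPROPERTIES. The tables model only the columns the query references and the sample rows are invented for illustration; the real Confluence schema has many more columns. The PREVIEW query reuses the UPDATE's WHERE clause unchanged, so an adjusted version of it can also be run read-only against your own database.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the Confluence tables:
# only the columns referenced by the UPDATE statement are modeled.
SCHEMA = """
CREATE TABLE CONTENT (CONTENTID INTEGER PRIMARY KEY, CONTENTTYPE TEXT, TITLE TEXT);
CREATE TABLE CONTENTPROPERTIES (
    PROPERTYID INTEGER PRIMARY KEY, CONTENTID INTEGER,
    PROPERTYNAME TEXT, STRINGVAL TEXT
);
"""

# Shared filter: MEDIA_TYPE properties of attachments named *.csv / *.CSV.
# Note: SQLite's LIKE is case-insensitive for ASCII; in case-sensitive
# databases the second pattern is what catches upper-case .CSV titles.
WHERE = """
WHERE PROPERTYNAME = 'MEDIA_TYPE'
  AND PROPERTYID IN (
    SELECT PROPERTYID FROM CONTENTPROPERTIES
    WHERE CONTENTID IN (
      SELECT CONTENTID FROM CONTENT
      WHERE CONTENTTYPE = 'ATTACHMENT'
        AND (TITLE LIKE '%.csv' OR TITLE LIKE '%.CSV')))
"""

PREVIEW = "SELECT PROPERTYID, STRINGVAL FROM CONTENTPROPERTIES" + WHERE + " ORDER BY PROPERTYID"
UPDATE = "UPDATE CONTENTPROPERTIES SET STRINGVAL = 'text/csv'" + WHERE


def demo() -> sqlite3.Connection:
    """Build an in-memory database with invented sample attachments."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO CONTENT VALUES (?, ?, ?)", [
        (1, "ATTACHMENT", "meta_mailinfo_sec01.csv"),
        (2, "ATTACHMENT", "report.xls"),
        (3, "ATTACHMENT", "EXPORT.CSV"),
    ])
    conn.executemany("INSERT INTO CONTENTPROPERTIES VALUES (?, ?, ?, ?)", [
        (10, 1, "MEDIA_TYPE", "application/vnd.ms-excel"),
        (11, 2, "MEDIA_TYPE", "application/vnd.ms-excel"),
        (12, 3, "MEDIA_TYPE", "application/vnd.ms-excel"),
    ])
    return conn


if __name__ == "__main__":
    conn = demo()
    print(conn.execute(PREVIEW).fetchall())  # rows 10 and 12, still Excel type
    with conn:  # commits on success, rolls back if the UPDATE raises
        conn.execute(UPDATE)
    print(conn.execute(PREVIEW).fetchall())  # rows 10 and 12, now text/csv
```

The .xls attachment (row 11) keeps its Excel Content-Type, which is the behavior the original query is meant to have.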