com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document

Symptoms

Warnings appear in the Confluence log when indexing or reindexing a CSV file attachment.

WARN [Indexer: 1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: meta_mailinfo_sec01.csv v.1 (59509056) g6922)
com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0
	at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103)
	at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
	...

Cause

In Confluence versions before 3.5.11, an uploaded CSV file was assigned the Content-Type application/vnd.ms-excel, so ExcelTextExtractor is used to index it. Because the file is not an Excel file, ExcelTextExtractor cannot read it and logs a warning message.
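
The two hexadecimal values in the warning can be decoded to confirm this. The following is a minimal sketch, assuming the extractor reads the first eight bytes of the file as a little-endian 64-bit integer (an assumption about how the signature is reported, not something stated in the log). The names `header_bytes`, `ole2_magic` and `csv_start` are illustrative only.

```python
import struct

# Values copied from the warning in the Symptoms section.
EXPECTED_SIGNATURE = 0xE11AB1A1E011CFD0  # what the Excel extractor expected
READ_SIGNATURE = 0x6D453B74726F6853      # what it actually read from the file


def header_bytes(value: int) -> bytes:
    """Return the 8 bytes that produce `value` when read as a
    little-endian unsigned 64-bit integer (assumed byte order)."""
    return struct.pack("<Q", value)


ole2_magic = header_bytes(EXPECTED_SIGNATURE)
csv_start = header_bytes(READ_SIGNATURE)

# Magic bytes of the binary container used by legacy .xls files.
print(ole2_magic.hex(" "))        # d0 cf 11 e0 a1 b1 1a e1

# The bytes actually read are plain text: the start of a
# semicolon-separated header row.
print(csv_start.decode("ascii"))  # Short;Em
```

Under that assumption, the value the extractor read is simply the first eight characters of the CSV text, which is consistent with a text file being handed to the Excel extractor.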

This issue was fixed in Confluence 3.5.11, so newly uploaded CSV files no longer trigger these warnings. CSV files uploaded before the fix still carry the incorrect Content-Type, however, so the warnings still occur whenever Confluence re-indexes them.

Resolution

Atlassian Support Offerings

The following approach involves SQL queries and is beyond the scope of Atlassian Support Offerings. Please note that Atlassian does not support direct database INSERT, UPDATE or DELETE queries, as they can easily lead to data integrity problems. Atlassian will not be held liable for any errors or other unexpected events resulting from the use of the following SQL queries.

Back up your Database

Always back up your data before making any modifications to the database.

  • Use the SQL script below (it may need slight adjustment for the syntax of your DBMS) to correct the Content-Type of old CSV files:

    UPDATE CONTENTPROPERTIES
    SET STRINGVAL = 'text/csv'
    WHERE PROPERTYNAME = 'MEDIA_TYPE'
    AND PROPERTYID IN (
        SELECT PROPERTYID FROM CONTENTPROPERTIES
        WHERE CONTENTID IN (
            SELECT CONTENTID FROM CONTENT
            WHERE CONTENTTYPE = 'ATTACHMENT'
            AND (TITLE LIKE '%.csv' OR TITLE LIKE '%.CSV')
        )
    );
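
Before touching a real database, it may help to see which rows the statement affects. The sketch below runs the same UPDATE against a toy in-memory SQLite database. The tables contain only the columns named in the script above, the sample rows are made up, and `PREVIEW_SQL` is a hypothetical helper query; a real Confluence schema has many more columns and your DBMS may behave differently from SQLite.

```python
import sqlite3

# Hypothetical read-only query listing the rows the UPDATE targets.
PREVIEW_SQL = """
SELECT c.TITLE, p.STRINGVAL
FROM CONTENTPROPERTIES p
JOIN CONTENT c ON c.CONTENTID = p.CONTENTID
WHERE p.PROPERTYNAME = 'MEDIA_TYPE'
  AND c.CONTENTTYPE = 'ATTACHMENT'
  AND (c.TITLE LIKE '%.csv' OR c.TITLE LIKE '%.CSV')
ORDER BY c.TITLE
"""

# The UPDATE statement from the article, unchanged.
UPDATE_SQL = """
UPDATE CONTENTPROPERTIES
SET STRINGVAL = 'text/csv'
WHERE PROPERTYNAME = 'MEDIA_TYPE'
AND PROPERTYID IN (
    SELECT PROPERTYID FROM CONTENTPROPERTIES
    WHERE CONTENTID IN (
        SELECT CONTENTID FROM CONTENT
        WHERE CONTENTTYPE = 'ATTACHMENT'
        AND (TITLE LIKE '%.csv' OR TITLE LIKE '%.CSV')
    )
);
"""


def build_toy_db() -> sqlite3.Connection:
    """Create a simplified stand-in schema with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CONTENT (
            CONTENTID INTEGER PRIMARY KEY, CONTENTTYPE TEXT, TITLE TEXT);
        CREATE TABLE CONTENTPROPERTIES (
            PROPERTYID INTEGER PRIMARY KEY, PROPERTYNAME TEXT,
            STRINGVAL TEXT, CONTENTID INTEGER);
        INSERT INTO CONTENT VALUES
            (1, 'ATTACHMENT', 'report.csv'),
            (2, 'ATTACHMENT', 'budget.xls');
        INSERT INTO CONTENTPROPERTIES VALUES
            (10, 'MEDIA_TYPE', 'application/vnd.ms-excel', 1),
            (11, 'MEDIA_TYPE', 'application/vnd.ms-excel', 2);
    """)
    return conn


conn = build_toy_db()
before = conn.execute(PREVIEW_SQL).fetchall()
conn.execute(UPDATE_SQL)
conn.commit()
after = conn.execute(PREVIEW_SQL).fetchall()
# The genuine Excel attachment should keep its original media type.
xls_type = conn.execute(
    "SELECT STRINGVAL FROM CONTENTPROPERTIES WHERE CONTENTID = 2"
).fetchone()[0]

print(before)    # [('report.csv', 'application/vnd.ms-excel')]
print(after)     # [('report.csv', 'text/csv')]
print(xls_type)  # application/vnd.ms-excel
```

Running a SELECT of this kind first gives a list of the attachments that will change, which can be compared against the same query after the update.
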
Last modified on Nov 2, 2018
