This documentation relates to an earlier version of Confluence.

Extractor plugin modules are available in Confluence 1.4 and later.

Attachment content extractor plugins enable Confluence to index the contents of attachments that it may not otherwise understand. Before you read this document, you should be familiar with Extractor Plugins.

The BaseAttachmentContentExtractor class

Attachment content extractor plugins must extend the bucket.search.lucene.extractor.BaseAttachmentContentExtractor base class. The skeleton of this class is:

Once an attachment content extractor returns true from shouldExtractFrom() and a non-null, non-empty String from extractText(), no remaining attachment content extractors are run against that file. It is therefore important to get the priority value for your plugin right, so that general but less accurate extractors run after specific, more accurate ones.

Other (non-attachment) content extractors will still run, regardless.
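
The "first successful extractor wins" rule can be illustrated with a small, self-contained sketch. It does not use the Confluence classes: AttachmentExtractor, FixedExtractor and firstSuccessful are hypothetical stand-ins, and the list is assumed to be already sorted into the order in which the priority values would run the extractors.

```java
import java.util.Arrays;
import java.util.List;

class ExtractorChainSketch {

    /** Hypothetical stand-in for an attachment content extractor (not the Confluence class). */
    interface AttachmentExtractor {
        String name();
        boolean shouldExtractFrom(String fileName, String contentType);
        String extractText(String fileName);
    }

    /** Demo extractor: claims files with one extension and always returns the same text. */
    static final class FixedExtractor implements AttachmentExtractor {
        private final String name;
        private final String extension;
        private final String text;

        FixedExtractor(String name, String extension, String text) {
            this.name = name;
            this.extension = extension;
            this.text = text;
        }

        public String name() { return name; }

        public boolean shouldExtractFrom(String fileName, String contentType) {
            return fileName.toLowerCase().endsWith(extension);
        }

        public String extractText(String fileName) { return text; }
    }

    /**
     * Walks the extractors in run order and returns the name of the first one
     * that both accepts the file and yields non-null, non-empty text.
     * Returns null if no extractor succeeds.
     */
    static String firstSuccessful(List<AttachmentExtractor> inRunOrder,
                                  String fileName, String contentType) {
        for (AttachmentExtractor extractor : inRunOrder) {
            if (!extractor.shouldExtractFrom(fileName, contentType)) {
                continue; // this extractor does not handle the file
            }
            String text = extractor.extractText(fileName);
            if (text != null && !text.isEmpty()) {
                return extractor.name(); // later extractors are skipped
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<AttachmentExtractor> chain = Arrays.<AttachmentExtractor>asList(
            new FixedExtractor("specific", ".mp3", ""),        // accepts the file, finds no text
            new FixedExtractor("general", ".mp3", "raw text"), // first to return text
            new FixedExtractor("late", ".mp3", "never reached"));
        System.out.println(firstSuccessful(chain, "song.mp3", "audio/mpeg")); // prints "general"
    }
}
```

An extractor that accepts a file but returns an empty String does not stop the chain, so a specific extractor that finds nothing still leaves room for a more general one.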

An Example

This is an example of a hypothetical extractor that extracts the contents of mp3 ID3 tags.
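
As a minimal sketch of what such an extractor might look like, the code below reads an ID3v1 tag, which occupies the last 128 bytes of an mp3 file and starts with the marker "TAG". To stay runnable on its own it extends a hypothetical stand-in class (AttachmentExtractorStandIn) rather than the real BaseAttachmentContentExtractor, whose exact method signatures should be taken from the Confluence API. The class names, the fakeMp3 helper and the choice of ID3v1 over ID3v2 are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Id3v1ExtractorSketch {

    /** Hypothetical stand-in for the base class; only the two methods named in the text are modelled. */
    abstract static class AttachmentExtractorStandIn {
        abstract boolean shouldExtractFrom(String fileName, String contentType);
        abstract String extractText(InputStream is);
    }

    static final class Id3v1Extractor extends AttachmentExtractorStandIn {
        boolean shouldExtractFrom(String fileName, String contentType) {
            return fileName.toLowerCase().endsWith(".mp3") || "audio/mpeg".equals(contentType);
        }

        /** Returns "title artist album" from an ID3v1 tag, or "" if no tag is present. */
        String extractText(InputStream is) {
            byte[] data = readAll(is);
            int start = data.length - 128; // an ID3v1 tag is the final 128 bytes
            if (start < 0 || data[start] != 'T' || data[start + 1] != 'A' || data[start + 2] != 'G') {
                return ""; // empty text lets other extractors try this file
            }
            StringBuilder text = new StringBuilder();
            int[] offsets = {3, 33, 63}; // title, artist, album: 30 bytes each
            for (int offset : offsets) {
                String value = field(data, start + offset, 30);
                if (!value.isEmpty()) {
                    if (text.length() > 0) text.append(' ');
                    text.append(value);
                }
            }
            return text.toString();
        }
    }

    /** Decodes a fixed-width field, cutting at the first NUL byte and trimming padding. */
    static String field(byte[] data, int offset, int length) {
        String raw = new String(data, offset, length, StandardCharsets.ISO_8859_1);
        int nul = raw.indexOf('\0');
        return (nul >= 0 ? raw.substring(0, nul) : raw).trim();
    }

    static byte[] readAll(InputStream is) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = is.read(buffer)) != -1) out.write(buffer, 0, n);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds fake file content for the demo: 10 filler bytes followed by an ID3v1 tag. */
    static byte[] fakeMp3(String title, String artist, String album) {
        byte[] data = new byte[10 + 128];
        put(data, 10, "TAG");
        put(data, 13, title);
        put(data, 43, artist);
        put(data, 73, album);
        return data;
    }

    static void put(byte[] dest, int offset, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(bytes, 0, dest, offset, bytes.length);
    }

    public static void main(String[] args) {
        byte[] mp3 = fakeMp3("Blue Train", "John Coltrane", "Blue Train");
        System.out.println(new Id3v1Extractor().extractText(new ByteArrayInputStream(mp3)));
    }
}
```

Returning an empty String when no tag is found matters here: under the rule described above, it leaves the file available to lower-priority attachment content extractors.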