Extractor plugins allow you to hook into the mechanism by which Confluence populates its search index. Each time content is created or updated in Confluence, it is passed through a chain of extractors that assemble the fields and data that will be added to the search index for that content. By writing your own extractor you can add information to the index. Extractor plugins can be used to extract the content from attachment types that Confluence does not support,
Extractor Plugins

Here is an example atlassian-plugin.xml file containing a single search extractor:

    <atlassian-plugin name="Sample Extractor" key="confluence.extra.extractor">
        ...
        <extractor name="Page Metadata Extractor" key="pageMetadataExtractor" class="confluence.extra.extractor.PageMetadataExtractor" priority="1000">
            <description>Extracts certain keys from a page's metadata and adds them to the search index.</description>
        </extractor>
        ...
    </atlassian-plugin>
The Extractor Interface

All extractors must implement the following interface:

    package bucket.search.lucene;

    import bucket.search.Searchable;
    import org.apache.lucene.document.Document;

    public interface Extractor
    {
        public void addFields(Document document, StringBuffer defaultSearchableText, Searchable searchable);
    }
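To make the chain idea concrete, here is a minimal, self-contained Java sketch of how a series of extractors might each contribute to one index entry. It does not use the real Confluence or Lucene classes: SimpleDocument, SimplePage, SimpleExtractor, the two extractor classes, the index method and the "defaultText" field name are all hypothetical stand-ins invented for illustration, and the way Confluence actually orders and invokes extractors may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExtractorChainSketch {

    /** Hypothetical stand-in for Lucene's Document: a map of field name to value. */
    static class SimpleDocument {
        final Map<String, String> fields = new LinkedHashMap<String, String>();
    }

    /** Hypothetical stand-in for Searchable: content with a title and a body. */
    static class SimplePage {
        final String title;
        final String body;

        SimplePage(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    /** Same shape as the Extractor interface, but using the stand-in types. */
    interface SimpleExtractor {
        void addFields(SimpleDocument document, StringBuffer defaultSearchableText, SimplePage searchable);
    }

    /** Adds the title as its own field and to the default searchable text. */
    static class TitleExtractor implements SimpleExtractor {
        public void addFields(SimpleDocument document, StringBuffer defaultSearchableText, SimplePage searchable) {
            document.fields.put("title", searchable.title);
            defaultSearchableText.append(searchable.title).append(" ");
        }
    }

    /** Adds the body to the default searchable text only. */
    static class BodyExtractor implements SimpleExtractor {
        public void addFields(SimpleDocument document, StringBuffer defaultSearchableText, SimplePage searchable) {
            defaultSearchableText.append(searchable.body).append(" ");
        }
    }

    /** Runs every extractor in list order against the same document and text buffer. */
    static SimpleDocument index(SimplePage page, List<SimpleExtractor> chain) {
        SimpleDocument document = new SimpleDocument();
        StringBuffer defaultSearchableText = new StringBuffer();
        for (SimpleExtractor extractor : chain) {
            extractor.addFields(document, defaultSearchableText, page);
        }
        // Assumption: the accumulated text is stored in one catch-all field.
        document.fields.put("defaultText", defaultSearchableText.toString().trim());
        return document;
    }

    public static void main(String[] args) {
        List<SimpleExtractor> chain = new ArrayList<SimpleExtractor>();
        chain.add(new TitleExtractor());
        chain.add(new BodyExtractor());
        SimpleDocument doc = index(new SimplePage("Release notes", "Fixed the search bug"), chain);
        System.out.println(doc.fields);
    }
}
```

The point of the sketch is that every extractor receives the same document and the same text buffer, so a new extractor can add its own fields or searchable text without the existing ones knowing about it.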
Attachment Content Extractors

If you are writing an extractor that indexes the contents of a particular attachment type (for example, OpenOffice documents or Flash files), you should extend the abstract class bucket.search.lucene.extractor.BaseAttachmentContentExtractor. This class ensures that only one attachment content extractor successfully runs against any file (you can manipulate the priorities of attachment content extractors to make sure they run in the right order).

For more information, see: Attachment Content Extractor Plugins

An Example Extractor

The following example extractor is untested, but it associates a set of page-level properties with the page in the index, both as part of the regular searchable text and as Lucene Text fields that can be searched individually, for example in a custom {abstract-search} macro.

    package com.example.extras.extractor;

    import bucket.search.lucene.Extractor;
    import bucket.search.Searchable;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import com.atlassian.confluence.core.ContentEntityObject;
    import com.atlassian.confluence.core.ContentPropertyManager;
    import com.opensymphony.util.TextUtils;

    public class ContentPropertyExtractor implements Extractor
    {
        // Content property keys whose values should be indexed.
        public static final String[] INDEXABLE_PROPERTIES = {"status", "abstract"};

        private ContentPropertyManager contentPropertyManager;

        public void addFields(Document document, StringBuffer defaultSearchableText, Searchable searchable)
        {
            // Only content entities carry content properties; ignore anything else.
            if (searchable instanceof ContentEntityObject)
            {
                ContentEntityObject contentEntityObject = (ContentEntityObject) searchable;
                for (int i = 0; i < INDEXABLE_PROPERTIES.length; i++)
                {
                    String key = INDEXABLE_PROPERTIES[i];
                    String value = contentPropertyManager.getStringProperty(contentEntityObject, key);

                    // Skip properties that are unset or empty.
                    if (TextUtils.stringSet(value))
                    {
                        // Make the value part of the regular searchable text...
                        defaultSearchableText.append(value).append(" ");
                        // ...and add it as a separate field named after the property key.
                        document.add(new Field(key, value, Field.Store.YES, Field.Index.TOKENIZED));
                    }
                }
            }
        }

        // The ContentPropertyManager is supplied through this setter.
        public void setContentPropertyManager(ContentPropertyManager contentPropertyManager)
        {
            this.contentPropertyManager = contentPropertyManager;
        }
    }

Debugging

There's a really primitive Lucene index browser hidden in Confluence which may help when debugging. You'll need to tell it the filesystem path to your $conf-home/index directory.
Except where otherwise noted, content in this space is licensed under a Creative Commons Attribution 2.5 Australia License.

Comments (3)
Oct 19, 2006
Jim Clark says:
While the example atlassian-plugin.xml above does not have the plugin-info, be sure to include this element in the xml file, e.g.
Otherwise, bad things will happen!
Dec 19, 2007
Jim Dibble says:
What jar files do I need to include in my project in order to compile the (untested) sample above? I figured out that oscore-#.#.#.jar contains the OpenSymphony classes, and lucene-core-#.#.#-atlassian.jar contains the Lucene classes. But which jars contain the bucket.search.lucene.Extractor, bucket.search.Searchable, and com.atlassian.confluence.core.* classes?
Mar 14
carl christensen says:
It's supposed to be: import com.atlassian.bonnie.Searchable; import com.atlas...