Extractor plugin modules are available in Confluence 1.4 and later versions.
Extractor plugins allow you to hook into the mechanism by which Confluence populates its search index.
Extractor plugins can be used to extract the content from attachment types that Confluence does not support.
Confluence's internal search is built on top of the Lucene Java library. While familiarity with Lucene is not an absolute requirement for writing an extractor plugin, you'll need it to write anything more than the most basic of plugins.
Here is an example atlassian-plugin.xml file containing a single search extractor:
<atlassian-plugin name="Sample Extractor" key="confluence.extra.extractor">
    ...
    <extractor name="Page Metadata Extractor" key="pageMetadataExtractor"
               class="confluence.extra.extractor.PageMetadataExtractor" priority="1000">
        <description>Extracts certain keys from a page's metadata and adds them to the search index.</description>
    </extractor>
    ...
</atlassian-plugin>
The class attribute must name a class that implements bucket.search.lucene.Extractor.
As a general rule, all extractors should have priorities below 1000, unless you are writing an extractor for a new attachment type, in which case the priority should be greater than 1000. If you are not sure what priority to choose, just go with ... To see the priorities of the extractors that are built into Confluence, look in ...
All extractors must implement the following interface:
package bucket.search.lucene;

import bucket.search.Searchable;
import org.apache.lucene.document.Document;

public interface Extractor
{
    public void addFields(Document document, StringBuffer defaultSearchableText, Searchable searchable);
}
The document parameter is the Lucene document that will be added to the search index for the object that is being saved. You can add fields to this document, and the fields will be associated with the object in the index.
The defaultSearchableText parameter is the main body of text that is associated with this object in the search index. It is stored in the index as a Text field with the key "content". If you want to add text to the index such that the object can be found by a regular Confluence site search, append it to the defaultSearchableText. (Remember to also append a trailing space, or your text will run into the next piece of text that's added!)
The searchable parameter is the object that is being saved, and passed through the extractor chain.
If you are writing an extractor that indexes the contents of a particular attachment type (for example, OpenOffice documents or Flash files), you should extend the abstract class bucket.search.lucene.extractor.BaseAttachmentContentExtractor. This class ensures that only one attachment content extractor successfully runs against any file (you can manipulate the priorities of attachment content extractors to make sure they run in the right order).
For more information, see: Attachment Content Extractor Plugins
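As a rough illustration of the "only one attachment content extractor successfully runs" behaviour, here is a stand-alone sketch. It does not use BaseAttachmentContentExtractor or any other Confluence class, and it is not how Confluence implements the rule: ContentExtractor, forSuffix, index and sampleChain are hypothetical names invented for this example. It also assumes that extractors with higher priorities run first, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only: none of these types exist in Confluence.
final class AttachmentChainDemo {

    /** Hypothetical extractor: returns true if it recognised and handled the file. */
    interface ContentExtractor {
        int priority();
        boolean extract(String fileName, StringBuffer searchableText);
    }

    /** Builds an extractor that only handles file names ending in the given suffix. */
    static ContentExtractor forSuffix(final int priority, final String suffix, final String label) {
        return new ContentExtractor() {
            public int priority() { return priority; }
            public boolean extract(String fileName, StringBuffer searchableText) {
                if (!fileName.endsWith(suffix)) {
                    return false; // not our file type, let the next extractor try
                }
                searchableText.append(label).append(" ");
                return true;
            }
        };
    }

    /**
     * Runs the extractors from highest to lowest priority (an ASSUMPTION)
     * and stops at the first one that reports success.
     */
    static String index(List<ContentExtractor> extractors, String fileName) {
        List<ContentExtractor> ordered = new ArrayList<>(extractors);
        ordered.sort((a, b) -> Integer.compare(b.priority(), a.priority()));
        StringBuffer text = new StringBuffer();
        for (ContentExtractor extractor : ordered) {
            if (extractor.extract(fileName, text)) {
                break; // one successful extractor per file
            }
        }
        return text.toString();
    }

    /** A generic fallback (priority 1100) and a more specific extractor (priority 1200). */
    static List<ContentExtractor> sampleChain() {
        List<ContentExtractor> chain = new ArrayList<>();
        chain.add(forSuffix(1100, "", "generic"));        // empty suffix matches every file name
        chain.add(forSuffix(1200, ".odt", "openoffice"));
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("[" + index(sampleChain(), "report.odt") + "]"); // prints [openoffice ]
        System.out.println("[" + index(sampleChain(), "movie.swf") + "]");  // prints [generic ]
    }
}
```

In this sketch the more specific extractor is given the higher priority so that it gets the first chance at a file, and the generic one only runs when nothing before it succeeded.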
The following example extractor is untested, but it associates a set of page-level properties with the page in the index, both as part of the regular searchable text, and also as Lucene Text fields that can be searched individually, for example in a custom {abstract-search} macro.
package com.example.extras.extractor;

import bucket.search.lucene.Extractor;
import bucket.search.Searchable;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import com.atlassian.confluence.core.ContentEntityObject;
import com.atlassian.confluence.core.ContentPropertyManager;
import com.opensymphony.util.TextUtils;

public class ContentPropertyExtractor implements Extractor
{
    // Page-level properties to copy into the search index.
    public static final String[] INDEXABLE_PROPERTIES = {"status", "abstract"};

    private ContentPropertyManager contentPropertyManager;

    public void addFields(Document document, StringBuffer defaultSearchableText, Searchable searchable)
    {
        // Content properties are looked up against a ContentEntityObject.
        if (searchable instanceof ContentEntityObject)
        {
            ContentEntityObject contentEntityObject = (ContentEntityObject) searchable;
            for (int i = 0; i < INDEXABLE_PROPERTIES.length; i++)
            {
                String key = INDEXABLE_PROPERTIES[i];
                String value = contentPropertyManager.getStringProperty(contentEntityObject, key);
                if (TextUtils.stringSet(value))
                {
                    // Make the value findable by a regular site search...
                    defaultSearchableText.append(value).append(" ");
                    // ...and individually searchable under its own key.
                    document.add(Field.Text(key, value));
                }
            }
        }
    }

    // Setter for the ContentPropertyManager dependency.
    public void setContentPropertyManager(ContentPropertyManager contentPropertyManager)
    {
        this.contentPropertyManager = contentPropertyManager;
    }
}
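The example above appends a space after each value it adds to defaultSearchableText. The following stand-alone sketch shows what happens to adjacent values with and without that space. It uses no Confluence or Lucene classes; SearchableTextDemo and its methods are hypothetical names chosen for illustration.

```java
// Hypothetical stand-alone demo: plain StringBuffer handling, no Confluence or Lucene classes.
final class SearchableTextDemo {

    /** Appends each value followed by a single space, as recommended above. */
    static String withTrailingSpace(String[] values) {
        StringBuffer text = new StringBuffer();
        for (String value : values) {
            text.append(value).append(" ");
        }
        return text.toString();
    }

    /** Appends each value with no separator, so neighbouring values fuse together. */
    static String withoutTrailingSpace(String[] values) {
        StringBuffer text = new StringBuffer();
        for (String value : values) {
            text.append(value);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String[] values = {"draft", "overview"};
        System.out.println("[" + withTrailingSpace(values) + "]");    // prints [draft overview ]
        System.out.println("[" + withoutTrailingSpace(values) + "]"); // prints [draftoverview]
    }
}
```

Without the separator, the two values become the single string "draftoverview", so text that was meant to be two separate words is no longer separated when it reaches the index.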
There's a really primitive Lucene index browser hidden in Confluence which may help when debugging. You'll need to tell it the filesystem path to your $conf-home/index directory.
http://yourwiki.example.com/admin/indexbrowser.jsp