OpenSearch upgrade guide

Preparing for Confluence 8.7


What's changing

We're making preparations to introduce OpenSearch as an opt-in feature in the next Confluence Data Center platform release. If you’re using the Lucene API directly, consider moving to the Confluence search v2 API to minimize disruption for customers who take advantage of the OpenSearch engine capabilities. We will maintain compatibility with our v2 search API as much as possible, so most current integrations should continue to work without issues.

To ensure a smooth rollout, we’ll update this page with any details that may affect you.

Testing your plugins with OpenSearch

OpenSearch integration is available as a preview feature starting from version 8.9. Test your plugins to make sure they work as expected with OpenSearch. Refer to the Configuring OpenSearch with Confluence guide to get started.

Note that as a preview feature, it currently has some limitations:

  • It only supports English as the indexing language.

  • It does not come with health-checks.

The feature will be officially released to general customers in the near future.

Deprecations and changes

Index fields must be consistent across documents

We use FieldDescriptor to add a value as a field (for example, filename) to an indexed document, typically from an Extractor2. Each FieldDescriptor is associated with a mapping which describes how the field is indexed and queried; for example, its type (string vs text vs long) and its analyzer.

Previously, you could add a particular field name with differing mappings between documents in the same index. For instance, you could add filename as text in one document, but as string in another.

What’s changing: After rollout, mixing field mappings will create an error message in your log file. The error message will look like the following:

Mapping for 'filename' (TextFieldMapping {name='filename'}) conflicts with existing mapping (StringFieldMapping {name='filename'})

On Lucene, this error currently has no adverse effect. On OpenSearch, however, a conflicting mapping means that the field may be indexed incorrectly, or the whole document may fail to be indexed.

When: Confluence 8.8

What’s new: To fix this, you should either make the field use the same mapping across all documents of an index, or use different names; for example, filename.text and filename.string.
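To see what a consistent mapping means in practice, here is a small, self-contained model in plain Java. It does not use the Confluence API: MappingRegistry, MappingKind, and register are hypothetical names for this illustration only. The registry keeps the first mapping seen for each field name in an index and reports any later use of that name with a different mapping, loosely following the log message above. Giving each mapping its own name, as in filename.text and filename.string, avoids the conflict.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical model of field mappings within one index (not the Confluence API).
 * It remembers the first mapping kind registered for each field name and
 * reports any later registration of that name with a different kind.
 */
final class MappingRegistry {
    enum MappingKind { TEXT, STRING, LONG }

    private final Map<String, MappingKind> mappings = new HashMap<>();

    /** Returns a conflict message if {@code field} is already mapped with another kind. */
    Optional<String> register(String field, MappingKind kind) {
        MappingKind existing = mappings.putIfAbsent(field, kind);
        if (existing == null || existing == kind) {
            return Optional.empty(); // first registration, or consistent with the existing one
        }
        // The existing mapping is kept; the conflicting one is only reported.
        return Optional.of("Mapping for '" + field + "' (" + kind
                + ") conflicts with existing mapping (" + existing + ")");
    }

    public static void main(String[] args) {
        MappingRegistry contentIndex = new MappingRegistry();
        contentIndex.register("filename", MappingKind.TEXT);

        // Same name, different mapping: reported as a conflict.
        contentIndex.register("filename", MappingKind.STRING).ifPresent(System.out::println);

        // One name per mapping: no conflict.
        System.out.println(contentIndex.register("filename.text", MappingKind.TEXT).isPresent());
        System.out.println(contentIndex.register("filename.string", MappingKind.STRING).isPresent());
    }
}
```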

To better enforce this in your code, we recommend you use FieldMapping to declare your fields explicitly. For example, instead of this deprecated code:

// Constant
public static final String FILENAME = "filename";

// Extractor A
fields.add(new TextFieldDescriptor(FILENAME, docA.getName(), Stored.YES, new FilenameAnalyzerDescriptor()));

// Extractor B
fields.add(new TextFieldDescriptor(FILENAME, docB.getName(), Stored.YES, new FilenameAnalyzerDescriptor()));

write this code:

// Constant
public class MyFields implements FieldMappingsProvider {
    public static final TextFieldMapping FILENAME = TextFieldMapping.builder("filename")
            .store(true)
            .analyzer(new FilenameAnalyzerDescriptor())
            .build();

    @Override
    public Collection<FieldMapping> getFieldMappings() {
        return List.of(FILENAME);
    }
}

// Extractor A
fields.add(MyFields.FILENAME.createField(docA.getName()));

// Extractor B
fields.add(MyFields.FILENAME.createField(docB.getName()));

It’s best practice to explicitly register your FieldMappingsProvider in atlassian-plugin.xml so that its mappings get created on the OpenSearch index when your plugin starts up. Otherwise, Confluence will create them dynamically when you index a document with those mappings.

<field-mappings-provider key="my-custom-fields" index="CONTENT" class="com.example.MyFields" />

AnalyzerDescriptor to be deprecated

Previously, AnalyzerDescriptor could be used to build a bespoke analyzer by specifying an arbitrary combination of TokenizerDescriptor, CharFilterDescriptors, and TokenFilterDescriptors. This analyzer could then be used for indexing (for example, on TextFieldDescriptor) and for querying (for example, on PhraseQuery).

What’s changing: Bespoke analyzers defined with AnalyzerDescriptor have been deprecated and will not work with OpenSearch.

When: Confluence 8.7

What’s new: On OpenSearch, you can only use the predefined analyzers provided by Confluence, not bespoke ones built with AnalyzerDescriptor.

Explore the current list of supported predefined analyzers in Confluence in the All Known Implementing Classes section of the MappingAnalyzerDescriptor Javadoc.

The contentBody field is no longer stored

In the Confluence content index, the contentBody field holds the indexed content of a document so it can be queried.

Previously, this field was stored, meaning it could be used to fetch the original value. For example, you could include it in the requestedField parameter of the search or scan method.

What’s changing: contentBody is no longer a stored field.

When: Confluence 8.7

What’s new: Instead, the original value of a document’s content is stored in a new, separate field called contentBody-stored. If you currently fetch document content using the contentBody field, use contentBody-stored instead.
Note that the contentBody field is still indexed and used for querying (for example, via CQL contentBody:foo).
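If your plugin builds its set of requested fields in one place, the migration can be a small rewrite of that set. The helper below is a hypothetical sketch in plain Java, not part of the Confluence API: RequestedFields and migrate are invented names. It swaps contentBody for contentBody-stored in the names you fetch and leaves every other name untouched. Queries such as contentBody:foo need no change, because only fetching is affected.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper (not Confluence API): adjusts the names of fields to fetch for Confluence 8.7+. */
final class RequestedFields {
    static final String CONTENT_BODY = "contentBody";               // still indexed and queryable, no longer stored
    static final String CONTENT_BODY_STORED = "contentBody-stored"; // holds the original content value

    /** Returns a copy of {@code requested} with contentBody replaced by contentBody-stored, keeping iteration order. */
    static Set<String> migrate(Set<String> requested) {
        Set<String> result = new LinkedHashSet<>();
        for (String field : requested) {
            result.add(CONTENT_BODY.equals(field) ? CONTENT_BODY_STORED : field);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> before = new LinkedHashSet<>(Arrays.asList("title", "contentBody"));
        System.out.println(migrate(before)); // [title, contentBody-stored]
    }
}
```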

SearchIndex to be deprecated and replaced with Index

SearchIndex is an enum that is useful only for system indices, such as CONTENT or CHANGE. Previously, a plugin had to use an index name to manage its own custom index. This becomes cumbersome when the search platform is OpenSearch, because the enum value has to be translated into the real name of the index stored on OpenSearch whenever a request is built.

What’s changing: SearchIndex has been deprecated and is replaced by Index.

When: Confluence 8.7

What’s new: Index can be used for both system indices and custom indices. Using this new class brings these benefits:

  • A consistent way to interact with an index, whether it's a system or custom index, and whether the search platform is Lucene or OpenSearch.

  • Index name abstraction, which makes code neater and easier to maintain.
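The benefit of name abstraction is easiest to see in a toy model. The sketch below is plain Java and entirely hypothetical: IndexRef, system, custom, physicalName, and the prefix-based naming scheme are invented for illustration and do not describe how Confluence's Index class or its OpenSearch index names actually work. The point is that callers hold one value type for both system and custom indices, and the translation to a physical name lives in a single method instead of being repeated wherever a request is built.

```java
import java.util.Locale;
import java.util.Objects;

/**
 * Hypothetical index handle (not Confluence's Index class): one type for
 * system and custom indices, with name translation kept in one method.
 */
final class IndexRef {
    private final String logicalName;
    private final boolean system;

    private IndexRef(String logicalName, boolean system) {
        this.logicalName = Objects.requireNonNull(logicalName, "logicalName");
        this.system = system;
    }

    static IndexRef system(String name) { return new IndexRef(name, true); }

    static IndexRef custom(String name) { return new IndexRef(name, false); }

    /** Invented naming scheme; the name is lowercased because OpenSearch index names must be lowercase. */
    String physicalName(String prefix) {
        String kind = system ? "system" : "custom";
        return prefix + "-" + kind + "-" + logicalName.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(IndexRef.system("CONTENT").physicalName("confluence"));
        System.out.println(IndexRef.custom("my-plugin-index").physicalName("confluence"));
    }
}
```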

Sorting changes and updates

TextFieldMapping

Lucene allows you to sort search results on a text field (i.e. TextFieldMapping); however, such requests are problematic because text fields are tokenized and can be text-heavy, leading to inaccurate and inefficient results. We recommend sorting on a keyword field (i.e. StringFieldMapping) instead.

What’s changing: OpenSearch doesn’t allow sorting on text fields, so such an operation now results in the following error:

Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead.

When: Confluence 8.9

What’s new: As a workaround, declare the field you want to sort on as a keyword field, i.e. a StringFieldMapping.
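Why sorting on a tokenized field is unreliable can be shown without a search engine. In this plain-Java illustration (SortFieldDemo, keywordKey, tokenKey, and sortBy are hypothetical names), each title is treated once as a whole keyword and once as lowercase tokens split on whitespace. Sorting by the keyword compares whole values; sorting by the tokens forces an arbitrary choice of which token represents the document, here the alphabetically smallest, and the order changes. This is a simplified model of the problem, not a description of Lucene's exact behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

final class SortFieldDemo {
    /** Keyword-style sort key: the whole value, as a StringFieldMapping-like field would hold it. */
    static String keywordKey(String title) {
        return title;
    }

    /** Text-style sort key: a tokenized value has many terms, so one must be picked (here the smallest). */
    static String tokenKey(String title) {
        return Arrays.stream(title.toLowerCase(Locale.ROOT).split("\\s+"))
                .min(Comparator.naturalOrder())
                .orElse("");
    }

    /** Returns a sorted copy of {@code titles}, ordered by the given sort key. */
    static List<String> sortBy(List<String> titles, Function<String, String> key) {
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(Comparator.comparing(key));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("Zebra apple", "Mango", "Banana split");
        System.out.println(sortBy(titles, SortFieldDemo::keywordKey)); // [Banana split, Mango, Zebra apple]
        System.out.println(sortBy(titles, SortFieldDemo::tokenKey));   // [Zebra apple, Banana split, Mango]
    }
}
```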

UserAttributeSort

Previously, Lucene allowed sort requests by user attributes (created by UserAttributeSort). This could be resource-intensive because user details had to be fetched separately for each document found by the search query before sorting.

What’s changing: The class UserAttributeSort has been deprecated, and is not supported on OpenSearch.

When: Confluence 8.9

What’s new: There is no workaround at this stage.

LowercaseFieldSort

The class LowercaseFieldSort allows sorting based on the lowercase value of a keyword field.

What’s changing: OpenSearch doesn’t support lowercase sort natively, so to implement LowercaseFieldSort efficiently, we’ve made changes to the StringFieldMapping and LowercaseFieldSort classes.

When: Confluence 8.9

What’s new:

StringFieldMapping

We’ve introduced two new properties to StringFieldMapping, which are only relevant for OpenSearch. Please note that these changes only take effect if Confluence is configured to use OpenSearch.

  • asLowercase if true:

    • Index phase: no impact.

    • Search phase: this setting indicates that the field is already stored as lowercase, so Confluence uses the field as is to sort results.

  • withLowercase if true:

    • Index phase: a sub-field will be created to store a lowercase version of the original field.

    • Search phase: the sub-field will be used instead of the original field for sorting.

LowercaseFieldSort

We’ve introduced a new constructor to the LowercaseFieldSort class that takes a StringFieldMapping argument (instead of just a field name) in addition to the sort order. This argument tells Confluence how to handle lowercase sorting on the field in OpenSearch, based on how the field stores its lowercase value.

For backward compatibility, Confluence will fall back to an OpenSearch script sort in these scenarios:

  • LowercaseFieldSort is not constructed with StringFieldMapping.

  • LowercaseFieldSort is constructed with StringFieldMapping, but the field mapping has both asLowercase and withLowercase as false.
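Taken together, these rules amount to a three-way choice of sort strategy. The sketch below restates them in plain Java; it is hypothetical and does not use the Confluence classes, so LowercaseSortPlanner, Strategy, FieldFlags, and plan are invented names. This page does not say what happens when both asLowercase and withLowercase are true; the sketch assumes asLowercase takes precedence, since the field is then already lowercase. The printed warning reuses the wording of the log message Confluence emits for the fallback.

```java
/**
 * Hypothetical restatement (not Confluence code) of how the LowercaseFieldSort
 * rules choose a sort strategy on OpenSearch.
 */
final class LowercaseSortPlanner {
    enum Strategy { FIELD_AS_IS, LOWERCASE_SUBFIELD, SCRIPT_SORT }

    /** Stand-in for the asLowercase and withLowercase properties of a StringFieldMapping. */
    static final class FieldFlags {
        final boolean asLowercase;
        final boolean withLowercase;

        FieldFlags(boolean asLowercase, boolean withLowercase) {
            this.asLowercase = asLowercase;
            this.withLowercase = withLowercase;
        }
    }

    /** {@code flags == null} models a LowercaseFieldSort built from a field name only. */
    static Strategy plan(FieldFlags flags) {
        if (flags == null) {
            return Strategy.SCRIPT_SORT;        // no StringFieldMapping: slow fallback
        }
        if (flags.asLowercase) {
            return Strategy.FIELD_AS_IS;        // already lowercase (assumed to win if both flags are set)
        }
        if (flags.withLowercase) {
            return Strategy.LOWERCASE_SUBFIELD; // sort on the lowercase sub-field
        }
        return Strategy.SCRIPT_SORT;            // both flags false: slow fallback
    }

    public static void main(String[] args) {
        String field = "title";
        if (plan(new FieldFlags(false, false)) == Strategy.SCRIPT_SORT) {
            // Same wording as the warning Confluence logs when it uses the fallback.
            System.out.println("Using script sort for field: " + field
                    + ". This will significantly impact query performance. Please consider migrating"
                    + " the field to support lowercase sub field for better performance");
        }
        System.out.println(plan(new FieldFlags(false, true))); // LOWERCASE_SUBFIELD
    }
}
```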

Because this fallback adversely impacts performance, we recommend that you update existing code that uses this sort to construct LowercaseFieldSort with an appropriate StringFieldMapping.

For detection purposes, Confluence will output the following warning log whenever this fallback is used:

Using script sort for field: {field_name}. This will significantly impact query performance. Please consider migrating the field to support lowercase sub field for better performance

Result window limit

You can paginate your search results by defining the limit and offset on the ISearch object. The larger these numbers are (limit + offset), the more index documents the search engine needs to sift through. This range is known as the “result window”, and memory use grows in proportion to it. For example, returning the thousandth page of a search (at 20 results per page) requires significantly more resources than returning the first page.

In Lucene, there are currently no limits to this result window.

What’s changing: By default, OpenSearch limits the result window (offset + limit) to 10,000 results. Requests that exceed this limit will be rejected with an error like the following:

Result window is too large, from + size must be less than or equal to: [10000] but was [10010]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.

When: Confluence 8.9

What’s new: For your search query to work on OpenSearch, keep the limit small. You’ll also need to restrict the number of pages that users can navigate to; in other words, keep the offset small so that offset + limit stays within the result window.
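A simple guard is to compute offset + limit before sending a request and compare it with the window. The sketch below assumes OpenSearch's default window of 10,000 (the index.max_result_window setting named in the error above); ResultWindow, offsetForPage, fits, and lastReachablePage are hypothetical helpers, not Confluence API. Using the earlier example, the thousandth page at 20 results per page starts at offset 19,980, so offset + limit is 20,000 and the request would be rejected, while page 500 ends exactly at 10,000 and still fits.

```java
/** Hypothetical guard for OpenSearch's result window (offset + limit <= max_result_window). */
final class ResultWindow {
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000; // OpenSearch default for index.max_result_window

    /** Zero-based offset of a one-based page number. */
    static int offsetForPage(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be positive");
        }
        return Math.multiplyExact(page - 1, pageSize);
    }

    /** True if a request with this offset and limit stays within the window. */
    static boolean fits(int offset, int limit, int maxResultWindow) {
        return (long) offset + limit <= maxResultWindow; // long avoids int overflow
    }

    /** Highest one-based page that can still be requested: page * pageSize <= window. */
    static int lastReachablePage(int pageSize, int maxResultWindow) {
        return maxResultWindow / pageSize;
    }

    public static void main(String[] args) {
        int pageSize = 20;
        int offset = offsetForPage(1000, pageSize);
        System.out.println(offset);                                                 // 19980
        System.out.println(fits(offset, pageSize, DEFAULT_MAX_RESULT_WINDOW));      // false
        System.out.println(lastReachablePage(pageSize, DEFAULT_MAX_RESULT_WINDOW)); // 500
    }
}
```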

If you need to iterate through a large amount of data, consider using the SearchManager.scan method.

Alternatively, if your search is sorted by field(s) (for example, modified, _id), OpenSearch also provides a way to paginate your search efficiently using searchAfter (instead of offset), which is unaffected by result window limits. However, this parameter is not supported on Lucene.

Explore the OpenSearch documentation on search_after.
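To see why search_after is not bound by the result window, here is an in-memory model of the idea in plain Java. It is neither the Confluence nor the OpenSearch API; SearchAfterDemo, Doc, ORDER, and page are hypothetical. Documents are sorted by (modified, id), with the id acting as a tie-breaker so that the order is total, and each page is requested by passing the last hit of the previous page (its sort values) instead of an offset. Each request therefore carries only a limit and a cursor, however deep the user pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SearchAfterDemo {
    /** Hypothetical hit carrying the two sort fields used below. */
    static final class Doc {
        final long modified;
        final String id;

        Doc(long modified, String id) {
            this.modified = modified;
            this.id = id;
        }

        @Override
        public String toString() { return id; }
    }

    /** Sort order: modified ascending, then id as a unique tie-breaker. */
    static final Comparator<Doc> ORDER =
            Comparator.comparingLong((Doc d) -> d.modified).thenComparing(d -> d.id);

    /** Returns up to {@code limit} docs sorting strictly after {@code after}; null means the first page. */
    static List<Doc> page(List<Doc> all, Doc after, int limit) {
        List<Doc> sorted = new ArrayList<>(all);
        sorted.sort(ORDER);
        List<Doc> out = new ArrayList<>();
        for (Doc d : sorted) {
            if (after != null && ORDER.compare(d, after) <= 0) {
                continue; // at or before the cursor
            }
            out.add(d);
            if (out.size() == limit) {
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(new Doc(3, "c"), new Doc(1, "a"), new Doc(2, "b2"),
                new Doc(2, "b1"), new Doc(4, "d"));
        Doc cursor = null;
        while (true) {
            List<Doc> hits = page(docs, cursor, 2);
            if (hits.isEmpty()) {
                break;
            }
            System.out.println(hits);           // [a, b1] then [b2, c] then [d]
            cursor = hits.get(hits.size() - 1); // last hit's sort values drive the next request
        }
    }
}
```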

Tips

Rebuilding custom indices

When a plugin maintains a custom index, we recommend that you create a listener for ReIndexRequestEvent that rebuilds the custom index. This ensures that when an admin requests an index rebuild (via the Content Indexing admin page), all indices get rebuilt, including custom ones within plugins.
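The wiring can be sketched with a tiny stand-in for the event system. Everything in this plain-Java example is hypothetical: ReIndexRequested stands in for Confluence's ReIndexRequestEvent, SimpleEventBus stands in for the plugin event system, and CustomIndexRebuilder is an invented listener that only counts rebuilds. In a real plugin you would register the listener with Confluence's event publisher and subscribe to ReIndexRequestEvent itself; check the Confluence Java API for the exact types and annotations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ReindexListenerDemo {
    /** Stand-in for ReIndexRequestEvent (hypothetical). */
    static final class ReIndexRequested { }

    /** Stand-in for the plugin event system (hypothetical). */
    static final class SimpleEventBus {
        private final List<Consumer<ReIndexRequested>> listeners = new ArrayList<>();

        void register(Consumer<ReIndexRequested> listener) { listeners.add(listener); }

        void publish(ReIndexRequested event) { listeners.forEach(l -> l.accept(event)); }
    }

    /** Hypothetical listener that rebuilds a plugin's custom index when a reindex is requested. */
    static final class CustomIndexRebuilder {
        int rebuilds = 0;

        void onReIndexRequest(ReIndexRequested event) {
            // A real listener would rebuild the plugin's custom index here; this sketch only counts calls.
            rebuilds++;
        }
    }

    public static void main(String[] args) {
        SimpleEventBus bus = new SimpleEventBus();
        CustomIndexRebuilder rebuilder = new CustomIndexRebuilder();
        bus.register(rebuilder::onReIndexRequest);
        bus.publish(new ReIndexRequested()); // e.g. an admin rebuilds the index from the admin page
        System.out.println(rebuilder.rebuilds); // 1
    }
}
```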

Last modified on Apr 2, 2024
