This documentation relates to Confluence 4.1.x
If you are using an earlier version, please view the previous versions of the Confluence documentation and select the relevant version.

To match content against the search terms a user enters, Confluence splits the text of the content into tokens, then filters and modifies those tokens according to the following rules.

Tokenisation

Confluence uses Lucene's Standard Tokenizer. This splits the text into tokens as follows:

  • Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by white space is considered part of a token.
  • Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split.
  • Recognises email addresses and internet host names as one token.

Example: the string 'foo-bar5' is not split into 'foo' and 'bar5', so a search for 'bar5' or 'bar*' will not find content containing it.
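The tokenisation rules can be approximated in a few lines of Python. This is a simplified, hypothetical sketch: the `tokenize` function and its regular expressions are illustrative assumptions, not Lucene's actual StandardTokenizer grammar. It does, however, reproduce the 'foo-bar5' behaviour from the example.

```python
import re

# Hypothetical, simplified approximation of the rules above. This is NOT
# Lucene's StandardTokenizer grammar; it only illustrates the behaviour.
EMAIL = re.compile(r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$")
WORD = re.compile(r"[\w'.]+")  # word characters plus inner dots and apostrophes
EDGE_PUNCT = ".,;:!?\"'()[]{}"


def tokenize(text: str) -> list[str]:
    tokens: list[str] = []
    for chunk in text.split():
        # Punctuation at the edges of a chunk is dropped, so a dot that is
        # followed by white space (or ends the text) never stays in a token.
        chunk = chunk.strip(EDGE_PUNCT)
        if not chunk:
            continue
        if EMAIL.match(chunk):
            tokens.append(chunk)  # e-mail address kept as one token
        elif any(c.isdigit() for c in chunk):
            tokens.append(chunk)  # "product number": hyphens are not split
        else:
            # Split at the remaining punctuation (hyphens, slashes, ...).
            # Inner dots are kept, so host names such as www.example.com
            # stay whole.
            for word in WORD.findall(chunk):
                word = word.strip(".'")
                if word:
                    tokens.append(word)
    return tokens


print(tokenize("foo-bar5 foo-bar jsmith@example.com www.example.com."))
# ['foo-bar5', 'foo', 'bar', 'jsmith@example.com', 'www.example.com']
```

Because 'foo-bar5' is kept as a single token, the index holds no separate 'bar5' token, so a search for 'bar5' has nothing to match.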

Filtering

Confluence then:

  • Removes "'s" from the ends of words.
  • Removes the dots from acronyms, e.g. I.B.M. becomes IBM.
  • Converts everything to lower case.
  • Removes common words (stop words) such as 'the' and 'or'.
  • Converts words to their stems. For example, 'fishing' and 'fishes' both become 'fish'.
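The filtering steps can be sketched as a chain applied to each token in turn. The stop-word set and the `crude_stem` helper below are hypothetical stand-ins chosen for illustration: the stop-word list and stemmer Confluence actually uses are more thorough.

```python
import re

# Hypothetical stop-word subset; the list Confluence actually uses is longer.
STOP_WORDS = {"a", "an", "and", "of", "or", "the", "to"}
# Single letters separated by dots, e.g. "I.B.M." or "I.B.M".
ACRONYM = re.compile(r"^(?:[A-Za-z]\.)+[A-Za-z]?$")


def crude_stem(word: str) -> str:
    # Toy suffix stripper for illustration only. Real stemmers (for example
    # the Porter algorithm) apply many more rules.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def filter_tokens(tokens: list[str]) -> list[str]:
    result: list[str] = []
    for tok in tokens:
        if tok.endswith("'s"):
            tok = tok[:-2]              # remove possessive "'s"
        if ACRONYM.match(tok):
            tok = tok.replace(".", "")  # I.B.M. -> IBM
        tok = tok.lower()               # convert to lower case
        if not tok or tok in STOP_WORDS:
            continue                    # drop empty tokens and stop words
        result.append(crude_stem(tok))  # reduce to a (crude) stem
    return result


print(filter_tokens(["The", "I.B.M.", "fishing", "or", "fishes", "Confluence's"]))
# ['ibm', 'fish', 'fish', 'confluence']
```

Both 'fishing' and 'fishes' end up as the same token, which is why a search for one also finds content containing the other.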

Related Topics

Searching Confluence

  1. May 18, 2007

    We are testing Confluence with Russian-language data. Almost everything works fine with Russian, but not quite everything.

    I went to the Confluence admin menu and set Administration->General Configuration->Indexing Language to Russian. Yes! Russian morphology started to work!

    However, numbers cannot be found with the standard Confluence search: queries such as '2', '16' or '2000' just return an empty result.

    Then I switched Administration->General Configuration->Indexing Language to English and, sure enough, the numbers were found in documents! But Russian morphology stopped working. When I switched back to Russian, morphology worked again but the numbers could not be found. Why is this happening?