'MalformedInputException' when rebuilding the Did You Mean Search Index
The following error is thrown when trying to rebuild the Did You Mean search index in Confluence:
[6/29/09 23:39:13:673 CDT] 0000002e SystemOut O 2009-06-29 23:39:13,671 ERROR [Did-You-Mean-Index-Build-Thread] [search.didyoumean.lucene.FullIndexBuilder] indexWordsFromBundledDictionary Error reading from bundled dictionary file: words.zip.
 -- referer: http://localhost:9080/confluence/admin/search-indexes.action | url: /confluence/admin/didyoumean/build-index.action | userName: admin | action: build-index
sun.io.MalformedInputException
    at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:278)
    at sun.nio.cs.StreamDecoder$ConverterSD.convertInto(StreamDecoder.java:314)
    at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:364)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:250)
    at java.io.InputStreamReader.read(InputStreamReader.java:212)
    at java.io.BufferedReader.fill(BufferedReader.java:157)
    at java.io.BufferedReader.readLine(BufferedReader.java:320)
    at java.io.BufferedReader.readLine(BufferedReader.java:383)
    at com.atlassian.confluence.search.didyoumean.lucene.FullIndexBuilder.indexWordsFromBundledDictionary(FullIndexBuilder.java:188)
    at com.atlassian.confluence.search.didyoumean.lucene.FullIndexBuilder.build(FullIndexBuilder.java:92)
    at com.atlassian.confluence.search.didyoumean.BuildIndexTask.run(BuildIndexTask.java:39)
The admin screen, however, reports that the index was rebuilt successfully in 0 seconds, yet Did You Mean search suggestions are not available.
If your system locale uses a UTF-8 encoding, some SDK tools might throw a sun.io.MalformedInputException. To find out whether your system uses a UTF-8 encoding, check whether the locale-specific environment variables, such as LANG or LC_ALL, end with the suffix ".UTF-8". If you get this sun.io.MalformedInputException, replace every character that is outside the 7-bit ASCII range (0x00 - 0x7f) and is not already written as a Java Unicode character literal with the equivalent Java Unicode character literal (for example: '\u0080'). You can also work around this problem by removing the ".UTF-8" suffix from the locale-specific environment variables; for example, if your machine has a default locale of "en_US.UTF-8", set LANG to "en_US".
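The two checks described above can be sketched in Java. This is only an illustration: the class LocaleEncodingCheck and its methods usesUtf8Suffix and toUnicodeEscapes are hypothetical names, not part of Confluence or the JDK. It assumes you only need to inspect LANG and LC_ALL and to escape characters from the Basic Multilingual Plane.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class LocaleEncodingCheck {

    /** Returns true if a locale value such as "en_US.UTF-8" ends with the ".UTF-8" suffix. */
    static boolean usesUtf8Suffix(String localeValue) {
        return localeValue != null && localeValue.endsWith(".UTF-8");
    }

    /** Replaces every char outside the 7-bit ASCII range (0x00 - 0x7f) with a Java Unicode escape. */
    static String toUnicodeEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c <= 0x7f) {
                out.append(c);                                   // plain ASCII stays as-is
            } else {
                out.append(String.format("\\u%04x", (int) c));   // backslash, 'u', four hex digits
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Report whether the locale-specific environment variables carry the suffix.
        for (String name : new String[] {"LANG", "LC_ALL"}) {
            String value = System.getenv(name);
            System.out.println(name + "=" + value + " -> UTF-8 suffix: " + usesUtf8Suffix(value));
        }
    }
}
```

Running the class prints the current values of LANG and LC_ALL as seen by the JVM, which may differ from your login shell if the application server is started by another user or script.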
Some distributions of Red Hat, including Red Hat 9 and RHEL3, use UTF-8 encoding by default.
Remove the ".UTF-8" suffix from the operating system's locale-specific environment variables (see above).
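As a rough sketch of that step, the following Java snippet computes the value LANG would have without the suffix. The class LocaleSuffixStripper and the method stripUtf8Suffix are hypothetical names used only for illustration. A running JVM cannot change its own environment, so the resulting value still has to be set in the shell or startup script that launches the application server.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class LocaleSuffixStripper {
    private static final String UTF8_SUFFIX = ".UTF-8";

    /** Returns the locale value without a trailing ".UTF-8" suffix, or unchanged if it has none. */
    static String stripUtf8Suffix(String localeValue) {
        if (localeValue != null && localeValue.endsWith(UTF8_SUFFIX)) {
            return localeValue.substring(0, localeValue.length() - UTF8_SUFFIX.length());
        }
        return localeValue;
    }

    public static void main(String[] args) {
        String lang = System.getenv("LANG");
        System.out.println("Current LANG:   " + lang);
        // The suggested value must be applied outside the JVM, before the server starts.
        System.out.println("Suggested LANG: " + stripUtf8Suffix(lang));
    }
}
```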
Please bear in mind that the IBM JDK is not listed in Supported Platforms. If it causes a performance problem, please migrate to a supported Java, which is the Oracle JDK; we recommend Java 1.6.0_26 or later. Please refer to: Installing Java for Confluence.