'MalformedInputException' when rebuilding the Did You Mean Search Index


The following error is thrown when trying to rebuild the Did You Mean search index in Confluence:

 [6/29/09 23:39:13:673 CDT] 0000002e SystemOut     O 2009-06-29 23:39:13,671 ERROR [Did-You-Mean-Index-Build-Thread] [search.didyoumean.lucene.FullIndexBuilder] indexWordsFromBundledDictionary Error reading from bundled dictionary file: words.zip.
 -- referer: http://localhost:9080/confluence/admin/search-indexes.action | url: /confluence/admin/didyoumean/build-index.action | userName: admin | action: build-index
	at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:278)
	at sun.nio.cs.StreamDecoder$ConverterSD.convertInto(StreamDecoder.java:314)
	at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:364)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:250)
	at java.io.InputStreamReader.read(InputStreamReader.java:212)
	at java.io.BufferedReader.fill(BufferedReader.java:157)
	at java.io.BufferedReader.readLine(BufferedReader.java:320)
	at java.io.BufferedReader.readLine(BufferedReader.java:383)
	at com.atlassian.confluence.search.didyoumean.lucene.FullIndexBuilder.indexWordsFromBundledDictionary(FullIndexBuilder.java:188)
	at com.atlassian.confluence.search.didyoumean.lucene.FullIndexBuilder.build(FullIndexBuilder.java:92)
	at com.atlassian.confluence.search.didyoumean.BuildIndexTask.run(BuildIndexTask.java:39)

However, the admin screen reports that the index was rebuilt successfully in 0 seconds, and the Did You Mean search suggestions are still not available.
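The stack trace above fails inside a UTF-8 byte-to-character converter while BufferedReader.readLine is reading the dictionary. The following is a minimal sketch of that kind of failure, not Confluence's own code: the class name StrictUtf8ReadDemo, the helper readsAsStrictUtf8 and the sample bytes are hypothetical. It reads bytes through a decoder configured to report malformed input. Current JDKs raise java.nio.charset.MalformedInputException here, whereas the runtime in the log above reports the error through its older sun.io converter classes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8ReadDemo {
    // Hypothetical helper: returns true if every line can be read as UTF-8,
    // false if the decoder reports malformed or unmappable input.
    static boolean readsAsStrictUtf8(byte[] data) {
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), strict))) {
            while (reader.readLine() != null) {
                // Reading is enough; decoding happens as the buffer fills.
            }
            return true;
        } catch (CharacterCodingException e) {
            return false; // MalformedInputException is a subclass of this
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pure 7-bit ASCII is valid UTF-8.
        byte[] ascii = "cafe\n".getBytes(StandardCharsets.US_ASCII);
        // 0xE9 is "e with acute" in ISO-8859-1, but an incomplete sequence in UTF-8.
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9, '\n'};
        System.out.println("ASCII bytes readable as UTF-8:      " + readsAsStrictUtf8(ascii));
        System.out.println("ISO-8859-1 bytes readable as UTF-8: " + readsAsStrictUtf8(latin1));
    }
}
```

Whether the bundled dictionary actually contains such bytes is not confirmed here; the sketch only shows why a reader that expects UTF-8 can stop partway through a file.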


From http://www.ibm.com/developerworks/java/jdk/linux/142/runtimeguide.lnx.en.html:

If your system locale is using a UTF-8 encoding, some SDK tools might throw a sun.io.MalformedInputException. To find out whether your system is using a UTF-8 encoding, examine the locale-specific environment variables such as LANG or LC_ALL to see if they end with the suffix ".UTF-8". If you get this sun.io.MalformedInputException, change characters that are not within the 7-bit ASCII range (0x00 - 0x7f) and are not represented as Java Unicode character literals to Java Unicode character literals (for example: '\u0080'). You can also work around this problem by removing the ".UTF-8" suffix from the locale-specific environment variables; for example, if your machine has default locale of "en_US.UTF-8", set LANG to "en_US".
Some distributions of Red Hat, including Red Hat 9 and RHEL3, use UTF-8 encoding by default.
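The check described in the IBM note can be run from inside the same Java runtime that starts Confluence. The following is a rough sketch; the class name LocaleEncodingCheck and the helper hasUtf8Suffix are hypothetical, and including LC_CTYPE alongside LANG and LC_ALL is an assumption that goes beyond the quoted text.

```java
import java.nio.charset.Charset;

class LocaleEncodingCheck {
    // LANG and LC_ALL are named in the IBM note; LC_CTYPE is an added assumption.
    static final String[] LOCALE_VARS = {"LC_ALL", "LC_CTYPE", "LANG"};

    // True if the value ends with ".UTF-8", compared case-insensitively.
    static boolean hasUtf8Suffix(String value) {
        String suffix = ".UTF-8";
        return value != null
                && value.regionMatches(true, value.length() - suffix.length(),
                        suffix, 0, suffix.length());
    }

    public static void main(String[] args) {
        // On Java 18 and later this is UTF-8 regardless of the locale settings.
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        for (String name : LOCALE_VARS) {
            String value = System.getenv(name);
            String note = hasUtf8Suffix(value) ? "  (UTF-8 locale)" : "";
            System.out.println(name + "=" + value + note);
        }
    }
}
```

Run it as the same operating system user, and from the same shell or service environment, that launches the application server; otherwise the variables it reports may differ from the ones Confluence sees.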


Remove the ".UTF-8" suffix from the operating system's locale-specific environment variables, such as LANG or LC_ALL (see above). For example, if the default locale is "en_US.UTF-8", set LANG to "en_US".
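As a small illustration of the change, the sketch below prints the current value of each variable next to the value it would have without the suffix. The class name LocaleSuffixSuggestion and the helper withoutUtf8Suffix are hypothetical. The program only prints suggestions; the variables themselves still have to be changed in the environment that starts Confluence.

```java
class LocaleSuffixSuggestion {
    // Returns the value without a trailing ".UTF-8" (case-insensitive);
    // any other value, including null, is returned unchanged.
    static String withoutUtf8Suffix(String value) {
        String suffix = ".UTF-8";
        if (value != null
                && value.regionMatches(true, value.length() - suffix.length(),
                        suffix, 0, suffix.length())) {
            return value.substring(0, value.length() - suffix.length());
        }
        return value;
    }

    public static void main(String[] args) {
        // The two variables named in the IBM note.
        for (String name : new String[] {"LC_ALL", "LANG"}) {
            String current = System.getenv(name);
            if (current != null) {
                System.out.println(name + ": " + current + " -> " + withoutUtf8Suffix(current));
            }
        }
    }
}
```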

Please bear in mind that the IBM JDK is not one of the supported platforms. If it causes a performance problem, please migrate to a supported Java, which is the Oracle JDK; we recommend Java 1.6.0_26 or later. Please refer to: Installing Java for Confluence.

Last modified on Nov 2, 2018
