views: 885

answers: 4

Lucene has quite poor support for the Russian language.

RussianAnalyzer (part of lucene-contrib) is of very low quality.

The RussianStemmer module for Snowball is even worse. It does not recognize Russian text in Unicode strings, apparently assuming that some bizarre mix of Unicode and KOI8-R must be used instead.

Do you know of any better solutions?
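To make the encoding complaint above concrete, here is a minimal sketch of the kind of mismatch I mean. It is not the stemmer's actual code. The helper names (`koi8_bytes_as_chars`, `looks_cyrillic`) are hypothetical and exist only for this illustration, and I am assuming the "mix" amounts to KOI8-R byte values stored one per character.

```python
# Illustration only: this does not reproduce the Snowball stemmer's code.
# It shows why text holding KOI8-R byte values in chars is not Unicode Cyrillic.

word = "книги"  # Russian for "books", an ordinary Unicode string

koi8_bytes = word.encode("koi8_r")  # KOI8-R: one byte per letter
utf8_bytes = word.encode("utf-8")   # UTF-8: two bytes per Cyrillic letter


def koi8_bytes_as_chars(text: str) -> str:
    """Hypothetical helper: widen each KOI8-R byte to its own character."""
    return text.encode("koi8_r").decode("latin-1")


def looks_cyrillic(text: str) -> bool:
    """Hypothetical helper: every char is in the Unicode Cyrillic block."""
    return all("\u0400" <= ch <= "\u04ff" for ch in text)


mixed = koi8_bytes_as_chars(word)

print(looks_cyrillic(word))   # genuine Unicode text
print(looks_cyrillic(mixed))  # KOI8-R byte values widened to chars
print(mixed.encode("latin-1").decode("koi8_r") == word)  # reversible mapping
```

Code that expects the second form will not match any letter in a normal Unicode string, which would be consistent with the behaviour I am seeing.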

+1  A: 

If all else fails, use Sphinx.

squadette
+2  A: 

That's the beauty of open source. You have the source code, so if the current implementations don't work for you, you can always create your own or, even better, extend the existing ones. A good start would be the "Lucene in Action" book.

MrM
+3  A: 

What needs to be done in RussianAnalyzer to improve its quality? I developed it, and I'm open to suggestions, so let's talk.

+1  A: 

My answer is probably too late, but for the record, I've found the analyzers from the AOT project much better than those shipped with Lucene.

spariev