Good day,

Which Lucene analyzer can handle Japanese text properly? It should be able to handle Kanji, Hiragana, Katakana, Romaji, and any combination of them.

Thanks, Franz

A:

You should probably look at the CJK package in the contrib area of Lucene (org.apache.lucene.analysis.cjk). It contains an analyzer and a tokenizer (CJKAnalyzer and CJKTokenizer) specifically for Chinese, Japanese, and Korean text.
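To show roughly what a bigram-based CJK tokenizer does, here is a small stdlib-only Java sketch. It is not Lucene's code: the class and method names (BigramSketch, tokenize) are hypothetical, and it assumes Java 9 or later. Runs of Kanji, Hiragana and Katakana become overlapping two-character tokens, while Romaji and other Latin runs are kept as whole lowercased words. This is approximately the strategy CJKTokenizer documents ("java C1C2C3C4" becomes "java", "C1C2", "C2C3", "C3C4"), but the details here are simplified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BigramSketch {

    // Scripts this sketch treats as CJK: Kanji (Han), Hiragana, Katakana.
    // The prolonged sound mark (U+30FC) is shared by Hiragana and Katakana,
    // so it is included explicitly to keep words like タワー in one run.
    static boolean isCjk(int cp) {
        Character.UnicodeScript s = Character.UnicodeScript.of(cp);
        return s == Character.UnicodeScript.HAN
            || s == Character.UnicodeScript.HIRAGANA
            || s == Character.UnicodeScript.KATAKANA
            || cp == 0x30FC;
    }

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            int cp = cps[i];
            if (isCjk(cp)) {
                // Collect a run of consecutive CJK characters.
                int start = i;
                while (i < cps.length && isCjk(cps[i])) i++;
                if (i - start == 1) {
                    // A lone CJK character is emitted as a single token.
                    tokens.add(new String(cps, start, 1));
                } else {
                    // Overlapping bigrams: ABC -> AB, BC.
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2));
                    }
                }
            } else if (Character.isLetterOrDigit(cp)) {
                // Romaji and other non-CJK letters/digits: keep the whole word, lowercased.
                int start = i;
                while (i < cps.length && Character.isLetterOrDigit(cps[i]) && !isCjk(cps[i])) i++;
                tokens.add(new String(cps, start, i - start).toLowerCase(Locale.ROOT));
            } else {
                i++; // skip whitespace and punctuation
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("日本語のテキスト and Romaji"));
    }
}
```

Because every adjacent pair becomes a token, a query such as 東京都 (Tokyo Metropolis) yields 東京 and 京都, and 京都 also matches documents about Kyoto. Spurious matches like this are one reason bigram-based results can feel imprecise compared with dictionary-based word segmentation.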

adrianbanks
The CJK analyzer seems like a naive way to search, and in my previous experience it does not produce very relevant results. Is there anything specific I need to do to make CJKAnalyzer work well, such as adjusting some weights? Thanks
Franz See
I've never used the CJK analyzer myself, so I can't say. You could try asking on the Lucene mailing list (http://lucene.apache.org/java/docs/mailinglists.html#Java User List) for more specific help; some very experienced Lucene users are on that list.
adrianbanks