Does anyone know whether the Zend_Lucene class supports CJK (Chinese, Japanese, Korean) text?

I want to use it on my own website; the only requirement is that it works for both English and Japanese.

Also, any resources on CJK support in the Java version of Lucene would be appreciated.

Thanks

+2  A: 

Currently, these are the only UTF-8 compatible analyzers built into Zend_Search_Lucene:

  • Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8
  • Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num
  • Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive
  • Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive

You can use one of them by setting it as the default analyzer before indexing and searching:

Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());

You can also build your own analyzer if you want.
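
To give an idea of what a custom analyzer for Japanese might do: a common trick for text without spaces is to index overlapping two-character "bigrams" instead of words. Below is a minimal sketch of just the tokenizing step in plain PHP. The function name cjkBigramTokens and the character ranges are my own assumptions, not anything shipped with Zend_Search_Lucene, and it needs the mbstring extension and PCRE with Unicode support.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): splits mixed
// English/Japanese text into terms. Latin words are lowercased and kept
// whole; runs of CJK characters become overlapping two-character bigrams.
function cjkBigramTokens($text)
{
    $tokens = array();

    // Match either a run of CJK characters (kana, common CJK ideographs,
    // hangul syllables) or a run of Latin letters and digits.
    // These ranges are an assumption and deliberately not exhaustive.
    preg_match_all(
        '/[\x{3040}-\x{30FF}\x{4E00}-\x{9FFF}\x{AC00}-\x{D7AF}]+|[\p{Latin}\p{Nd}]+/u',
        $text,
        $matches
    );

    foreach ($matches[0] as $run) {
        // Latin/digit run: keep it as a single lowercased term.
        if (preg_match('/^[\p{Latin}\p{Nd}]/u', $run)) {
            $tokens[] = mb_strtolower($run, 'UTF-8');
            continue;
        }

        // CJK run: a lone character is kept as-is, longer runs are
        // emitted as overlapping bigrams.
        $length = mb_strlen($run, 'UTF-8');
        if ($length === 1) {
            $tokens[] = $run;
            continue;
        }
        for ($i = 0; $i < $length - 1; $i++) {
            $tokens[] = mb_substr($run, $i, 2, 'UTF-8');
        }
    }

    return $tokens;
}

// Yields the terms: zend, 検索, 索エ, エン, ンジ, ジン
print_r(cjkBigramTokens('Zend 検索エンジン'));
```

To plug something like this into Zend_Search_Lucene you would extend Zend_Search_Lucene_Analysis_Analyzer_Common, implement reset() and nextToken(), and return each term wrapped in a Zend_Search_Lucene_Analysis_Token. I haven't tested that wiring against a real Japanese index, so treat it as a starting point.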

An alternative solution would be to build the index using Java Lucene and use that index within PHP since they're supposed to be compatible. I haven't tried this though.

Zend_Search_Lucene was derived from the Apache Lucene project. The currently (starting from ZF 1.6) supported Lucene index format versions are 1.4 - 2.3

You can read more about this in the Zend Framework manual.

Mark Basmayor
Thanks for the answer. I was thinking of using the Java version to build the index. I don't really know whether Lucene plays well with CJK, but I'll try.
RageZ