Currently, these are the only UTF-8 compatible analyzers built into Zend_Search_Lucene:
- Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8
- Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num
- Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive
- Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive
To make one of them the default analyzer, use code like the following:
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());
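To get a feel for how the four variants differ, here is a rough standalone approximation in plain PHP (no Zend Framework needed). It assumes PCRE with Unicode support and the mbstring extension, and the function name `approximateUtf8Tokens` is hypothetical. The idea it illustrates: the Utf8 analyzers keep runs of letters, the Utf8Num ones also keep digits, and the _CaseInsensitive ones lowercase each token. The real analyzers also handle input encoding and any token filters you attach, so treat this as an illustration of the behaviour, not the library's code.

```php
<?php
// Hypothetical helper: approximates the token stream of the built-in
// Utf8 / Utf8Num analyzers and their _CaseInsensitive variants.
// Requires PCRE Unicode support (the /u modifier) and mbstring.
function approximateUtf8Tokens(string $text, bool $withDigits, bool $caseInsensitive): array
{
    // Utf8 keeps runs of letters; Utf8Num also keeps digits.
    $pattern = $withDigits ? '/[\p{L}\p{N}]+/u' : '/\p{L}+/u';
    preg_match_all($pattern, $text, $matches);
    $tokens = $matches[0];

    // The _CaseInsensitive variants lowercase each token (UTF-8 aware).
    if ($caseInsensitive) {
        $tokens = array_map(fn(string $t): string => mb_strtolower($t, 'UTF-8'), $tokens);
    }
    return $tokens;
}

$text = 'Größe 42 Ärger';
print_r(approximateUtf8Tokens($text, false, false)); // Größe, Ärger
print_r(approximateUtf8Tokens($text, true, true));   // größe, 42, ärger
```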
You can also write your own analyzer if none of these fits your needs.
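In Zend Framework a custom analyzer typically extends Zend_Search_Lucene_Analysis_Analyzer_Common and implements reset() and nextToken(), where nextToken() returns the next token or null at the end of the stream. The sketch below only mirrors that reset()/nextToken() shape so it can run without ZF: the class name `SketchUtf8Analyzer` and its simplified setInput() are hypothetical, it returns [text, start, end] arrays instead of Zend_Search_Lucene_Analysis_Token objects, and the offsets are byte offsets as reported by preg_match().

```php
<?php
// Hypothetical stand-alone class mirroring the reset()/nextToken() shape of a
// Zend_Search_Lucene analyzer; it does NOT extend any ZF class.
class SketchUtf8Analyzer
{
    private string $input = '';
    private int $position = 0;

    // Simplified stand-in for handing the analyzer its text.
    public function setInput(string $input): void
    {
        $this->input = $input;
        $this->reset();
    }

    // Rewind to the start of the input.
    public function reset(): void
    {
        $this->position = 0;
    }

    // Returns [text, startByte, endByte] for the next token, or null when done.
    public function nextToken(): ?array
    {
        // Letters and digits form tokens (like Utf8Num); search from the current byte offset.
        if (!preg_match('/[\p{L}\p{N}]+/u', $this->input, $m, PREG_OFFSET_CAPTURE, $this->position)) {
            return null;
        }
        [$text, $start] = $m[0];
        $end = $start + strlen($text); // strlen() counts bytes, matching the offset
        $this->position = $end;
        // Lowercase the token, as the _CaseInsensitive analyzers do.
        return [mb_strtolower($text, 'UTF-8'), $start, $end];
    }
}

$analyzer = new SketchUtf8Analyzer();
$analyzer->setInput('Zürich 2008');
while (($token = $analyzer->nextToken()) !== null) {
    echo implode(' ', $token), PHP_EOL; // "zürich 0 7", then "2008 8 12"
}
```

Because "ü" takes two bytes in UTF-8, the first token ends at byte 7 even though "Zürich" has six characters; a byte-oriented tokenizer that ignored this would split such words apart.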
An alternative would be to build the index with Java Lucene and query it from PHP, since the two are meant to share a compatible index format. I haven't tried this myself, though.
Zend_Search_Lucene is derived from the Apache Lucene project. Starting with ZF 1.6, it supports Lucene index format versions 1.4 through 2.3.
You can read more about this in the Zend Framework manual.