I've been using Nutch for a while, and only recently found out about this option.

How does it perform, and what file size limit does it support?

Also, how can I delete from or update an index instead of re-indexing every time there is a modification?

+5  A: 

Zend_Search_Lucene is a pure PHP implementation of the Apache Lucene format. The currently (starting from ZF 1.6) supported Lucene index format versions are 1.4 - 2.3. For more information on Lucene, visit http://lucene.apache.org/java/docs/.

As far as index size limits are concerned, the index size is limited to 2 GB on 32-bit platforms and, as far as I know, is not limited on 64-bit platforms.

Performance depends largely on how you build your indexes. Make sure to check the section of the manual that deals with performance.

Also, Luke (a diagnostic tool for Lucene indexes) comes in really handy in performance optimization and troubleshooting.

P.S. With regards to updating, the Lucene index file format doesn't support document updating. Documents should be removed and re-added to the index to effectively update them. This is true for the Java implementation as well.
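To make the remove-and-re-add pattern concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the Zend_Search_Lucene or Lucene API: `ToyIndex`, `update_document`, and the `url` key field are all hypothetical names made up for illustration. In Zend_Search_Lucene the corresponding steps would be `find()`, `delete()`, and `addDocument()` on the index object.

```python
class ToyIndex:
    """Toy append-only index (hypothetical): stored documents are never edited in place."""

    def __init__(self):
        self._docs = []        # stored documents; list position is the internal id
        self._deleted = set()  # internal ids that have been marked as deleted

    def add_document(self, doc):
        """Append a copy of the document and return its internal id."""
        self._docs.append(dict(doc))
        return len(self._docs) - 1

    def find(self, field, value):
        """Return ids of live (not deleted) documents whose field equals value."""
        return [
            doc_id
            for doc_id, doc in enumerate(self._docs)
            if doc_id not in self._deleted and doc.get(field) == value
        ]

    def delete(self, doc_id):
        """Mark a document as deleted; the stored data itself is left untouched."""
        self._deleted.add(doc_id)

    def get(self, doc_id):
        """Return the stored document for a live id."""
        if doc_id in self._deleted:
            raise KeyError(doc_id)
        return self._docs[doc_id]


def update_document(index, key_field, doc):
    """'Update' by deleting every live document with the same key, then re-adding."""
    for doc_id in index.find(key_field, doc[key_field]):
        index.delete(doc_id)
    return index.add_document(doc)
```

The point of the sketch is that the "updated" document comes back under a new internal id, so anything that cached the old id has to look the document up again by its key field.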

jason
Thank you for your comment. But Nutch can merge new and old indexes into another one, so why can't Lucene? Nutch is based on Lucene. BTW, are index files generated by Nutch directly usable by Zend_Search_Lucene?
Shore
You can merge indexes with Lucene and Zend_Search_Lucene. You can also update _indexes_ themselves, for example by adding a field. BUT you cannot update a document IN an index. I think you are misunderstanding what Nutch is. Nutch is a search engine that uses Lucene for its indexes and searching. So yes, its indexes should be compatible.
jason
Wow, then I think I can do incremental indexing for Zend_Search_Lucene now. Thanks.
Shore
Hope it's performant enough.
Shore
+1 great input jason
Cal Jacobson