This is admittedly similar to (but not a duplicate of) http://stackoverflow.com/questions/737275/pros-cons-of-full-text-search-engine-lucene-sphinx-postgresql-full-text-search. However, what I am looking for are specific, supported recommendations based on experience with more than one of the available systems (there seems to be a lot of "I've used Lucene, but not Sphinx", and vice versa).

The setup: standard LAMP (MySQL 5.0, PHP 5).

MySQL: tables use the InnoDB engine for foreign key constraints.

We are looking at indexing data, not pages. The data to be indexed may be in multiple languages (UTF-8 charset).

A number of the comparisons I've come across (like http://blog.evanweaver.com/articles/2008/03/17/rails-search-benchmarks/) are either not entirely applicable (Ferret is a Lucene port, but not the same as Zend_Search_Lucene) or are pushing their own systems/implementations (not exactly unbiased).

Some others I've come across (such as http://whatstheplot.com/blog/tag/lucene/ and http://pagetracer.com/2008/02/15/sphinx-and-lucene-search-engines-first-impressions/) provide very different results for performance of the two systems.

Also, all but ignored in much of what I've read is Xapian. Might this be worth consideration as well?

So... I'm hoping that some of you here on SO have some experience with this question and could help with some recommendations or point me in the right direction.

+6  A: 

One advantage of Sphinx is that you can "interpose" it between your clients and the MySQL server, and it will only "interfere" on queries specifically addressing it, transparently bouncing the others off MySQL -- see e.g. this URL. Whether that's an advantage in your use case, you're best placed to say! Sorry, no real-life experience w/Xapian or Lucene -- still, reading about how to deploy them makes it sound (to me!-) as if it might be worth it only if you identified substantial advantages... otherwise, Sphinx's "easy as pie" deployment, as a "proxy" between your clients and your MySQL server, feels like a big, substantial win to me!-)

Alex Martelli
Sphinx appears to have a lot of advantages, but given that Lucene has some rather vocal advocates, I was hoping to hear from some folks with experience with both.
Jonathan Fingland
In the end I went ahead and tested Sphinx, with the intention of testing Lucene (and perhaps Xapian) afterwards. Honestly, though, it integrated so smoothly with a PHP/MySQL setup that I'd find it hard to justify spending the time on the others.
Jonathan Fingland
+1  A: 

I looked at Zend_Search_Lucene and Sphinx for a project that sounds similar - searching database content (in my case, book information). I spent about a day looking at each. For what it's worth, I found Sphinx vastly easier to set up and use.

+1 thanks for the insight. I've been pleasantly surprised with Sphinx so far; it has made the integration of search far easier than I expected. Once the db has more data we will be able to know more. I haven't tried delta index merges yet, but hopefully they'll be as easy to implement as they look.
Jonathan Fingland
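
For anyone wondering what the "delta index merge" idea mentioned above amounts to, here is a rough sketch in plain Python. It is only a toy model of the general main+delta scheme, not Sphinx's API or configuration: the names build_index, search, merge and max_indexed_id, and the sample rows, are all made up for illustration. The idea is that a large "main" index is rebuilt rarely, a small "delta" index covers only rows added since, searches consult both, and a merge folds the delta back into main.

```python
import re
from collections import defaultdict


def build_index(rows):
    """Build a tiny inverted index: term -> set of row ids."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        # \w+ is Unicode-aware for str in Python 3, so non-ASCII words tokenize too.
        for term in re.findall(r"\w+", text.casefold()):
            index[term].add(row_id)
    return index


def search(term, *indexes):
    """Look a single term up in every given index and union the hits."""
    hits = set()
    for index in indexes:
        hits |= index.get(term.casefold(), set())
    return sorted(hits)


def merge(main, delta):
    """Fold the delta index into main; return main and a fresh, empty delta."""
    for term, ids in delta.items():
        main[term] |= ids
    return main, defaultdict(set)


# Hypothetical rows keyed by primary key, as they might come from a table.
rows = {1: "Full text search basics", 2: "Search engines in practice"}
main = build_index(rows)
max_indexed_id = max(rows)  # highest id covered by the main index

# New rows arrive after the main index was built.
rows[3] = "Практика поиска"
rows[4] = "Search at scale"

# The delta index covers only rows the main index has not seen.
delta = build_index({i: t for i, t in rows.items() if i > max_indexed_id})

before_merge = search("search", main, delta)
main, delta = merge(main, delta)
after_merge = search("search", main)
```

Here before_merge and after_merge contain the same row ids; the only difference is that after the merge a single index has to be consulted. Sphinx handles this with its own indexer and configuration, which this sketch does not try to reproduce.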