We're currently running MySQL on a LAMP stack and have been looking at implementing a more thorough, full-text search on our site.

We've looked at MySQL's own full-text search, but it doesn't seem to cope well with large databases, which makes it far too slow for our needs.

Our main requirements are:

  • speed returning results
  • simple updating of index

In addition to the above, our "nice to have"s are:

  • ideally not something that requires adding a module to MySQL
  • plays nicely with PHP (majority of our dev work done using PHP)

There seem to be quite a few healthy open-source projects that add fast, reliable full-text search to MySQL, so I'm basically looking for recommendations/suggestions on what you've found to be the most useful product out there, easiest to set up, etc.

So far, the list of ones we've been starting to play around with are:

Are there any better ones out there that we haven't come across yet? Can you recommend / suggest against any of the options we've gathered so far?

Thanks for your help!

Update

@Cletus suggested Google's Custom Search Engine. We recently trialled this on a couple of projects, and it's an almost-perfect fit for our needs. The problem is that entries on our site are updated quite regularly, and unfortunately the speed at which entries go in/get updated in Google's index was just too slow and erratic for us to rely on, even with the addition of sitemaps and requested crawl rate changes.

+1  A: 

Is your site public? If so the lowest barrier to entry would probably be Google Custom Search Engine.

cletus
We've tried Google's CSE on a couple of projects and it's a good fit, but we have a frequently updated site, and the index is just too slow to update for us to rely on. Question updated to clarify, cheers!
ConroyP
+6  A: 

I recommend Lucene, which handles a large site nicely and should be able to cope with frequent updates. It does not work out of the box, however, and if Solr suits your needs, it is probably easier to work with. If you do choose Lucene, the next choice is between Java Lucene and a PHP port. If you choose the latter, check out the Zend Lucene port.

Yuval F
Nothing works better than Solr, and there is a Solr PHP client available on code.google.com.
A: 

There is also Flax, based on Xapian, but it's not natively MySQL. It should scale well, but it probably requires a fair amount of engineering to create a crawler for your needs.

Andrei Taranchenko
+4  A: 

Sphinx is super-fast for indexing, but it has the limitation that you'd need a delta-index setup to track recent changes between full reindexes. Not ideal, but it depends on the API/plugin you use to manage it. I've only used it via Ruby.

That really is the only complaint I can think of for Sphinx.
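To make the delta-index idea concrete, here is a toy in-memory sketch in Python. It is not Sphinx's API or configuration; the class and method names are made up for illustration, and a plain dict stands in for the database table. The point is only the shape of the scheme: updates touch a small delta index, searches combine both, and a periodic full rebuild folds the delta back into the main index.

```python
from collections import defaultdict


class MainDeltaIndex:
    """Toy inverted index split into a large 'main' part and a small 'delta' part.

    Hypothetical illustration only; this is not how Sphinx is configured or called.
    """

    def __init__(self):
        self.main = defaultdict(set)    # term -> doc ids, rebuilt rarely
        self.delta = defaultdict(set)   # term -> doc ids changed since last rebuild
        self.delta_docs = set()         # ids whose entries in main are stale
        self.docs = {}                  # id -> current text (stands in for the DB table)

    @staticmethod
    def _terms(text):
        return set(text.lower().split())

    def upsert(self, doc_id, text):
        """Record a new or changed row; only the small delta index is touched."""
        self.docs[doc_id] = text
        # Drop any older delta entries for this doc, then index the new text.
        for ids in self.delta.values():
            ids.discard(doc_id)
        for term in self._terms(text):
            self.delta[term].add(doc_id)
        self.delta_docs.add(doc_id)

    def rebuild_main(self):
        """Full reindex: fold everything into main and empty the delta."""
        self.main = defaultdict(set)
        for doc_id, text in self.docs.items():
            for term in self._terms(text):
                self.main[term].add(doc_id)
        self.delta = defaultdict(set)
        self.delta_docs = set()

    def search(self, term):
        term = term.lower()
        # Main hits for docs changed since the rebuild are stale, so skip them.
        fresh_main = self.main.get(term, set()) - self.delta_docs
        return sorted(fresh_main | self.delta.get(term, set()))


idx = MainDeltaIndex()
idx.upsert(1, "fast full-text search")
idx.upsert(2, "slow search")
idx.rebuild_main()               # the expensive, infrequent step
idx.upsert(2, "fast indexing")   # cheap: only the delta changes
print(idx.search("search"))      # doc 2 no longer matches
print(idx.search("fast"))        # doc 2 is found via the delta
```

The cost is the extra bookkeeping: something has to track which rows changed since the last full build and schedule the rebuild, which is the part I'd call "not ideal".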

pat
+3  A: 

I needed full-text search for a PHP project backed by a postgres database. At first, I went with Zend Lucene, a PHP port, due to the ease of installation and implementation.

We quickly discovered that Zend's port falls apart once your index gets larger than about 100 MB. Simple queries were taking 15+ seconds of CPU time and hundreds of megabytes of RAM.

We replaced Zend with Solr; Solr is able to answer queries against the same index in under 10 ms. Since you can add documents to the index using the XML interface, submit queries via a REST-ful API, and get results back in PHP's native serialized format, I was able to plug it in in place of Zend in a matter of hours.
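As a rough sketch of what that integration looks like (written in Python here rather than PHP, purely for illustration), the two helpers below build the XML update message and the query URL. The host, port, field names, and function names are assumptions, not taken from our setup; the `/update` and `/select` paths and the `q`, `rows`, and `wt` parameters are the ones I'd expect on a stock single-core Solr install, so check them against your version.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape, quoteattr

# Assumed location of a local Solr instance; adjust for your install.
SOLR_BASE = "http://localhost:8983/solr"


def build_add_xml(docs):
    """Render a list of dicts as a Solr <add> update message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # Escape both the attribute and the text so user content stays valid XML.
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def build_query_url(query, rows=10, wt="json"):
    """Build a select URL; wt picks the response format."""
    params = urlencode({"q": query, "rows": rows, "wt": wt})
    return f"{SOLR_BASE}/select?{params}"


# Hypothetical document with made-up field names.
print(build_add_xml([{"id": 1, "title": "Tom & Jerry"}]))
print(build_query_url("title:search"))
```

The XML body would be POSTed to the update handler (followed by a commit) and the URL fetched with any HTTP client. From PHP, I believe `wt=phps` is the setting that returns the serialized format mentioned above, so the response can be passed straight to `unserialize()`.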

All told, Solr offers far more functionality, and performance 50 times (indexing) to 1000 times (complex queries) better across the board.

Installation isn't bad; installing the Sun JRE was the hardest part. Solr ships with a ready-to-go demo installation (using Jetty as a servlet container) that can actually handle moderate production loads. Just edit schema.xml, make sure to configure Jetty to bind to a non-public IP, execute "java -jar start.jar", and you're live.

Frank Farmer