I'm looking for a stand-alone full-text search server with the following properties:

  • Must operate as a stand-alone server that can serve search requests from multiple clients
  • Must be able to do "bulk indexing" by indexing the result of an SQL query: say "SELECT id, text_to_index FROM documents;"
  • Must be free software and must run on Linux with MySQL as the database
  • Must be fast (rules out MySQL's internal full-text search)
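
To make the "bulk indexing" requirement concrete, here is a minimal sketch of the idea: run the SQL query once and feed every (id, text) row into an index. It is not how Solr or Sphinx do it internally. `bulk_index` is a hypothetical helper, the inverted index is a toy, and `sqlite3` stands in for MySQL only so the example is self-contained.

```python
import sqlite3
from collections import defaultdict

def bulk_index(conn, query):
    """Build a toy in-memory inverted index from a query returning (id, text) rows."""
    index = defaultdict(set)
    for doc_id, text in conn.execute(query):
        # Naive tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Assumption: sqlite3 replaces MySQL here, and the two rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text_to_index TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [(1, "full text search server"), (2, "fast search on Linux")],
)

index = bulk_index(conn, "SELECT id, text_to_index FROM documents")
print(sorted(index["search"]))
```

A real search server would do the same pass over the result set, but with proper tokenization, on-disk index structures, and ranking.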

The alternatives I've found that have these properties are:

  • Solr (based on Lucene)
  • Sphinx

My questions:

  • Which one would you choose and why?
  • Have I missed any alternatives?
+7  A: 

Unless you need to extend the search functionality in any proprietary way, Sphinx is your best bet.

Sphinx advantages:

  1. Development and setup are faster
  2. Much better (and faster) aggregation. This was the killer feature for us.
  3. Not XML. This is what ultimately ruled out Solr for us. We had to return rather large result sets (think hundreds of results) and then aggregate them ourselves, since Solr aggregation was lacking. The time spent serializing to and from XML absolutely killed performance. For small result sets, though, it was perfectly fine.
  4. Best documentation I've seen in an open source app

Solr advantages:

  1. Can be extended.
  2. Can hit it directly from a web app, i.e., you can have autocomplete-like searches hit the Solr server directly via AJAX.
Solr has many response writers other than XML, including JSON, PHP, Ruby, Python and a Java binary format: http://lucene.apache.org/solr/api/org/apache/solr/request/QueryResponseWriter.html
Mauricio Scheffer
Did I mention how terrible the Solr/Lucene documentation is? Having to root through Javadocs to figure out functionality is not my idea of documentation.
I should have linked to the wiki: http://wiki.apache.org/solr/QueryResponseWriter#head-e82d899e83a861380fb6d0c34c1228a2f79f6c98
Mauricio Scheffer
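
For anyone wondering what the non-XML route looks like from a client: a rough sketch, assuming a Solr instance at http://localhost:8983/solr (the default port, but an assumption here) and the standard `wt=json` parameter on the select handler. `solr_select_url` and `extract_docs` are hypothetical helper names, and the sample body is hand-written to mirror the usual JSON response layout rather than captured from a real server.

```python
import json
from urllib.parse import urlencode

def solr_select_url(base_url, query, rows=10):
    """Build a select URL that asks Solr for a JSON response instead of XML."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)

def extract_docs(raw_body):
    """Return (total hit count, list of documents) from a JSON response body."""
    body = json.loads(raw_body)
    return body["response"]["numFound"], body["response"]["docs"]

# Assumption: base URL and field name are illustrative only.
url = solr_select_url("http://localhost:8983/solr", "title:sphinx")

# Assumption: hand-written sample response, not real server output.
sample = (
    '{"responseHeader": {"status": 0}, '
    '"response": {"numFound": 1, "start": 0, "docs": [{"id": "42"}]}}'
)
total, docs = extract_docs(sample)
print(url)
print(total, docs)
```

Whether JSON is actually faster than XML for hundreds of results depends on the client library, so it is worth measuring before ruling either engine out.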
+6  A: 

I have been using Sphinx for almost a year now, and it has been amazing. I can index 1.5 million documents in about a minute on my MacBook, and even quicker on the server. I am also using Sphinx to limit searches to places within specific latitudes and longitudes, and it is very fast. How results are ranked is also very tweakable. It is easy to install and set up if you read a tutorial or two. It is almost at 1.0 status, but the release candidates have been rock solid.

lo_fye
Geographical searching can be done in Solr with the LocalSolr plugin: http://www.gissearch.com/localsolr
Mauricio Scheffer
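
To illustrate what "limiting searches to specific latitudes and longitudes" means, here is a minimal bounding-box filter. This is only the underlying idea, not the Sphinx or LocalSolr API; `in_bounding_box` is a hypothetical helper and the places are made-up sample data.

```python
def in_bounding_box(place, south, north, west, east):
    """True if the place lies inside the latitude/longitude box (degrees).

    Simplification: does not handle boxes that cross the 180th meridian.
    """
    return south <= place["lat"] <= north and west <= place["lon"] <= east

# Assumption: illustrative sample data only.
places = [
    {"name": "A", "lat": 43.65, "lon": -79.38},
    {"name": "B", "lat": 40.71, "lon": -74.01},
]

hits = [p["name"] for p in places if in_bounding_box(p, 43.0, 44.0, -80.0, -79.0)]
print(hits)
```

In a real engine the same range conditions would be applied to indexed numeric attributes on the server side, so only matching documents are returned.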
+38  A: 

I've been using Solr successfully for almost 2 years now, and have never used Sphinx, so I'm obviously biased. However, I'll try to keep it objective by quoting the docs or other people. I'll also take patches to my answer :-)

Similarities:

  • Both Solr and Sphinx satisfy all of your requirements. They're fast and designed to index and search large bodies of data efficiently.
  • Both have a long list of high-traffic sites using them (Solr, Sphinx)
  • Both offer commercial support. (Solr, Sphinx)
  • Both offer client API bindings for several platforms/languages (Sphinx, Solr)
  • Both can be distributed to increase speed and capacity (Sphinx, Solr)

Here are some differences:

Related questions:

Mauricio Scheffer
Stunning answer! +1
Artem Russakovskii