We need to integrate a search engine into our product catalog management software. The catalog is expected to have more than 4-5 million records, with relational data spread over several tables. Our development platform is ASP.NET 3.5, and we have done some preliminary work with Lucene and found it to be good. However, we have just learned about Solr and are looking for some practical tips comparing Lucene and Solr from the implementation, timeline, regular maintenance, performance, and features perspectives. Any guidance or pointers would be really helpful. Thanks.

+7  A: 

Lucene:

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search.

Solr:

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and ...

Essentially, Lucene is embedded in Solr: Lucene is purely a full-text search library, meant to be embedded into projects to give them full-text search capabilities. Solr has many more features and administration capabilities, allowing you to search structured data without writing any custom code, load data from CSV files, parse user input tolerantly, run faceted searches, highlight matched text in results, and retrieve search results in a variety of formats (XML, JSON, ...). Check the Solr features page to see whether any feature is relevant for your project.
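To make the "XML/HTTP and JSON APIs" point concrete, here is a minimal sketch of how a client talks to Solr: it simply builds a `/select` URL from standard request parameters (`q`, `wt`, `facet`, `facet.field`, `hl`, `hl.fl`). The base URL `http://localhost:8983/solr/select` and the field names `name` and `category` are assumptions for illustration; adjust them to your own install and schema.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrQueryUrl {

    // Assumed location of the Solr /select handler; adjust host, port and path.
    static final String SOLR_SELECT = "http://localhost:8983/solr/select";

    // Appends each request parameter, URL-encoded, in insertion order.
    static String buildSelectUrl(Map<String, String> params) {
        StringBuilder url = new StringBuilder(SOLR_SELECT);
        char sep = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(sep)
               .append(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "name:laptop");         // query against a hypothetical "name" field
        params.put("wt", "json");               // response writer: json, xml, ...
        params.put("facet", "true");            // enable faceting
        params.put("facet.field", "category");  // hypothetical field to facet on
        params.put("hl", "true");               // enable hit highlighting
        params.put("hl.fl", "name");            // field(s) to highlight
        System.out.println(buildSelectUrl(params));
    }
}
```

Because Solr is driven entirely by HTTP parameters, the same request could be issued from ASP.NET just as easily. Note that a `Map` cannot hold repeated keys, so a real client would need a multi-valued structure to pass several `facet.field` parameters.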

dcruz
I have created my indexes using Lucene. Can those indexes still be used by Solr for search queries?
Vikram
As in most cases, it depends. It isn't automatic: you have to make sure that Solr has the same field mappings as those in the Lucene indexes. For further information, check: http://www.nabble.com/Using-Lucene-index-in-Solr-td4983079.html
dcruz
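As a rough illustration of the field-mapping check described above, the sketch below compares the field names found in an existing Lucene index against those declared in the Solr schema. How you obtain the two name sets (reading the index, parsing `schema.xml`) is left out, and the field names are hypothetical. It only checks names; field types and analyzers also need to line up for Solr to query the index correctly.

```java
import java.util.Set;
import java.util.TreeSet;

public class FieldMappingCheck {

    // Returns the Lucene field names with no matching field declaration in the
    // Solr schema. Names only: types and analyzers must be checked separately.
    static Set<String> missingInSchema(Set<String> luceneFields, Set<String> schemaFields) {
        Set<String> missing = new TreeSet<>(luceneFields);
        missing.removeAll(schemaFields);
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical field names, for illustration only.
        Set<String> lucene = Set.of("id", "name", "description", "price");
        Set<String> schema = Set.of("id", "name", "price");
        System.out.println(missingInSchema(lucene, schema));
    }
}
```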
@dcruz, do you by any chance have any experience with Solr's DataImportHandler, which can automatically import data from a database based on config files? Does it work as smoothly as it sounds, or are there hidden gotchas?
Vikram
Sorry =( I worked with Solr two years ago and don't really remember the implementation details.
dcruz
A: 

We are in exactly the same situation as you. Unfortunately, I was not directly involved in the evaluation process, but in the end we're going to use Solr integrated with Lucene.

The main advantage is the variety of output formats, as dcruz described. You can query Solr from your consumer application and get back your search results as XML, which can easily be parsed and displayed on a web page.
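As an example of "easily parsed": Solr's standard XML response wraps hits in a `<result>` element, with one `<doc>` per hit and each stored field as a typed element (`<str>`, `<int>`, ...) carrying a `name` attribute. The sketch below pulls one field out of every `<doc>` using only the JDK's DOM parser; the sample response is trimmed down, and its field names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SolrXmlResults {

    // Extracts the value of one named field from every <doc> in a Solr XML response.
    static List<String> fieldValues(String xml, String field) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> values = new ArrayList<>();
        NodeList docs = dom.getElementsByTagName("doc");
        for (int i = 0; i < docs.getLength(); i++) {
            NodeList children = docs.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Each stored field is an element such as <str name="...">.
                if (child instanceof Element && field.equals(((Element) child).getAttribute("name"))) {
                    values.add(child.getTextContent());
                }
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed sample in the shape of a Solr XML response (hypothetical fields).
        String xml = "<response><result name=\"response\" numFound=\"2\" start=\"0\">"
                + "<doc><str name=\"id\">1</str><str name=\"name\">Laptop bag</str></doc>"
                + "<doc><str name=\"id\">2</str><str name=\"name\">Laptop stand</str></doc>"
                + "</result></response>";
        System.out.println(fieldValues(xml, "name"));
    }
}
```

Multi-valued fields come back wrapped in an `<arr name="...">` element, for which `getTextContent()` would concatenate the values; a real client would handle that case separately.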

Juri
+8  A: 

I created this quick summary of Solr vs. Sphinx a few days ago, based mostly on StackOverflow answers: http://beerpla.net/2009/09/03/comparison-between-solr-and-sphinx-search-servers-solr-vs-sphinx-fight. It should answer at least some of your questions.

P.S. My vote is for Solr.

Artem Russakovskii
+1 for compiling the comparison.
Vikram
+1  A: 

Like dcruz says, Solr uses Lucene anyway, so it's not a valid comparison.

Lucene is a toolkit for building search apps, Solr is a search app built with Lucene.
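To illustrate what "a toolkit for building search apps" means at its core, here is a toy inverted index (term to set of document ids) with a crude tokenizer and an AND query. This is a conceptual sketch only, not Lucene's API: Lucene layers analyzers, relevance scoring, on-disk segments and much more on top of this basic idea, and Solr then wraps all of that in a server.

```java
import java.util.*;

public class ToyInvertedIndex {

    // term -> sorted set of ids of the documents containing that term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Very crude "analyzer": lowercase, then split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // AND query: ids of documents that contain every query term.
    SortedSet<Integer> searchAll(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex idx = new ToyInvertedIndex();
        idx.add(1, "Red leather laptop bag");
        idx.add(2, "Aluminium laptop stand");
        idx.add(3, "Red desk lamp");
        System.out.println(idx.searchAll("red laptop"));
    }
}
```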

IMO you'd be crazy not to use Solr, as it provides you with a lot of 'plumbing' that you'd have to write yourself otherwise -- like a configurable Data Import Handler to suck data out of your RDBMS or XML repositories.

Plus it gives you a web admin interface and other bells and whistles.
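For the Data Import Handler, imports are typically triggered by an HTTP request to the handler with a `command` parameter (such as `full-import`, `delta-import` or `status`). The sketch below assumes DIH is registered at `/solr/dataimport` on `localhost:8983`; the actual path is whatever your `solrconfig.xml` declares. The database queries themselves live in DIH's XML data config, which is not shown here. Later Solr releases deprecated DIH, so check which version you are targeting.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DataImportTrigger {

    // Assumed DIH location; the real path is whatever solrconfig.xml registers.
    static final String DIH_URL = "http://localhost:8983/solr/dataimport";

    // Builds a DIH command URI, e.g. command=full-import or command=delta-import.
    // clean=true clears the index before importing; commit=true commits when done.
    static URI commandUri(String command, boolean clean) {
        return URI.create(DIH_URL + "?command=" + command + "&clean=" + clean + "&commit=true");
    }

    // Sends the request and returns Solr's response body (needs a running Solr).
    static String send(URI uri) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Only prints the URI; call send(...) against a live server to start an import.
        System.out.println(commandUri("full-import", true));
    }
}
```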

Andrew Clegg
+2  A: 

I have to agree with Andrew Clegg. I think when a lot of Java developer types look at Lucene vs. Solr, Lucene looks friendlier because it's just a library (POJJ: Plain Old Java Jar!) like any other, and it looks straightforward to embed, compared with the complexity of standing Solr up as a separate process that you communicate with over HTTP.

However, I think that for almost all search use cases, Solr is the right approach. Most of the complexity in search lies not in the initial integration but in the fuzzy areas of tuning searches, scaling to meet demand, and maintaining your indexes, tasks that cross over from the developer-centric world into the systems world. And Solr handles all of those needs nicely.

Eric Pugh
Just ordered your book, Solr 1.4 Enterprise Search.
Vikram
Glad to hear it! Let me know how you like it!
Eric Pugh
@Eric, in the meantime, could you possibly point me to some kind of cheat sheet for configuring DataImportHandler to import data from a SQL Server database?
Vikram
@Vikram, I just saw your comment, did the book cover DIH the way you needed it to?
Eric Pugh
Unfortunately, I have not received the book yet. It should be arriving any day now...
Vikram
+1  A: 

Let me shift your focus a bit: are you prepared for changes in the architecture of your product? Both Lucene and Solr are implemented in Java, so you will end up running yet another web container to host it (and hence will lose platform purity, so to speak). While Lucene has been ported to .NET (the Lucene.NET project), as far as I know Solr has not.

If you happen to use SQL Server (which is likely, considering your platform), you might consider SQL Server Full-Text Search instead. It has almost the same features (though it is not as feature-rich as Lucene/Solr) and is usually much easier to incorporate into an existing application. Besides that, you benefit from simplified maintenance (it comes with your database) and from staying within a single platform.
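For comparison, a minimal sketch of the SQL Server Full-Text Search route from Java via JDBC: free user input is turned into a `CONTAINS` search condition that ANDs prefix terms (`"word*"`). The table `Products` and the columns `ProductId`, `Name` and `Description` are hypothetical, and `CONTAINS` requires a full-text index on the searched column. From ASP.NET the same SQL would go through ADO.NET instead.

```java
import java.util.ArrayList;
import java.util.List;

public class FtsCondition {

    // Hypothetical table and columns; Description must have a full-text index.
    static final String SQL =
        "SELECT ProductId, Name FROM Products WHERE CONTAINS(Description, ?)";

    // Turns free user input into a CONTAINS condition that ANDs prefix terms,
    // e.g. wireless mouse -> "wireless*" AND "mouse*".
    static String prefixAndCondition(String userInput) {
        List<String> parts = new ArrayList<>();
        for (String word : userInput.trim().split("\\s+")) {
            String cleaned = word.replace("\"", ""); // drop quotes so each term stays well-formed
            if (!cleaned.isEmpty()) parts.add("\"" + cleaned + "*\"");
        }
        return String.join(" AND ", parts);
    }

    public static void main(String[] args) {
        System.out.println(prefixAndCondition("wireless mouse"));
        // With JDBC: PreparedStatement ps = conn.prepareStatement(SQL);
        //            ps.setString(1, prefixAndCondition(input));
    }
}
```

Blank input yields an empty condition, so the caller should skip the query in that case rather than send it to SQL Server.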

AlexS
SQL Server FTS is *way* behind Lucene and Solr.
Mauricio Scheffer
I was not saying that it is on par. But using SQL Server FTS will let you deliver the solution faster and more easily, and you will stay within the boundaries of the platform. A while ago we faced the same choice: either stay with SQL Server FTS or start using Solr. We ended up with Solr, and that's why I can compare both the features and the effort required to get them into your app. But everyone makes their own decision anyway.
AlexS
@Alex, did you use DataImportHandler to configure importing data from SQL Server into Solr?
Vikram
@Alex, thanks for your advice. We have implemented SQL Server FTS for a quick turnaround, which gives us something better than plain SQL queries. However, we are also working on Solr in parallel as a long-term solution.
Vikram
A: 

One thing to consider is how difficult it will be to set up your application when you mix these two environments (Java/.NET). If you use the Lucene.NET libraries, you can limit the external dependencies you need to install, which streamlines deployment.

Another thing to consider is whether you need the extras that Solr offers. A(nother) web admin interface is probably great, but it extends your risk envelope: laying down Java and another service means more patch management. If you stick with .NET only, your patch strategy can be the standard Windows Update model.

Of course, rolling your own implementation using Lucene.NET will have development and maintenance costs of its own, but in my experience it has been straightforward and easy to work with.

Ira Miller