We need to integrate a search engine into our product catalog management software. The catalog is expected to hold more than 4-5 million records, with relational data spread over several tables. Our development platform is ASP.NET 3.5, and we have done some preliminary work on Lucene and found it to be good. However, we just came across Solr and are looking for practical tips comparing Lucene and Solr from the implementation, timeline, regular maintenance, performance, and features perspectives. Any guidance or pointers would be really helpful. Thanks.
Lucene:
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search.
Solr:
Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and ...
Essentially, Lucene is embedded in Solr. Lucene itself is purely a full-text search library, meant to be embedded into projects to give them full-text search capabilities. Solr has many more features and administration capabilities: it lets you search structured data without writing any custom code, load data from CSV files, parse user input tolerantly, do faceted searching, highlight matched text in results, and retrieve search results in a variety of formats (XML, JSON, ...). Check the Solr features page and see whether any feature is relevant to your project.
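To make the "variety of formats" and faceting points concrete: Solr is queried over plain HTTP, and things like the response format, faceting and highlighting are switched on with request parameters (q, rows, wt, facet, facet.field, hl). Below is a minimal sketch in Java using only the standard library. The base URL (a default local install on port 8983) and the field name category are assumptions for illustration; since you are on .NET, the same URL could just as well be built and fetched from any HTTP client.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class SolrQueryUrl {
    // Assumption: a default local Solr install; adjust host, port and core for your setup.
    static final String BASE = "http://localhost:8983/solr/select";

    /** Builds a select URL asking for faceted, highlighted results in the given format. */
    static String build(String userQuery, String facetField, String format) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", userQuery);             // free-text query typed by the user
        params.put("rows", "10");               // page size
        params.put("wt", format);               // response writer, e.g. "xml" or "json"
        params.put("facet", "true");            // enable faceting
        params.put("facet.field", facetField);  // count hits per value of this field
        params.put("hl", "true");               // request highlighted snippets

        StringBuilder url = new StringBuilder(BASE).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            first = false;
            url.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8"); // spaces become '+', '&' becomes %26
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Prints http://localhost:8983/solr/select?q=digital+camera&rows=10&wt=json&facet=true&facet.field=category&hl=true
        System.out.println(build("digital camera", "category", "json"));
    }
}
```

Switching wt between xml and json is all it takes to change the response format; nothing on the client side is tied to Java.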
We are in exactly the same situation as you. Unfortunately I was not directly involved in the evaluation process, but in the end we're going to use Solr, which runs on top of Lucene.
The main advantage is the variety of output formats that dcruz described: you can query Solr from your consuming application and get the search results back as XML, which can easily be parsed and displayed on a web page.
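Parsing that XML needs nothing Solr-specific. A minimal sketch using the JDK's built-in DOM parser: the sample response is a hand-written, hypothetical document in the shape of Solr's standard XML output (a result element carrying numFound and holding doc elements whose children have a name attribute), and the field names id and name are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class SolrXmlResults {
    // Hypothetical response, hand-written in the shape of Solr's XML output.
    static final String SAMPLE =
          "<response>"
        + "<lst name=\"responseHeader\"><int name=\"status\">0</int></lst>"
        + "<result name=\"response\" numFound=\"2\" start=\"0\">"
        + "<doc><str name=\"id\">P-100</str><str name=\"name\">Compact camera</str></doc>"
        + "<doc><str name=\"id\">P-200</str><str name=\"name\">Camera bag</str></doc>"
        + "</result>"
        + "</response>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not a parseable XML response", ex);
        }
    }

    /** Total hit count reported by Solr (may exceed the number of docs returned). */
    static int numFound(Document dom) {
        Element result = (Element) dom.getElementsByTagName("result").item(0);
        return Integer.parseInt(result.getAttribute("numFound"));
    }

    /**
     * Text of one named field for every returned doc, in result order.
     * Handles single-valued fields only; multi-valued fields arrive as <arr> elements.
     */
    static List<String> fieldValues(Document dom, String field) {
        List<String> values = new ArrayList<>();
        NodeList docs = dom.getElementsByTagName("doc");
        for (int i = 0; i < docs.getLength(); i++) {
            for (Node child = docs.item(i).getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child instanceof Element && field.equals(((Element) child).getAttribute("name"))) {
                    values.add(child.getTextContent());
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Document dom = parse(SAMPLE);
        System.out.println(numFound(dom) + " hits: " + fieldValues(dom, "name"));
    }
}
```

On the ASP.NET side the equivalent would be the framework's own XML APIs; the point is only that the response is ordinary XML.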
I created this quick summary of Solr vs. Sphinx a few days ago, based mostly on StackOverflow answers: http://beerpla.net/2009/09/03/comparison-between-solr-and-sphinx-search-servers-solr-vs-sphinx-fight. It should answer at least some of your questions.
P.S. My vote is for Solr.
Like dcruz says, Solr uses Lucene anyway, so it's not a valid comparison.
Lucene is a toolkit for building search apps, Solr is a search app built with Lucene.
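To illustrate what "toolkit" means here: at its core, a library like Lucene maintains an inverted index (a map from each term to the documents containing it), and your own code is responsible for feeding documents in and running queries. The toy Java sketch below shows only that core idea. It is not Lucene's API, and it skips everything Lucene actually adds on top (real analyzers, relevance scoring, on-disk segments, and so on).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: illustrates the idea only, not Lucene's API. */
class ToyInvertedIndex {
    // term -> sorted ids of the documents containing that term (the "postings")
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Crude stand-in for an analyzer: lower-case, split on non-word characters.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\W+");
    }

    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Boolean AND query: ids of documents containing every query term. */
    Set<Integer> search(String query) {
        Set<Integer> hits = null;
        for (String term : tokenize(query)) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Integer> ids = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (hits == null) {
                hits = new TreeSet<>(ids);
            } else {
                hits.retainAll(ids); // intersect postings lists
            }
        }
        return hits == null ? Collections.<Integer>emptySet() : hits;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Compact digital camera");
        index.add(2, "Camera bag, black");
        index.add(3, "Digital photo frame");
        System.out.println(index.search("digital camera")); // [1]
    }
}
```

With Lucene you get a production-grade version of this as a library and still write the loading, querying and serving code yourself; Solr is that surrounding application already written.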
IMO you'd be crazy not to use Solr, as it provides you with a lot of 'plumbing' that you'd have to write yourself otherwise -- like a configurable Data Import Handler to suck data out of your RDBMS or XML repositories.
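Because the catalog in the question is relational data spread over several tables, it helps to know that Solr indexes flat documents, so joins have to be resolved before indexing; the Data Import Handler does this declaratively with nested entities. Below is a hand-rolled Java sketch of the same flattening step, with hypothetical product, category and tag data (ProductRow and flatten are illustrative names, not part of Solr): a many-to-one join becomes a copied field and a one-to-many join becomes a multi-valued field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Flattens hypothetical relational rows into one flat document per product. */
class CatalogFlattener {
    /** Hypothetical row from a products table: id, name, foreign key to categories. */
    static final class ProductRow {
        final String id;
        final String name;
        final int categoryId;

        ProductRow(String id, String name, int categoryId) {
            this.id = id;
            this.name = name;
            this.categoryId = categoryId;
        }
    }

    static List<Map<String, Object>> flatten(List<ProductRow> products,
                                             Map<Integer, String> categoryNames,
                                             Map<String, List<String>> tagsByProductId) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (ProductRow p : products) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", p.id);
            doc.put("name", p.name);
            // many-to-one join (product -> category): copy the category name into the doc
            doc.put("category", categoryNames.get(p.categoryId));
            // one-to-many join (product -> tags): becomes a multi-valued field
            doc.put("tags", tagsByProductId.getOrDefault(p.id, Collections.<String>emptyList()));
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<ProductRow> products = Arrays.asList(
                new ProductRow("P-100", "Compact camera", 7),
                new ProductRow("P-200", "Camera bag", 8));
        Map<Integer, String> categories = new LinkedHashMap<>();
        categories.put(7, "Cameras");
        categories.put(8, "Accessories");
        Map<String, List<String>> tags = new LinkedHashMap<>();
        tags.put("P-100", Arrays.asList("digital", "pocket"));
        for (Map<String, Object> doc : flatten(products, categories, tags)) {
            System.out.println(doc);
        }
    }
}
```

With Solr, the Data Import Handler's configuration takes the place of code like this; with plain Lucene (or Lucene.NET) you would write and maintain it yourself.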
Plus it gives you a web admin interface and other bells and whistles.
I have to agree with Andrew Clegg. I think when a lot of Java developer types look at Lucene vs. Solr, Lucene looks friendlier because it's just a library (POJJ: Plain Old Java Jar!) like any other, and it looks straightforward to embed, compared with the complexity of standing Solr up as a separate process that communicates over HTTP.
However, I think that for almost all search use cases Solr is the right approach, because most of the complexity in search lies not in the initial integration but in the fuzzy areas of tuning searches, scaling to meet demand, and maintaining your indexes, which cross over from the developer-centric world into the systems world. Solr handles all of those needs nicely.
Let me shift your focus a bit: are you prepared for changes in the architecture of your product? Both Lucene and Solr are implemented in Java, so you will end up running yet another web container for hosting it (and hence lose platform purity, so to speak). While Lucene has been ported to .NET (the Lucene.NET project), Solr has not, as far as I know. If you happen to use SQL Server (which is likely, considering your platform), you might consider SQL Server Full-Text Search instead: it has many of the same features (it is not as feature-rich as Lucene/Solr, but still) and in most cases is much easier to incorporate into an existing application. Besides that, you benefit from simplified maintenance (it comes with your database) and from staying within a single platform.
One thing to consider is how difficult it will be to set up your application when you mix these two environments (Java/.NET). If you use the Lucene.NET libraries, you can limit the external dependencies you need to install, which streamlines deployment.
Another thing to consider is whether you need the extras that Solr offers. A(nother) web admin interface is probably great, but it extends your risk envelope. Laying down Java and another service means more patch management. If you stick with .NET only, your patch strategy can follow the standard Windows Update model.
Of course, rolling your own implementation using Lucene.NET will have development and maintenance costs of its own, but in my experience it has been straightforward and easy to work with.