There are several advantages to using Solr 1.4 (out-of-the-box faceted search, grouping, replication, HTTP administration vs. Luke, ...).

Even if I embed search functionality in my Java application, I could use SolrJ to avoid the HTTP trade-off when using Solr. Is SolrJ recommended at all?

So, when would you recommend using "pure" Lucene? Does it perform better or require less RAM? Is it easier to unit-test?

PS: I am aware of this question.

+2  A: 

If you want to completely embed your search functionality within your application and do not want to maintain a separate process like Solr, using Lucene is probably preferable. For example, a desktop application might need some search functionality (like the Eclipse IDE, which uses Lucene for searching its documentation). You probably don't want this kind of application to launch a heavy process like Solr.

Pascal Dimassimo
What do you mean by heavy? In terms of CPU/RAM, or in terms of maintenance?
Karussell
In terms of physical resources, yes. And there is the startup time of Solr that would probably be unacceptable in a desktop application.
Pascal Dimassimo
But I have never experimented with the EmbeddedSolrServer. It might be an interesting way to embed Solr.
Pascal Dimassimo
+2  A: 

If you have a web application, use Solr - I've tried integrating both, and Solr is easier. Otherwise, if you don't need Solr's features (the one that comes to mind as being most important is faceted search), then use Lucene.

James Kingsbery
Did you use the SolrJ or the HTTP approach? I tried to embed Lucene in a webapp and it was quite easy.
Karussell
I used SolrJ, so I didn't need to make HTTP requests from within the application. Honestly, I cannot remember what made it difficult, so maybe I was doing something dumb somewhere.
James Kingsbery
Thanks for the reply. What about unit testing: is it easy to set up a RAMDirectory, like I can do with Lucene?
Karussell
I haven't tried it, but apparently it is possible: http://search.lucidimagination.com/search/out?u=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FSOLR-465
James Kingsbery
+2  A: 

Here is one situation where I have to use Lucene.

Given a set of documents, find out the most common terms in them.

Here, I need to access term vectors of each document (using low-level APIs of TermVectorMapper). With Lucene it's quite easy.
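
As a rough sketch of the aggregation step only: once each document's term frequencies have been read, finding the most common terms is a matter of summing the counts. The code below uses plain Java maps instead of Lucene classes; the class name CommonTerms and the method topTerms are hypothetical, and the assumption is that each map (term to frequency) was filled from one document's term vector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene. Each input map stands in for
// one document's term vector (term -> frequency within that document).
final class CommonTerms {

    /** Sums per-document term frequencies and returns the n most frequent terms. */
    static List<String> topTerms(List<Map<String, Integer>> docTermFreqs, int n) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> doc : docTermFreqs) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<String> terms = new ArrayList<>(totals.keySet());
        // Highest total first; ties are broken alphabetically so the order is deterministic.
        terms.sort((a, b) -> {
            int byCount = Integer.compare(totals.get(b), totals.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }
}
```

For large indexes you would probably keep only a bounded heap of candidates rather than sorting every term, but the counting logic stays the same.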

Another use case is very specialized ordering of search results. For example, I want a search for an author name (who has written multiple books) to return one book from each store in the first 10 results. In this case, I run a search per book store and, to build the final results, pick one result from each store. Here you are essentially doing multiple searches to generate the final results. Having access to the low-level APIs of Lucene definitely helps.

One more reason to go for Lucene was to get new goodies ASAP. This is no longer true, as both projects have been merged and there will be synchronized releases.
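
The merge step described above might look roughly like this. It is a sketch under assumptions: the per-store result lists are already ranked (for instance, one search per store), results are represented as plain strings, and the names RoundRobinMerge and interleave are made up for illustration. Any additional ordering criteria are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene or Solr.
final class RoundRobinMerge {

    /**
     * Interleaves ranked per-store hit lists, taking one hit per store per
     * round, until 'limit' hits are collected or every list is exhausted.
     * Stores are visited in the map's iteration order.
     */
    static List<String> interleave(Map<String, List<String>> hitsByStore, int limit) {
        List<Iterator<String>> cursors = new ArrayList<>();
        for (List<String> hits : hitsByStore.values()) {
            cursors.add(hits.iterator());
        }
        List<String> merged = new ArrayList<>();
        boolean progressed = true;
        while (merged.size() < limit && progressed) {
            progressed = false;
            for (Iterator<String> cursor : cursors) {
                if (merged.size() >= limit) {
                    break;
                }
                if (cursor.hasNext()) {
                    merged.add(cursor.next());
                    progressed = true;
                }
            }
        }
        return merged;
    }
}
```

Passing a LinkedHashMap keeps the store order predictable; with limit set to 10, the first page contains at least one book from every store that has a hit, as long as there are no more than 10 stores.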

Shashikant Kore
Regarding the TermVectorMapper: do you know if it is possible with Solr? Regarding the search-order example: couldn't this be done with Solr's grouping feature? http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/
Karussell
TVMapper is core to Lucene. Why go through an extra layer when you can read directly from the source? And I'm not exactly looking for grouping. I want all results from each of the book stores, but I want the order to be a close approximation of round-robin with some additional criteria.
Shashikant Kore
+4  A: 

Use Lucene when:

  • You want more control
  • You cannot depend on a web server
  • You need term vectors, TermDocs, etc.
  • You want to extend it easily with your own Analyzer
Use Solr when:

  • You want to index and search documents easily by writing little code
  • You want a standalone application that takes care of most of the housekeeping, such as optimizing the index and warming up the reader
  • You need to scale out to multiple nodes
  • You need faceting

If you are developing your client in Java and want to use Solr, then I would advise using SolrJ, as it is easy and you don't need to care about the HTTP details. I use Solr via SolrJ in my project www.findbestopensource.com

solidstone
Thanks, this helped. I already saw this on the mailing list, but haven't found time to answer yet.
Karussell
+1  A: 

I'm surprised nobody mentioned NRT - Near Real Time search, available with Lucene, but not with Solr (yet).

Otis Gospodnetic
Really? Here is the link: http://wiki.apache.org/lucene-java/NearRealtimeSearch ... I thought it was available for Solr too.
Karussell
@Karussell: see https://issues.apache.org/jira/browse/SOLR-1606
Mauricio Scheffer
Thanks Mauricio!
Karussell