views: 521 · answers: 2

hey guys

OK, I'm totally new to Solr and Lucene, but I've got Solr running out of the box under Tomcat 6.x and have just gone over some of the basic wiki entries.

I have a few questions, and require some suggestions too.

  1. Solr can index data in files (XML, CSV) and it can also index DBs. Can you also just point it at a URI/domain and have it index a website the way Google would?

  2. If I have a website with "Pages" data ("Page Name", "Page Content", etc.) and "Products" data ("Product Name", "SKU", etc.), do I need two different schema.xml files? And if so, does that mean two different instances of Solr?

Finally, if you have a project with a large relational, normalized database, what would you say is the best approach from the three options below?

  1. Have a middleware service running in the background, which mines the DB and manually creates the relevant XML files to send to Solr

  2. Have Solr index the DB directly. In this case, would it be best to just point Solr at views, which would abstract away all the table relationships?

  3. Any other options I'm unaware of?

Context: we're running in a Windows 2003 environment with .NET 3.5 and SQL Server 2005/2008.

cheers!

+4  A: 
  1. No, you need a crawler for that, e.g. Nutch.
  2. Yes, you want two separate indexes (= two schema.xml files) since the datasets don't seem to be related. This doesn't mean two instances of Solr: you can manage the two indexes with cores.
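A sketch of what that multi-core setup could look like in solr.xml (the core names "pages" and "products" are just assumptions for this example; each instanceDir gets its own conf/schema.xml and solrconfig.xml):

```xml
<!-- solr.xml: one Solr instance hosting two independent indexes.
     Core names and instanceDirs here are illustrative. -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="pages" instanceDir="pages" />
    <core name="products" instanceDir="products" />
  </cores>
</solr>
```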

As for populating the Solr index, it depends on your particular project; for example, can it tolerate stale data or does it have to be absolutely fresh?

Other options to index data include:

  • Database triggers
  • If you're using some sort of ORM, use its interception capabilities. For example, you can use NHibernate events to update the index on update, insert, or delete. If you use NHibernate and SolrNet, this is taken care of automatically
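For the "middleware builds XML" route mentioned in the question, the update messages you'd POST to Solr's /update handler have this shape (the core URL and field names are hypothetical and must match your schema.xml):

```xml
<!-- POST to e.g. http://localhost:8983/solr/products/update
     Field names are placeholders for this sketch. -->
<add>
  <doc>
    <field name="sku">SKU-1234</field>
    <field name="name">Example product</field>
  </doc>
</add>
```

A separate `<commit/>` message (or a delete-by-id/delete-by-query message) is then posted to make the changes visible to searchers.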
Mauricio Scheffer
+1 Thanks Mauricio, this is really useful. I wonder if you could expand a little on one point, possibly two. In terms of stale vs. fresh data, it doesn't matter which data source I use, does it? Only how often I commit changes... and all commits (adds/updates/deletes) have to be done manually, right? As for SolrNet, does that mean I don't need to worry about communicating manually with Solr at all? Thanks again
andy
About data freshness: it depends on the *user* (consumer) of the data. If the consumer needs to *always* see up-to-date data, that rules out offline/background indexing methods and you'd have to go with something more reactive, like triggers or ORM interception. Of course, when indexing webpages you don't get any "triggers"; your only option is a crawler. Yes, SolrNet handles .NET <-> Solr communication.
Mauricio Scheffer
@mauricio: thanks man. We use a custom CMS to build our site. So, would it be an intelligent decision, do you think, to just commit updates/deletes to Solr via XML whenever Pages/Products are edited in the CMS? Also, we don't use NHibernate, so I guess there are no benefits to SolrNet. Thanks again, this is really helpful
andy
NHibernate integration is only one of the features of SolrNet. Its main purpose is handling all the Solr XML/HTTP communication and providing a .NET interface for all Solr operations.
Mauricio Scheffer
Thanks Mauricio, I think I will use SolrNet; thanks for making it open source. Does SolrNet take care of writing the schema for Solr? If so, how? If not, then I have to write the schema myself? Cheers!
andy
Nope, you still have to write the schema and configuration yourself. There are lots of server-only settings and tweaks; use the Solr wiki (http://wiki.apache.org/solr/) or Eric's book (http://www.packtpub.com/solr-1-4-enterprise-search-server/book) as a reference.
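To give a feel for what writing the schema involves, a minimal schema.xml for a hypothetical "products" core might start out like this (field names and analysis chain are only an illustration; the example schema that ships with Solr is the usual starting point):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal illustrative schema for a hypothetical "products" core -->
<schema name="products" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" />
    <fieldType name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="sku"  type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text"   indexed="true" stored="true" />
  </fields>
  <uniqueKey>sku</uniqueKey>
  <defaultSearchField>name</defaultSearchField>
</schema>
```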
Mauricio Scheffer
+1  A: 

I think Mauricio is dead on with his advice. The only point I would add concerns deciding between a "middleware" indexer and using the database directly. If your database (or the views?) maps very closely to what a good Solr schema wants, then DIH is great. But if you are indexing from multiple sources of data, or if you have to munge the data in your database to meet what Solr would like, then having a dedicated middleware indexer is better.
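For the DIH route, the database-to-schema mapping lives in a data-config.xml along these lines (the JDBC URL, credentials, view name, and columns are all placeholders for this sketch):

```xml
<!-- data-config.xml: DataImportHandler pulling from a SQL Server view.
     Connection details and column names are placeholders. -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=Shop"
              user="solr" password="secret" />
  <document>
    <entity name="product" query="SELECT SKU, ProductName FROM vw_Products">
      <field column="SKU"         name="sku" />
      <field column="ProductName" name="name" />
    </entity>
  </document>
</dataConfig>
```

Pointing DIH at a view like this is one way to get the "abstract away the table relationships" effect asked about in the question.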

Eric Pugh
And by "dead on", I mean very accurate! Just in case anyone was confused!
Eric Pugh
Cool, thanks for the extra advice Eric. I was just wondering if having the middleware was totally stupid, but I think it makes sense in an environment where, as you say, the data sources are varied. Cheers! +1
andy