I need to build my search indexes on an Azure/Lucene.NET implementation. That said, I don't know much about Solr and Hadoop, or what they offer the Linux crowd.

Since I don't know the learning curve ahead of me, I'll tell you what I'm looking for and perhaps you can tell me how I should spend my time.

I'm interested in indexing an ever-growing batch of emails from our system. Messages need to be searchable as soon as they are sent or received. That means the indexes could become huge, which is why we are looking at cloud storage. Since I'm familiar with Azure, management is suggesting that we use Lucene.NET.

What do you think is the best way for me to spend my time: studying how to make Lucene.NET index my documents, or looking at the Solr/Hadoop approach to the same problem?

A: 

Without knowing the scale of your source corpus, I can only share some of our experience (we operate on several TB in a near-real-time application). We are primarily a .NET shop, and we found Solr quite easy to use with tools such as SolrNet; the learning curve for our developers was very gentle.

Solr has plenty of advantages, from the obvious ones, such as faceting and a simple, flexible API if you need one, to the fact that it has a far more active community and gets the latest features and fixes sooner (cf. Lucene.NET). Importantly, we could easily scale linearly with Solr on commodity machines. Sorry, I cannot make a dollar comparison with the cloud, but given the almost-zero cost of the machines we use for our shards, I cannot imagine that Azure or AWS would be cheaper.
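
To make the faceting and sharding points concrete, here is a minimal sketch (in Python, just to keep it short) of how a faceted, distributed query URL for Solr's standard `select` handler might be assembled. The host names, the `emails` core, and the `sender` field are hypothetical; `facet`, `facet.field`, and `shards` are standard Solr query parameters, but check them against the Solr version you deploy.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, core, text, facet_field, shards=None):
    """Build a Solr select URL that asks for facet counts on one field.

    `shards` is an optional list of "host:port/solr/core" strings; when
    given, Solr fans the query out to those shards and merges the results.
    """
    params = [
        ("q", text),                  # the user's search expression
        ("wt", "json"),               # ask for a JSON response
        ("facet", "true"),            # turn faceting on
        ("facet.field", facet_field)  # field to count values for
    ]
    if shards:
        params.append(("shards", ",".join(shards)))
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical hosts and schema, for illustration only.
url = build_facet_query(
    "http://localhost:8983",
    "emails",
    "subject:invoice",
    "sender",
    shards=["s1:8983/solr/emails", "s2:8983/solr/emails"],
)
print(url)
```

From .NET, SolrNet wraps the same parameters, so the sketch only shows what goes over the wire.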

Hope that helps.

Mikos
In case someone needs to know what "cf" means (I just looked it up): http://en.wikipedia.org/wiki/Cf.
MakerOfThings7
A: 

If you can communicate with your index machines over HTTP, I would suggest that you use Solr. You can quite easily set up a Solr server without any programming, just by changing configuration files. It scales nicely; see: Scaling Lucene and Solr. Solr Cloud, currently in development, will make scaling Solr easier and will support some Hadoop-like features.
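
As a rough illustration of what "over HTTP" means for the email case, the sketch below builds (but does not send) a POST request that would add one message to a Solr core through its JSON update handler. The core name `emails` and the field names are assumptions, not a real schema, and the exact update URL can differ between Solr versions, so treat it as a starting point.

```python
import json
from urllib.request import Request


def build_index_request(base_url, core, email):
    """Build a POST request that adds one email document to a Solr core.

    The body is a JSON array of documents. `commit=true` makes the
    document searchable right away; it is used here only for simplicity
    and is expensive if issued for every single message.
    """
    body = json.dumps([email]).encode("utf-8")
    return Request(
        f"{base_url}/solr/{core}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document; field names must match your own Solr schema.
doc = {"id": "msg-1", "sender": "alice@example.com", "subject": "Invoice"}
req = build_index_request("http://localhost:8983", "emails", doc)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it, assuming a Solr server is listening at that address.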

Yuval F