Distributed Lucene.NET

Hi,

I have a Terabyte of data, maybe more, which I'd like to index and search with Lucene. I'd like to be able to split the index out to different machines, similar to what Solr does (if I understand Solr correctly).

Are there any existing tools to do this on the Windows platform?

Thanks!

Edit: I'm not very keen on running Java Lucene. I will most likely be making my own tweaks to Lucene, so I have to stick to Lucene.Net, since I don't know much about Java.

As far as I know, there is no port of the MultiPassIndexSplitter class (http://lucene.apache.org/java/3_0_0/api/contrib-misc/org/apache/lucene/index/MultiPassIndexSplitter.html) to Lucene.Net, so this feature is probably not available there yet.

mamoo 2010-04-16 08:26:23

What you're looking for is Katta.

But since you already know Solr, why not just use its sharding capabilities directly?

Mauricio Scheffer 2010-04-16 12:32:48
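
For readers unfamiliar with the sharding mentioned in the answer above: from the client side, a distributed Solr query is an ordinary select request that carries a `shards` parameter listing the shard addresses. The sketch below only builds such a URL. The host names, port, and path are hypothetical placeholders, and the helper name `build_sharded_query_url` is made up for this illustration.

```python
from urllib.parse import urlencode


def build_sharded_query_url(coordinator, shards, query, rows=10):
    """Build a Solr select URL that fans the query out to several shards.

    coordinator: base URL of the Solr instance that receives the request
                 (hypothetical, e.g. "http://search1:8983/solr").
    shards:      list of "host:port/path" strings, one per shard (hypothetical).
    """
    params = {
        "q": query,
        "rows": rows,
        # Solr expects the shard list as a single comma-separated value.
        "shards": ",".join(shards),
    }
    return coordinator.rstrip("/") + "/select?" + urlencode(params)


url = build_sharded_query_url(
    "http://search1:8983/solr",
    ["search1:8983/solr", "search2:8983/solr"],
    "title:lucene",
)
print(url)
```

The instance that receives the request forwards the query to each listed shard and merges the results, so the client does not need to know how the index was split beyond the shard list itself.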

I will most likely be making my own tweaks to Lucene so I have to stick to Lucene.Net since I don't know much about Java.

2010-04-16 13:31:10

@user72185 ok, then why not just use Solr?

Mauricio Scheffer 2010-04-16 14:13:40

I haven't actually tried Solr, but wouldn't that mean I would have to change Java code if I wanted to make changes to the underlying Lucene?

2010-04-16 14:27:27

What kind of things do you intend to change in Lucene?

Mauricio Scheffer 2010-04-16 16:56:10

1) Search without scoring 2) Faster fuzzy search 3) Adding some parallelism with the Task Parallel Library 4) Custom analyzer. I'm sure more will come up.

2010-04-16 18:39:04

1) You can use boosting, sorting, or function queries, among others, to customize the order of your search results. 2) Solr is quite fast as it is; it powers some of the biggest sites on the net. 3) No need to do that on Solr, but you can do that client-side if you want/need. 4) Solr has pluggable analyzers (written in Java, though).

Mauricio Scheffer 2010-04-16 19:03:49
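
One way to read the client-side option in point 3 above is a fan-out: query every shard in parallel, then merge the per-shard hit lists into one ranked list. The sketch below is only an illustration of that pattern, not Solr's or Lucene's own mechanism. The in-memory dictionaries stand in for remote shards, and `search_shard` and `search_all` are hypothetical names; a real client would make a network call per shard.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards": each maps a document id to a relevance score.
SHARDS = [
    {"doc1": 0.9, "doc2": 0.4},
    {"doc3": 0.7, "doc4": 0.2},
    {"doc5": 0.8},
]


def search_shard(shard, top_k):
    """Stand-in for a remote call: return this shard's best hits, best first."""
    hits = sorted(shard.items(), key=lambda hit: hit[1], reverse=True)
    return hits[:top_k]


def search_all(shards, top_k):
    """Query every shard in parallel, then merge the per-shard results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = list(pool.map(lambda s: search_shard(s, top_k), shards))
    # Each list is already sorted by descending score, so a k-way merge works.
    merged = heapq.merge(*per_shard, key=lambda hit: hit[1], reverse=True)
    return list(merged)[:top_k]


results = search_all(SHARDS, top_k=3)
print(results)
```

Each shard only has to return its own top `top_k` hits, because no document outside a shard's local top `top_k` can appear in the global top `top_k`.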

1) Completely disabling ordering seems to be an order of magnitude faster than the default when the result set is large (e.g. > 1000000 hits). Unfortunately, it's not enough to change the collector (I tried making a NullCollector); it is the Scorer that spends a lot of time traversing every hit.

2010-04-16 20:24:50

2) I don't think they have fuzzy search, at least not using the default implementation.

2010-04-16 20:35:27

2) Solr supports the standard ~ Lucene operator for fuzzy searches

Mauricio Scheffer 2010-04-16 21:15:01

2) What I meant is that if these large sites have huge amounts of data, they probably changed the fuzzy implementation. See for instance here: http://www.nearinfinity.com/blogs/aaron_mccurry/what_happens_to_lucene_when.html (strangely, the link is down right now, but it usually works).

2010-04-17 06:49:23

Solr is a Java app so it runs on Windows. You can find details on how to configure it as a Windows service here: http://blog.ianbattersby.com/archive/2010/02/09/apache-solr-as-a-windows-service

Pascal Dimassimo 2010-04-16 12:35:28
