views: 1399
answers: 2
Currently I do it like this:

IndexSearcher searcher = new IndexSearcher(lucenePath);
Hits hits = searcher.Search(query);
Document doc;
List<string> companyNames = new List<string>();

for (int i = 0; i < hits.Length(); i++)
{
    doc = hits.Doc(i);
    companyNames.Add(doc.Get("companyName"));
}
searcher.Close();

companyNames = companyNames.Distinct().Skip(offSet ?? 0).ToList();
return companyNames.Take(count ?? companyNames.Count).ToList();

As you can see, I first collect ALL the field values (several thousand), then remove duplicates, and only then skip and take the slice I actually need.

I feel like there should be a better way to do this.

A: 

I'm not sure there is, honestly, as Lucene doesn't provide 'distinct' functionality out of the box. I believe Solr can achieve this with a facet search, but in plain Lucene you'd have to write that facet functionality yourself. So as long as you don't run into performance issues, you should be OK this way.

Razzie
Ok, thanks for letting me know.
borisCallens
+2  A: 

Tying this question to an earlier question of yours (re: "Too many clauses"), I think you should definitely be looking at term enumeration from the index reader. Cache the results (I used a sorted dictionary keyed on the field name, with a list of terms as the data, to a max of 100 terms per field) until the index reader becomes invalid and away you go.

Or perhaps I should say: when faced with a similar problem to yours, that's what I did.
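To make that concrete, here is a minimal sketch of term enumeration against the Lucene.NET 2.x API, reusing the `lucenePath`, `offSet`, and `count` variables from your question and assuming `companyName` is an untokenized field (if it's analyzed, you'll get individual tokens rather than whole names):

    // Enumerate the distinct terms of the "companyName" field directly
    // from the index, instead of loading every document.
    IndexReader reader = IndexReader.Open(lucenePath);
    List<string> companyNames = new List<string>();

    // Position the enumerator at the first term of the target field.
    TermEnum terms = reader.Terms(new Term("companyName", ""));
    try
    {
        do
        {
            Term t = terms.Term();
            if (t == null || t.Field() != "companyName")
                break; // we've moved past this field's terms
            companyNames.Add(t.Text()); // terms arrive distinct and sorted
        } while (terms.Next());
    }
    finally
    {
        terms.Close();
        reader.Close();
    }

    return companyNames.Skip(offSet ?? 0).Take(count ?? companyNames.Count).ToList();

Because the index stores each term once, in sorted order, the `Distinct()` pass disappears entirely, and you never touch the documents themselves.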

Hope this helps,

Moleski
Could you elaborate on what you mean with "Term Enumeration"? Do you mean enumerating all my documents and getting those fields so I can use C#'s StartsWith()?
borisCallens
+1 for seeing the question behind the question
borisCallens
Have a look at the Terms member function of the IndexReader class. BTW, I found out a good deal about this kind of thing by having a look at the Luke source code. Very interesting!
Moleski
I'm not a big fan of Luke, actually. I don't know why, but it takes ages to parse each query. Way slower than my own queries.
borisCallens