Hello, I am indexing about 100M documents, each consisting of a few string identifiers and a hundred or so numeric terms. I won't be doing range queries, so I haven't dug too deep into NumericField, but I don't think it's the right choice here.

My problem is that query performance degrades quickly when I start adding OR criteria to my query. All my queries are on specific numeric terms. So a document looks like StringField:[someString] plus N DataField:[someNumber] terms. I then query it with something like DataField:((+1 +(2 3)) (+75 +(3 5 52)) (+99 +88 +(102 155 199))).
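To make the shape of that query concrete, here is a rough model of how a boolean query over exact numeric terms gets evaluated against an inverted index: each term maps to a sorted list of doc IDs, required (+) clauses intersect those lists, and optional clauses union them. This is a plain-Java toy, not Lucene's actual implementation (Lucene walks posting lists with skipping scorers instead of materializing them); the class name, the in-memory POSTINGS map, and the toy documents are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PostingListSketch {
    // Hypothetical in-memory inverted index: numeric term -> sorted doc ids.
    static final Map<Integer, int[]> POSTINGS = new HashMap<>();

    static int[] postings(int term) {
        return POSTINGS.getOrDefault(term, new int[0]);
    }

    // Intersection of two sorted doc-id lists (both clauses required).
    static int[] and(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { out[n++] = a[i]; i++; j++; }
        }
        return Arrays.copyOf(out, n);
    }

    // Union of two sorted doc-id lists (optional clauses, no required ones).
    static int[] or(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length || j < b.length) {
            if (j == b.length || (i < a.length && a[i] < b[j])) out[n++] = a[i++];
            else if (i == a.length || b[j] < a[i]) out[n++] = b[j++];
            else { out[n++] = a[i]; i++; j++; } // same doc in both lists: emit once
        }
        return Arrays.copyOf(out, n);
    }

    // Docs containing at least one of the given terms, e.g. (2 3).
    static int[] anyOf(int... terms) {
        int[] acc = new int[0];
        for (int t : terms) acc = or(acc, postings(t));
        return acc;
    }

    // Docs present in every given list, e.g. +1 +(2 3).
    static int[] allOf(int[]... lists) {
        int[] acc = lists[0];
        for (int k = 1; k < lists.length; k++) acc = and(acc, lists[k]);
        return acc;
    }

    // DataField:((+1 +(2 3)) (+75 +(3 5 52)) (+99 +88 +(102 155 199)))
    static int[] exampleQuery() {
        int[] c1 = allOf(postings(1), anyOf(2, 3));
        int[] c2 = allOf(postings(75), anyOf(3, 5, 52));
        int[] c3 = allOf(postings(99), postings(88), anyOf(102, 155, 199));
        return or(or(c1, c2), c3);
    }

    public static void main(String[] args) {
        // Toy data: doc 0 = {1, 2}, doc 1 = {75, 5}, doc 2 = {99, 88, 155}, doc 3 = {1, 75}
        POSTINGS.put(1, new int[] {0, 3});
        POSTINGS.put(2, new int[] {0});
        POSTINGS.put(5, new int[] {1});
        POSTINGS.put(75, new int[] {1, 3});
        POSTINGS.put(88, new int[] {2});
        POSTINGS.put(99, new int[] {2});
        POSTINGS.put(155, new int[] {2});
        System.out.println(Arrays.toString(exampleQuery())); // prints [0, 1, 2]
    }
}
```

Doc 3 contains both 1 and 75 but none of the inner OR terms, so it drops out. In this model every extra OR branch means walking and merging more posting lists, which is one plausible reason the cost grows as OR criteria are added.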

Currently these queries take about 7 to 16 seconds to run on my laptop. I would like to make sure that's really the best they can do. I am open to suggestions on both field structure and query structure :-).

Thanks

Josh

PS: I have already read all the other Lucene performance discussions on here, on the Lucene wiki, and at Lucid Imagination... I'm a bit further down the rabbit hole than that...

A: 

Since you mentioned that you are querying for specific numbers rather than ranges, I won't point you to the really fast numeric range queries in Lucene 3.0.

Going by your description, I suspect scoring is causing the problem. With that many nested boolean queries, scoring keeps getting more complex, and since scores are floating-point numbers, the arithmetic is slower. If you don't care about scores, writing a custom Collector is a good idea; the javadoc I have linked has an example you can adapt to write your own.
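As a sketch of the "skip scoring entirely" idea taken one step further: if only the match count matters, the same boolean structure can be evaluated as bit sets (one bit per document), so matching becomes word-wide AND/OR operations and the count is just the cardinality. This is a swapped-in technique, not the Collector approach from the javadoc; it uses java.util.BitSet on made-up data as a stand-in for Lucene's own filter/bit-set machinery, and the class, map, and helper names are hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class BitSetCountSketch {
    // Hypothetical per-term bit sets: bit d is set if document d contains the term.
    static final Map<Integer, BitSet> TERM_BITS = new HashMap<>();

    static BitSet bits(int term) {
        BitSet b = TERM_BITS.get(term);
        // Return a copy so callers can AND/OR into it without corrupting the index.
        return b == null ? new BitSet() : (BitSet) b.clone();
    }

    // OR together the bit sets of several terms, e.g. (3 5 52).
    static BitSet anyOf(int... terms) {
        BitSet acc = new BitSet();
        for (int t : terms) acc.or(bits(t));
        return acc;
    }

    // AND several bit sets together; mutates and returns the first one.
    static BitSet allOf(BitSet first, BitSet... rest) {
        for (BitSet b : rest) first.and(b);
        return first;
    }

    // Count matches for the example query; no scores are computed anywhere.
    static int countExample() {
        BitSet hits = allOf(bits(1), anyOf(2, 3));
        hits.or(allOf(bits(75), anyOf(3, 5, 52)));
        hits.or(allOf(bits(99), bits(88), anyOf(102, 155, 199)));
        return hits.cardinality();
    }

    // Record that document `doc` contains each of the given terms.
    static void index(int doc, int... terms) {
        for (int t : terms) TERM_BITS.computeIfAbsent(t, k -> new BitSet()).set(doc);
    }

    public static void main(String[] args) {
        index(0, 1, 2);
        index(1, 75, 5);
        index(2, 99, 88, 155);
        index(3, 1, 75);
        System.out.println(countExample()); // prints 3
    }
}
```

The trade-off is memory: at 100M documents a full bit set is about 12.5 MB (10^8 bits / 8), so this would only make sense if the sets for frequently queried terms could be cached, which is an assumption worth measuring rather than a given.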

Shashikant Kore
I am using Lucene.Net 2.9.2, and the numeric query isn't in there yet (if it were, I bet this WOULD be much faster :-) ). Also, scoring isn't involved here; I am already using my own collectors. The numbers above are based on my "GetCount" collector, and all it does is increment an int when Collect(int doc) is called. Can't get any faster on the collector side than that :-P.
Josh Handel
So, a correction (after re-reading your comment): numeric range queries are in 2.9.2 (that I knew). But after looking at 3.0 for numeric queries (I was considering porting them myself), I realized what you meant about numeric range queries. So yeah, I wish numeric queries existed, but really they are just term queries anyway, which is what my current queries are getting turned into.
Josh Handel