Lucene.Net memory consumption and slow search when too many clauses used

I have a database that stores text-file attributes and primary-key IDs, and I have indexed around 1 million text files in Lucene along with their IDs (the primary keys in the DB).

I am searching at two levels. The first is a straightforward DB search, which gives me primary keys as the result (roughly 2 or 3 million IDs).

Then I build a Boolean query, for instance:

+Text:"test*" +(pkID:1 pkID:4 pkID:100 pkID:115 pkID:1041 .... )

and run it against my index.

The problem is that such a query (with 2 million clauses) takes far too long to return results and consumes far too much memory.

Is there any way to optimize this?

+1 A:

Assuming you can reuse the pkID part of your queries across searches:

1. Split the query into two parts: the text query, which stays as the query, and the pkID query, which becomes the filter.
2. Build both parts as queries.
3. Convert the pkID query to a filter (using QueryWrapperFilter).
4. Wrap that filter in a cached filter (using CachingWrapperFilter).
5. Hang onto the cached filter, perhaps in some kind of dictionary.
6. On the next search, use the overload that takes both a query and a filter.

As long as the pkID search can be reused, you should see quite a large improvement. As long as you don't optimise your index, the cached filter should even keep working across commit points (as I understand it, the bit sets are calculated on a per-segment basis).
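
To make the caching idea concrete, here is a toy Python model of it. This is not Lucene.Net code: it assumes documents are simply numbered from 0 and uses a plain integer bitmask in place of the bit set a cached filter would hold. The names IdFilterCache, search_with_filter, and the cache key are hypothetical and exist only for this illustration.

```python
# Toy model of a cached ID filter. NOT Lucene.Net code; every name here
# is hypothetical and chosen only for illustration.

class IdFilterCache:
    """Builds one bitmask per cache key and reuses it on later searches."""

    def __init__(self, pk_to_doc):
        self._pk_to_doc = pk_to_doc  # primary key -> internal document number
        self._cache = {}             # cache key -> bitmask of allowed documents
        self.builds = 0              # how many times a bitmask was actually built

    def get(self, key, load_pk_ids):
        """Return the bitmask for `key`, calling `load_pk_ids` only on a miss."""
        mask = self._cache.get(key)
        if mask is None:
            mask = 0
            for pk in load_pk_ids():
                doc = self._pk_to_doc.get(pk)
                if doc is not None:      # skip IDs that are not in the index
                    mask |= 1 << doc     # mark this document as allowed
            self._cache[key] = mask
            self.builds += 1
        return mask


def search_with_filter(text_hits, mask):
    """Keep only the text-query hits whose bit is set in the filter bitmask."""
    return [doc for doc in text_hits if (mask >> doc) & 1]
```

In this model the expensive step (turning millions of IDs into a bit set) runs once per distinct key, and every later search pays only for the text query plus one bit test per hit. A real filter is applied inside the search rather than afterwards, so treat this as a picture of the cost trade-off, not of the actual mechanics.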

HTH

p.s.

I think it would be remiss of me not to note that I think you're putting your index through all sorts of abuse by using it like this!

Moleski 2010-06-18 14:55:32

Sorry for the late reply, but you are quite right. I have now moved all the DB records into my Lucene index (as one big flat table, just like the DB), so I no longer have to use millions of IDs as input.

Umer 2010-10-26 10:23:02

+1 A:

The best optimization is NOT to use the query with 2 million clauses. Any Lucene query with 2 million clauses will run slowly no matter how you optimize it.

In your particular case, I think it will be much more practical to search your index first with the +Text:"test*" query and then limit the results by running a DB query on the Lucene hits.
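
A rough sketch of that two-step flow, in Python with SQLite standing in for the real database. The table name files, the column pk_id, and the helper filter_hits_in_db are assumptions made for this example; the IDs passed in are assumed to be the primary keys stored with each Lucene hit. The hits are checked in batches so that no single SQL statement carries an unbounded IN list.

```python
import sqlite3

def filter_hits_in_db(conn, hit_ids, where_sql, params=(), batch_size=500):
    """Return the hit IDs that also satisfy the DB-side condition, in hit order.

    `where_sql` must be trusted, developer-written SQL (it is interpolated
    into the statement); its values are passed separately via `params`.
    """
    kept = []
    for start in range(0, len(hit_ids), batch_size):
        batch = hit_ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        sql = (
            f"SELECT pk_id FROM files "
            f"WHERE pk_id IN ({placeholders}) AND ({where_sql})"
        )
        # Placeholders for the IN list come first, then those of where_sql.
        matching = {row[0] for row in conn.execute(sql, (*batch, *params))}
        # Preserve the ranking order that the index search produced.
        kept.extend(pk for pk in batch if pk in matching)
    return kept
```

Because a text query usually returns far fewer hits than the 2 or 3 million IDs coming out of the DB, checking the hits against the DB keeps both sides small.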

buru 2010-07-09 11:12:36

Thanks, that was quite a definite answer, but unfortunately I can't go to the DB after getting the results from Lucene.

Umer 2010-10-26 10:21:02
