ansaurus

Question

Answer 1

A:

DuplicateFilter independently constructs a filter that chooses either the first or the last occurrence among all documents sharing each key. This filter can be cached with minimal memory overhead.

Your second filter independently selects some other set of documents, and the two selections may not coincide: the document DuplicateFilter kept for a key may be one that your second filter rejects. Filtering duplicates within an arbitrary subset of the documents would probably need a field cache to perform well, and that is where things get expensive in terms of RAM.
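To make the mismatch concrete, here is a minimal sketch in plain Java that models each filter as a bit set over document ids. It does not use the Lucene API; the class name, the sample keys and categories, and the helper methods are all hypothetical and only illustrate the reasoning above.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class DuplicateFilterSketch {
    // Hypothetical index: array position = document id.
    static final String[] KEYS = {"A", "A", "B"};
    static final String[] CATEGORIES = {"x", "y", "y"};

    // Models a duplicate filter computed over ALL documents:
    // keeps the first document seen for each key.
    static BitSet firstOccurrencePerKey(String[] keys) {
        BitSet bits = new BitSet(keys.length);
        Set<String> seen = new HashSet<>();
        for (int doc = 0; doc < keys.length; doc++) {
            if (seen.add(keys[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Models an unrelated second filter: keeps documents in one category.
    static BitSet matchingCategory(String[] categories, String wanted) {
        BitSet bits = new BitSet(categories.length);
        for (int doc = 0; doc < categories.length; doc++) {
            if (categories[doc].equals(wanted)) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // De-duplicates WITHIN a subset. This needs the key of every candidate
    // document at filter time, which is the per-document lookup that a
    // field cache would have to provide.
    static BitSet firstOccurrencePerKeyWithin(String[] keys, BitSet subset) {
        BitSet bits = new BitSet(keys.length);
        Set<String> seen = new HashSet<>();
        for (int doc = subset.nextSetBit(0); doc >= 0; doc = subset.nextSetBit(doc + 1)) {
            if (seen.add(keys[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet dedup = firstOccurrencePerKey(KEYS);          // {0, 2}
        BitSet second = matchingCategory(CATEGORIES, "y");   // {1, 2}

        // Combining the two independent filters with AND.
        BitSet combined = (BitSet) dedup.clone();
        combined.and(second);
        System.out.println("independent filters ANDed: " + combined); // {2}

        // De-duplicating only among documents the second filter accepts.
        System.out.println("dedup within subset:       "
                + firstOccurrencePerKeyWithin(KEYS, second));          // {1, 2}
    }
}
```

In this toy data, key A disappears from the combined result because its first occurrence (document 0) is not in category y, even though document 1 has the same key and does match. De-duplicating inside the subset keeps it, at the cost of having every document's key available in memory.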

Mark H 2010-09-21 08:22:58

Lucene DuplicateFilter question
