I've had an app doing prefix searches for a while. Recently the index grew, and it turned out that some prefixes matched too darned many terms for Lucene to handle. It kept throwing a Too Many Clauses error, which was very frustrating because I kept looking at my JARs and confirming that none of the included code actually used a boolean query.

Why doesn't it throw something like a Too Many Hits exception? And why does increasing the boolean query's static max clauses integer actually make this error go away, when I'm definitely only using a prefix query? Is there something fundamental to how queries are run that I'm not understanding; is it that they secretly become Boolean queries?

+4  A: 

I've hit this before. It has to do with the fact that Lucene, under the covers, turns many (all?) things into boolean queries when Query.rewrite() is called.

From: http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Query.html#rewrite(org.apache.lucene.index.IndexReader)

public Query rewrite(IndexReader reader)
              throws IOException

    Expert: called to re-write queries into primitive queries.
            For example, a PrefixQuery will be rewritten into a
            BooleanQuery that consists of TermQuerys.

    Throws:
        IOException
Ryan Ahearn
A: 

Since I wasn't explicitly calling rewrite on the query, I had no idea this was what happens to a prefix query. But that seems to explain it.

dlamblin
A: 

When running a prefix query, Lucene looks up all terms in its "dictionary" that start with the prefix and adds one clause per matching term. If more than 1024 terms (the default limit) match, the TooManyClauses exception is thrown.

You can call BooleanQuery.setMaxClauseCount to increase the maximum number of clauses permitted per BooleanQuery.
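
To make the mechanism concrete, here is a rough toy model of it in plain Java. It is not Lucene code: PrefixExpansionSketch, rewritePrefix, TooManyClausesSketch and sampleDictionary are made-up names for illustration, and the sorted set only stands in for the term dictionary. It just shows why the limit counts matching terms rather than hits, and why raising the limit makes the error go away.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model only: none of these names are Lucene classes.
class PrefixExpansionSketch {
    // Stands in for the clause limit; 1024 is the default mentioned above.
    static int maxClauseCount = 1024;

    // Hypothetical stand-in for the TooManyClauses exception.
    static class TooManyClausesSketch extends RuntimeException {
        TooManyClausesSketch(String message) { super(message); }
    }

    // Expand a prefix into one "clause" per matching dictionary term.
    static List<String> rewritePrefix(SortedSet<String> termDictionary, String prefix) {
        List<String> clauses = new ArrayList<>();
        // tailSet starts at the first term >= prefix; terms sharing the
        // prefix are contiguous in sorted order, so stop at the first miss.
        for (String term : termDictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() >= maxClauseCount) {
                throw new TooManyClausesSketch(
                        "more than " + maxClauseCount + " terms match prefix " + prefix);
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Builds n terms starting with "app" plus one unrelated term.
    static SortedSet<String> sampleDictionary(int n) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            terms.add("app" + i);
        }
        terms.add("banana");
        return terms;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = sampleDictionary(2000);
        try {
            rewritePrefix(dictionary, "app");
        } catch (TooManyClausesSketch e) {
            System.out.println("failed: " + e.getMessage());
        }
        // Raising the limit lets the same expansion succeed.
        maxClauseCount = 4096;
        System.out.println(rewritePrefix(dictionary, "app").size() + " clauses");
    }
}
```

Note that the number of documents never enters into it. Only the number of distinct terms under the prefix counts, which is why a growing index can suddenly trip the limit.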

Stefan Schultze
That makes sense, but the issue for me was that I had no way of knowing a PrefixQuery actually became a BooleanQuery.
dlamblin
+1  A: 

The API reference page of TooManyClauses shows that PrefixQuery, FuzzyQuery, WildcardQuery, and RangeQuery are expanded this way (into BooleanQuery). Since it is in the API reference, it should be a behavior that users can rely on. Lucene does not place arbitrary limits on the number of hits (other than a document ID being an int) so a "too many hits" exception might not make sense. Perhaps PrefixQuery.rewrite(IndexReader) should catch the TooManyClauses and throw a "too many prefixes" exception, but right now it does not behave that way.

By the way, another way to search by prefix is to use PrefixFilter. Either filter your query with it or wrap the filter with a ConstantScoreQuery.
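
For contrast, here is a toy sketch of why the filter route avoids the clause limit, as far as I understand it: instead of building one clause per term, the matching documents are simply marked in a bit set. Again this is plain Java, not Lucene code. PrefixFilterSketch, matchingDocs and samplePostings are hypothetical names, and the sorted map is only a stand-in for an inverted index.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model only: these names are illustrative, not Lucene classes.
class PrefixFilterSketch {
    // postings is a hypothetical inverted index: term -> ids of documents containing it.
    static BitSet matchingDocs(SortedMap<String, List<Integer>> postings, String prefix) {
        BitSet docs = new BitSet();
        // Walk the terms that share the prefix and mark their documents.
        // No clause list is built, so there is no clause count to exceed.
        for (Map.Entry<String, List<Integer>> entry : postings.tailMap(prefix).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break;
            }
            for (int docId : entry.getValue()) {
                docs.set(docId);
            }
        }
        return docs;
    }

    // Builds termCount terms starting with "app", each in two documents,
    // plus one unrelated term in a separate document.
    static SortedMap<String, List<Integer>> samplePostings(int termCount) {
        SortedMap<String, List<Integer>> postings = new TreeMap<>();
        for (int i = 0; i < termCount; i++) {
            postings.put("app" + i, Arrays.asList(i, i + 1));
        }
        postings.put("banana", Arrays.asList(termCount + 5));
        return postings;
    }

    public static void main(String[] args) {
        BitSet docs = matchingDocs(samplePostings(5000), "app");
        System.out.println(docs.cardinality() + " matching documents");
    }
}
```

The trade-off is that a bit set carries no per-term scoring information, which fits with wrapping the filter in a ConstantScoreQuery: every match gets the same score.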

Kai Chan