Hi,

I am adding Apache Lucene support to Querydsl (which offers type-safe queries for Java), and I am having trouble understanding how Lucene evaluates queries, especially negation in nested queries.

For instance, the following two queries are, in my opinion, semantically the same, but only the first one returns results.

+year:1990 -title:"Jurassic Park"
+year:1990 +(-title:"Jurassic Park")

The simplified object tree in the second example is shown below.

query : Query
  clauses : ArrayList
    [0] : BooleanClause
      "MUST" occur : BooleanClause.Occur
      "year:1990" query : TermQuery
    [1] : BooleanClause
      "MUST" occur : BooleanClause.Occur
      query : BooleanQuery
        clauses : ArrayList
          [0] : BooleanClause
            "MUST_NOT" occur : BooleanClause.Occur
            "title:"Jurassic Park"" query : TermQuery

Lucene's own QueryParser seems to parse "AND (NOT" into the same kind of object tree.

Is this a bug in Lucene or have I misunderstood Lucene's query evaluation? I am happy to give more information if necessary.

+2  A: 

They are not semantically the same.

In

+year:1990 +(-title:"Jurassic Park")

you have a subquery that contains only a single NOT clause. What's happening is that Lucene evaluates the

-title:"Jurassic Park"

clause on its own, and it returns 0 documents: a negative clause only removes documents from a result set, and inside the parentheses there is no positive clause to produce one. You then say that the subquery MUST occur, and since it returns zero documents, the whole query matches nothing.
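To make the difference concrete, here is a toy model in plain Java. It is only a sketch of the semantics described above, not Lucene's implementation: documents are integer ids, a clause is the set of ids it matches, and the class name, helper names, and sample movies are all invented for illustration. The third variant assumes the usual way out, which is to give the subquery something positive to subtract from; in Lucene that role is played by MatchAllDocsQuery (written *:* in the query parser syntax).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of boolean clause evaluation. NOT Lucene code; all names are hypothetical.
final class NegationSketch {

    // Made-up index: document id -> year, document id -> title.
    static final Map<Integer, Integer> YEAR = Map.of(1, 1990, 2, 1990, 3, 1993, 4, 1990);
    static final Map<Integer, String> TITLE = Map.of(
            1, "Jurassic Park", 2, "Total Recall", 3, "Jurassic Park", 4, "Goodfellas");

    // Ids of documents whose year equals the given value.
    static Set<Integer> year(int value) {
        Set<Integer> hits = new TreeSet<>();
        YEAR.forEach((id, y) -> { if (y == value) hits.add(id); });
        return hits;
    }

    // Ids of documents whose title equals the given value.
    static Set<Integer> title(String value) {
        Set<Integer> hits = new TreeSet<>();
        TITLE.forEach((id, t) -> { if (t.equals(value)) hits.add(id); });
        return hits;
    }

    // Stand-in for a match-all clause: every document id in the index.
    static Set<Integer> allDocs() {
        return new TreeSet<>(YEAR.keySet());
    }

    // Intersect the required clauses, then subtract the prohibited ones.
    // With no required clause nothing is selected, so there is nothing to subtract from.
    static Set<Integer> booleanQuery(List<Set<Integer>> must, List<Set<Integer>> mustNot) {
        Set<Integer> result = new TreeSet<>();
        if (must.isEmpty()) {
            return result;
        }
        result.addAll(must.get(0));
        for (Set<Integer> clause : must) {
            result.retainAll(clause);
        }
        for (Set<Integer> clause : mustNot) {
            result.removeAll(clause);
        }
        return result;
    }

    // +year:1990 -title:"Jurassic Park"
    static Set<Integer> flat() {
        return booleanQuery(List.of(year(1990)), List.of(title("Jurassic Park")));
    }

    // +year:1990 +(-title:"Jurassic Park")
    static Set<Integer> nested() {
        Set<Integer> inner = booleanQuery(List.of(), List.of(title("Jurassic Park")));
        return booleanQuery(List.of(year(1990), inner), List.of());
    }

    // +year:1990 +(*:* -title:"Jurassic Park")
    static Set<Integer> nestedWithMatchAll() {
        Set<Integer> inner = booleanQuery(List.of(allDocs()), List.of(title("Jurassic Park")));
        return booleanQuery(List.of(year(1990), inner), List.of());
    }

    public static void main(String[] args) {
        System.out.println("flat:               " + flat());
        System.out.println("nested:             " + nested());
        System.out.println("nested + match-all: " + nestedWithMatchAll());
    }
}
```

In this model the flat query and the match-all variant both select documents 2 and 4, while the nested form selects nothing, because its required subquery is empty before the year clause is even considered.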

bajafresh4life
Thanks a bunch, it makes perfect sense now.
ponzao
How does one do a NOT-only search when one really needs to?
mP