I am using Lucene for Java and need to understand how the engine interprets some unusual queries. Take the following query:
    +(foo -bar)
If I use QueryParser to parse the input, I get a BooleanQuery object that looks like this:
    org.apache.lucene.search.BooleanQuery:
        org.apache.lucene.search.BooleanClause(required=true, prohibited=false):
            org.apache.lucene.search.BooleanQuery:
                org.apache.lucene.search.BooleanClause(required=false, prohibited=false):
                    org.apache.lucene.search.TermQuery: foo
                org.apache.lucene.search.BooleanClause(required=false, prohibited=true):
                    org.apache.lucene.search.TermQuery: bar
What does Lucene look for? Is it documents that MUST contain 'foo' but CANNOT contain 'bar'? What if I search for:
    -(foo +bar)
Are those documents that CANNOT contain 'foo' and CANNOT contain 'bar'? Or perhaps ones that CANNOT contain 'foo' but MUST contain 'bar'?
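To make the competing interpretations concrete, here is a minimal sketch of one possible reading of the two flags. It does not use Lucene at all: `BooleanModel`, `Node`, `Clause`, `term` and `bool` are hypothetical names, and the matching rule (every required clause must match, no prohibited clause may match, and when there are no required clauses at least one optional clause must match) is an assumption about how the flags combine, not something taken from Lucene's source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BooleanModel {
    /** Toy stand-in for a query node; hypothetical, not a Lucene type. */
    interface Node {
        boolean matches(Set<String> docTerms);
    }

    /** One clause carrying the same two flags that the debug output prints. */
    static final class Clause {
        final Node node;
        final boolean required;
        final boolean prohibited;

        Clause(Node node, boolean required, boolean prohibited) {
            this.node = node;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    /** A term matches when the document's term set contains it. */
    static Node term(String text) {
        return docTerms -> docTerms.contains(text);
    }

    /** Assumed rule: all required match, no prohibited match, else one optional must match. */
    static Node bool(Clause... clauses) {
        return docTerms -> {
            boolean sawRequired = false;
            boolean optionalHit = false;
            for (Clause c : clauses) {
                boolean hit = c.node.matches(docTerms);
                if (c.prohibited) {
                    if (hit) return false;          // a prohibited clause matched
                } else if (c.required) {
                    sawRequired = true;
                    if (!hit) return false;         // a required clause failed
                } else if (hit) {
                    optionalHit = true;
                }
            }
            // With only prohibited clauses, nothing selects the document.
            return sawRequired || optionalHit;
        };
    }

    /** Mirrors the parsed tree for +(foo -bar). */
    static Node plusFooMinusBar() {
        Node inner = bool(new Clause(term("foo"), false, false),
                          new Clause(term("bar"), false, true));
        return bool(new Clause(inner, true, false));
    }

    /** Mirrors the tree I would expect for -(foo +bar). */
    static Node minusFooPlusBar() {
        Node inner = bool(new Clause(term("foo"), false, false),
                          new Clause(term("bar"), true, false));
        return bool(new Clause(inner, false, true));
    }

    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    public static void main(String[] args) {
        String[][] docs = { {"foo"}, {"foo", "bar"}, {"bar"}, {} };
        for (String[] d : docs) {
            Set<String> terms = doc(d);
            System.out.println(terms + " +(foo -bar)=" + plusFooMinusBar().matches(terms)
                    + " -(foo +bar)=" + minusFooPlusBar().matches(terms));
        }
    }
}
```

Under this model, +(foo -bar) matches only documents that contain 'foo' and lack 'bar', while -(foo +bar) matches nothing, because the outer query has only a prohibited clause and therefore nothing to select documents from. Whether Lucene itself behaves this way would still need to be confirmed against a real index.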
In case it helps, here is the code I used to inspect the QueryParser results:
    QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
    Query query = parser.parse(text);
    debug(query, 0);

    public static void debug(Object o, int depth) {
        for(int i=0; i<depth; i++) System.out.print("\t");
        System.out.print(o.getClass().getName());
        if(o instanceof BooleanQuery) {
            System.out.println(":");
            for(BooleanClause clause : ((BooleanQuery)o).getClauses()) {
                debug(clause, depth + 1);
            }
        } else if(o instanceof BooleanClause) {
            BooleanClause clause = (BooleanClause)o;
            System.out.println("(required=" + clause.isRequired() + ", prohibited=" + clause.isProhibited() + "):");
            debug(clause.getQuery(), depth + 1);
        } else if(o instanceof TermQuery) {
            TermQuery term = (TermQuery)o;
            System.out.println(": " + term.getTerm().text());
        } else {
            throw new IllegalArgumentException("Unknown object type");
        }
    }