I have a Lucene index whose documents have a "type" field. This field can take one of three values: "article", "forum" or "blog". I want the user to be able to search within these types (there is a checkbox for each document type).

How do I create a Lucene query dependent on which types the user has selected?

A couple of requirements:

  • If the user doesn't select one of the types, I want no results from that type.
  • The ordering of the results should not be affected by restricting the type field.

For reference, if I were to write this in SQL (for a "blog or forum" search), I'd write:

SELECT * FROM Docs
WHERE [type] in ('blog', 'forum')
+2  A: 

Add a constraint that rejects the document types that weren't selected. For example, if only "article" was checked, the constraint would be:

-(type:forum type:blog)
erickson
This is what I did in the end, although I used the API rather than building the query as a string; see my answer if you're interested.
thatismatt
A: 

While erickson's suggestion seems fine, you could instead use a positive constraint ANDed with your search term: text:foo AND type:article when only "article" is checked, or text:foo AND (type:article OR type:forum) when both "article" and "forum" are checked.
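
Both string forms suggested so far (erickson's negative clause and the positive one here) can be generated from the checkbox state. The following is only a minimal sketch: `TypeClause`, `Include` and `Exclude` are hypothetical names, not part of Lucene, and it assumes the type values are plain single tokens that need no query-syntax escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Lucene): builds the "type" restriction
// as a query-syntax fragment to combine with the user's search string.
public static class TypeClause
{
    // Positive form, e.g. "(type:article OR type:forum)".
    // Returns "" when nothing is selected; the caller decides what that means.
    public static string Include(IEnumerable<string> selected)
    {
        string[] terms = selected.Select(t => "type:" + t).ToArray();
        return terms.Length == 0 ? "" : "(" + string.Join(" OR ", terms) + ")";
    }

    // Negative form, e.g. "-(type:forum type:blog)", covering every type
    // that was NOT selected. Returns "" when there is nothing to exclude.
    public static string Exclude(IEnumerable<string> allTypes, IEnumerable<string> selected)
    {
        string[] terms = allTypes.Except(selected).Select(t => "type:" + t).ToArray();
        return terms.Length == 0 ? "" : "-(" + string.Join(" ", terms) + ")";
    }
}
```

For example, `"text:foo AND " + TypeClause.Include(new[] { "article", "forum" })` yields the second query quoted above. One caveat, as far as I understand Lucene's scoring: a prohibited clause only filters, whereas a required or optional type: clause is itself scored, so the two forms need not rank the results identically.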

Yuval F
Intriguingly, the two queries "text:foo AND (type:article OR type:forum)" and "text:foo AND -type:blog" do not give the same results. The first query returns the blogs first, whereas the second query maintains the ordering (i.e. blogs and articles are mixed). Any idea why?
thatismatt
Lucene doesn't have an "AND" operator. It has + (require) and - (prohibit) operators.
erickson
@erickson: I beg to differ: e.g. http://incubator.apache.org/lucene.net/docs/2.1/Lucene.Net.QueryParsers.QueryParser.AND_OPERATOR.html
Yuval F
Hey, where'd that come from?
erickson
+3  A: 

For reference, should anyone else come across this problem, here is my solution:

// Requires: using System.Linq; (for Except) plus the Lucene.Net.Index and Lucene.Net.Search namespaces
IList<string> ALL_TYPES = new[] { "article", "blog", "forum" };
string q = ...; // The user's search string
IList<string> includeTypes = ...; // List of types to include
Query searchQuery = parser.Parse(q);
// Declared as BooleanQuery (not Query) so that Add() is available
BooleanQuery parentQuery = new BooleanQuery();
parentQuery.Add(searchQuery, BooleanClause.Occur.SHOULD);
// Invert the logic: exclude every type the user did not select
foreach (var type in ALL_TYPES.Except(includeTypes))
{
    parentQuery.Add(
        new TermQuery(new Term("type", type)),
        BooleanClause.Occur.MUST_NOT
    );
}
searchQuery = parentQuery;

I inverted the logic (i.e. excluded the types the user had not selected) because otherwise the ordering of the results is lost. I'm not sure why, though. It is a shame, as it makes the code less clear and less maintainable, but at least it works!

thatismatt