I am trying to use Lucene to search for names in a database. However, some of the names contain words like "NOT" and "OR", and even "-" minus symbols. I still want an Analyzer to break each name into tokens, and I want to search on those tokens as a boolean combination of terms, but I do not want Lucene to interpret "NOT", "OR" or "-" as operators. They should be searched like normal terms.

One way to accomplish this would be to manually run the Analyzer on the search query and then manually construct a boolean query from the resulting tokens. Is this the best way? I get the impression that analyzers were designed to be used in conjunction with the query parser, and I feel like there should be a built-in way to do what I want. Does anyone know the best way to do this?

+1  A: 

Your own suggested approach of constructing a BooleanQuery from a TokenStream makes complete sense. The QueryParser API is really just intended for parsing structured queries using a specific syntax - if you are not leveraging the query parser syntax, I see no reason to use QueryParser over a manually constructed BooleanQuery.
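For illustration, here is a minimal sketch of that approach. It assumes Lucene 5.3 or later (for BooleanQuery.Builder). The class name NameQueryBuilder and the choice to require every token (Occur.MUST) are my own assumptions, not something from the question. Use Occur.SHOULD if any matching token is enough.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Hypothetical helper: builds a query from analyzed tokens without any query-syntax parsing.
public class NameQueryBuilder {

    // Runs the analyzer over the raw text and adds one TermQuery per emitted token.
    public static Query build(Analyzer analyzer, String field, String text) throws IOException {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        try (TokenStream stream = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset(); // required before the first incrementToken()
            while (stream.incrementToken()) {
                // Every token is a plain term; "NOT", "OR" and "-" get no special meaning.
                builder.add(new TermQuery(new Term(field, term.toString())),
                        BooleanClause.Occur.MUST);
            }
            stream.end();
        }
        return builder.build();
    }
}
```

Pass the same Analyzer you use at index time so that the query tokens line up with the indexed terms. If the analyzer emits no tokens, the result is an empty BooleanQuery, which matches nothing.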

However, if you index your fields with a StandardAnalyzer (or another analyzer with a StopFilter) that uses the English stop word set, which I believe was the default before Lucene 8.0, then words like "and", "not" and "or" are dropped at index time and cannot be searched on. In that case you could just as easily strip those words, along with operators like "-" and "+", from your queries using a regular expression. I would still recommend the BooleanQuery approach over this, however.
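If you do go the regular-expression route, a rough sketch using only the JDK might look like this. The class name QuerySanitizer is hypothetical, and the patterns cover only the uppercase keywords AND, OR and NOT plus the "+" and "-" characters. Other query-syntax characters (quotes, parentheses, wildcards and so on) are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes boolean operator keywords and +/- before the text reaches a query parser.
public class QuerySanitizer {

    // Standalone uppercase keywords only; "ORLANDO" or lowercase "or" are left alone.
    private static final Pattern OPERATOR_WORDS = Pattern.compile("\\b(?:AND|OR|NOT)\\b");
    private static final Pattern OPERATOR_CHARS = Pattern.compile("[+-]");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String stripOperators(String input) {
        // Replace with spaces rather than nothing so "Smith-Jones" stays two words.
        String s = OPERATOR_CHARS.matcher(input).replaceAll(" ");
        s = OPERATOR_WORDS.matcher(s).replaceAll(" ");
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripOperators("Smith-Jones OR Brown")); // prints "Smith Jones Brown"
    }
}
```

This is only safe when the stripped words would not have been indexed anyway. With an analyzer that keeps stop words, it silently discards terms the user meant to search for.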

Alex Vigdor