There are two pieces to get this to work:
- You need the incoming documents to be analysed properly, so that individual words are tokenised and indexed separately
- The user query needs to be tokenised, and the tokens combined with the AND operator.
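To make the two pieces concrete, here is a minimal plain-Java sketch of the idea. It deliberately does not use Lucene: the class `AndSearchSketch` and its methods `tokenize`, `addDocument` and `searchAll` are hypothetical names invented for illustration, and the crude split-on-punctuation tokeniser only stands in for a real analyzer. It just shows why both steps are needed: words are indexed separately, and the query's tokens are intersected so that every one must match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a toy inverted index, not Lucene's API.
final class AndSearchSketch {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int docCount = 0;

    // Stand-in for an analyzer: lowercase, then split on anything
    // that is not a letter or a digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Piece 1: index every word of the document separately.
    // Returns the id assigned to the document (0, 1, 2, ...).
    int addDocument(String text) {
        int id = docCount++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Piece 2: tokenise the query and combine the tokens with AND,
    // i.e. intersect the document sets of all query terms.
    Set<Integer> searchAll(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        AndSearchSketch index = new AndSearchSketch();
        index.addDocument("The quick brown fox"); // id 0
        index.addDocument("A quick red fox");     // id 1
        index.addDocument("Brown bears");         // id 2
        System.out.println(index.searchAll("quick fox")); // [0, 1]
        System.out.println(index.searchAll("brown fox")); // [0]
    }
}
```

In Lucene itself the analyzer plays the role of `tokenize`, and the query parser's default operator decides whether the terms are intersected (AND) or unioned (OR).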
For #1, there are a number of Analyzers and Tokenizers that come with Lucene: have a look in the org.apache.lucene.analysis package. There are options for many different languages, stemming, stopwords and so on.
For #2, Lucene again comes with a number of query parsers, mainly in the org.apache.lucene.queryParser package. MultiFieldQueryParser might be good for you: to require every term to be present, just call setDefaultOperator(QueryParser.AND_OPERATOR) on your parser instance before parsing the query.
Lucene in Action, although a few versions old, is still accurate and extremely useful for more information on analysis and query parsing.