I am currently attempting to use Lucene to search data populated in an index.

I can match an exact phrase by enclosing it in quotes (e.g. "Processing Documents"), but I cannot get Lucene to match that phrase with a trailing wildcard, as in "Processing Document*".

The only difference is the wildcard at the end.

I am using Luke to view and search the index. When Luke parses the query, it drops the asterisk at the end of the phrase.

The quotes seem to be the main culprit: searching for document* works, but "document*" does not.

Any assistance would be greatly appreciated.

A: 

It seems that the default QueryParser cannot handle this. You can probably create a custom QueryParser for wildcards in phrases. If your example is representative, stemming may solve your problem. Please read the documentation for PorterStemFilter to see whether it fits.

Yuval F
A: 

Not only does the QueryParser not support wildcards in phrases, PhraseQuery itself only supports Terms. MultiPhraseQuery comes closer, but as its summary says, you still need to enumerate the IndexReader.terms yourself to match the wildcard.
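
To illustrate what "enumerate the terms yourself" amounts to, here is a minimal sketch in plain Java, with no Lucene dependency. It assumes the indexed terms for a field are available as a sorted set, which is roughly how a term dictionary is ordered. The class `PrefixExpansion` and the method `expand` are hypothetical names for this illustration, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper, not a Lucene class: expands a trailing-wildcard
// pattern such as "document*" against a sorted set of indexed terms.
final class PrefixExpansion {

    // Returns every term that starts with the given prefix, in sorted order.
    static List<String> expand(NavigableSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet jumps to the first term >= prefix. Terms sharing a prefix
        // are contiguous in sorted order, so stop at the first non-match.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for the terms of one field; a real index would supply these.
        NavigableSet<String> terms = new TreeSet<>(Arrays.asList(
                "docs", "document", "documentation", "documents", "processing"));
        System.out.println(expand(terms, "document"));
    }
}
```

Each expanded string would then be wrapped in a Term, and the whole group added to the MultiPhraseQuery as the alternatives for the last position of the phrase.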

Coady
+1  A: 

Lucene 2.9 has ComplexPhraseQueryParser which can handle wildcards in phrases.

Shashikant Kore
A: 

What you're looking for is FuzzyQuery, which finds results containing similar words based on Levenshtein distance. Alternatively, you may want to consider the slop setting of PhraseQuery (also available in MultiPhraseQuery) if the order of the words isn't significant.
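
For readers unfamiliar with the metric, the following is a rough sketch of Levenshtein distance in plain Java. It is a textbook dynamic-programming version and is not how Lucene implements FuzzyQuery internally. `EditDistance` and `levenshtein` are names made up for this example.

```java
// Hypothetical illustration class, not part of Lucene.
final class EditDistance {

    // Minimum number of single-character insertions, deletions, and
    // substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty string into the first j characters of b
        // takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // i deletions to reach the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("document", "documents"));
    }
}
```

Under this metric "document" and "documents" are one edit apart, which is why a fuzzy match can catch the plural. It matches on similarity, though, not on a shared prefix, so it is not equivalent to a trailing wildcard.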

Esko