The wildcard * can only be used at the end of a word, like user*.
I want to run a query equivalent to SQL's LIKE %user%. How can I do that?
A somewhat similar issue: http://stackoverflow.com/questions/468279/lucene-net-leading-wildcard-character-throws-an-error
Lucene provides the ReverseStringFilter, which makes leading wildcard searches like *user possible. It works by indexing all terms in reverse order.
But I think there is no way to do something similar to 'LIKE %user%'.
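To illustrate why reversing terms helps with *user but not with %user%, here is a minimal sketch in plain Java. It does not use Lucene at all: the class name ReversedTermIndex and its methods are made up for this example, and a TreeSet stands in for the term dictionary. The point is only that a leading wildcard on the original term becomes a trailing wildcard on the reversed term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: stores every term reversed.
class ReversedTermIndex {
    // Sorted set of reversed terms, standing in for a term dictionary.
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    void add(String term) {
        reversedTerms.add(reverse(term));
    }

    // Answers the pattern "*suffix" as a prefix scan over the reversed terms.
    List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> matches = new ArrayList<>();
        for (String candidate : reversedTerms.tailSet(prefix, true)) {
            if (!candidate.startsWith(prefix)) {
                break; // sorted order: no later entry can match
            }
            matches.add(reverse(candidate));
        }
        return matches;
    }

    public static void main(String[] args) {
        ReversedTermIndex index = new ReversedTermIndex();
        for (String t : new String[] {"user", "superuser", "users", "fuser", "username"}) {
            index.add(t);
        }
        System.out.println(index.endingWith("user")); // prints [user, fuser, superuser]
    }
}
```

Note that users and username are not returned: the reversed index only covers a wildcard at the front, so a pattern with wildcards on both sides is still not served by it.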
When you think about it, it is not surprising that Lucene's support for wildcards is (normally) restricted to a wildcard at the end of a word pattern.
Keyword search engines work by building an inverted index of all the words in the corpus, sorted in word order. When you do a normal non-wildcard search, the engine uses the fact that the index entries are sorted to locate the entry or entries for your word in O(log N) steps, where N is the number of words or entries. For a word pattern with a suffix wildcard, the same lookup finds the first matching word, and the other matches are found by scanning the following entries until the fixed part of the pattern no longer matches.
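The seek-then-scan behaviour described above can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: a TreeSet plays the role of the sorted term dictionary, and PrefixScan and termsWithPrefix are names invented for the example, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of a suffix-wildcard lookup (e.g. user*).
class PrefixScan {
    // Returns every term that starts with prefix, using the sort order to stop early.
    static List<String> termsWithPrefix(TreeSet<String> sortedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // The tail view starts at the first term >= prefix; a balanced tree
        // reaches that position in O(log N) steps.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // past the block of matching terms
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "abuser", "fuser", "use", "user", "username", "users", "utility"));
        System.out.println(termsWithPrefix(terms, "user")); // prints [user, username, users]
    }
}
```

Only the three matching terms and the first non-matching one (utility) are visited; abuser and fuser are never looked at, which is exactly why they cannot be found this way.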
However, for a word pattern with both a wildcard prefix and a wildcard suffix, the engine would have to look at every entry in the index. This would be O(N), unless the engine built a whole stack of secondary indexes for matching literal substrings of words (and that would make indexing a whole lot more expensive). For more complex patterns (e.g. regexes), the problem would be even worse for the search engine.
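As a rough idea of what such a secondary index could look like, here is a sketch of a trigram index in plain Java. It is one possible design chosen for illustration, not something Lucene does for wildcard queries by default; TrigramIndex and its methods are hypothetical names. Every term is registered under each of its 3-character substrings, which is where the extra indexing cost comes from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical secondary index: maps each 3-character substring to the terms containing it.
class TrigramIndex {
    private final Map<String, Set<String>> termsByGram = new HashMap<>();

    void add(String term) {
        for (int i = 0; i + 3 <= term.length(); i++) {
            termsByGram.computeIfAbsent(term.substring(i, i + 3), g -> new TreeSet<>()).add(term);
        }
    }

    // Finds terms containing needle, i.e. the pattern *needle*.
    List<String> containing(String needle) {
        if (needle.length() < 3) {
            throw new IllegalArgumentException("needle must have at least 3 characters");
        }
        // Intersect the term sets of all trigrams in the needle.
        Set<String> candidates = null;
        for (int i = 0; i + 3 <= needle.length(); i++) {
            Set<String> posting =
                    termsByGram.getOrDefault(needle.substring(i, i + 3), Collections.emptySet());
            if (candidates == null) {
                candidates = new TreeSet<>(posting);
            } else {
                candidates.retainAll(posting);
            }
        }
        // Trigrams may occur in a different order, so verify each candidate.
        List<String> matches = new ArrayList<>();
        for (String c : candidates) {
            if (c.contains(needle)) {
                matches.add(c);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TrigramIndex index = new TrigramIndex();
        for (String t : new String[] {"user", "superuser", "users", "fuser", "username", "reuse"}) {
            index.add(t);
        }
        System.out.println(index.containing("user"));
        // prints [fuser, superuser, user, username, users]
    }
}
```

The lookup no longer touches every term, but a term of length k now produces up to k - 2 index entries instead of one.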
The trouble with LIKE queries is that they are expensive in terms of time taken to execute. You can set up QueryParser to allow leading wildcards with the following:
QueryParser.setAllowLeadingWildcard(true)
And this will allow you to do searches like:
*user*
But this will take a long time to execute. Sometimes when people say they want a LIKE query, what they actually want is a fuzzy query. This would allow you to do the following search:
user~
which would match the terms users and fuser. You can control how closely a term must match the term in your query by appending a minimum similarity, a float value between 0 and 1, where values closer to 1 require a closer match. For example, user~0.8 would match fewer terms than user~0.5.
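Fuzzy matching is based on edit distance, so it may help to see why users and fuser count as close to user. The sketch below computes the Levenshtein distance in plain Java. The similarity method is an assumption made purely for illustration (1 minus the distance divided by the shorter length); it is not claimed to be the formula Lucene uses, and EditDistance is a made-up class name.

```java
// Hypothetical helper, independent of Lucene.
class EditDistance {
    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Illustrative normalization only (an assumption, not Lucene's formula).
    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0 : 0.0;
        }
        return 1.0 - (double) levenshtein(a, b) / shorter;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("user", "users")); // prints 1
        System.out.println(levenshtein("user", "fuser")); // prints 1
        System.out.println(similarity("user", "users"));  // prints 0.75
    }
}
```

Both users and fuser are a single insertion away from user, whereas a term like username is four edits away and would normally not be considered a fuzzy match.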
I suggest you also take a look at regex query, which supports regular expression syntax for Lucene searches. It may be closer to what you really need. Perhaps something like:
.*user.*
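To get a feel for what that pattern selects, here is a small sketch that applies it to a list of terms with java.util.regex. This is not how Lucene evaluates regex queries, and Lucene's regular expression syntax is not identical to Java's, but for a pattern this simple the two should agree. RegexTermScan and matching are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration: whole-term regex matching over a term list.
class RegexTermScan {
    static List<String> matching(List<String> terms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            // matches() requires the whole term to fit the pattern.
            if (pattern.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("user", "superuser", "users", "fuser", "username", "reuse");
        System.out.println(matching(terms, ".*user.*"));
        // prints [user, superuser, users, fuser, username]
    }
}
```

Like the double-wildcard search, this checks every term, so the same performance caveat applies on a large index.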