I tried this Lucene code example, which worked:
http://snippets.dzone.com/posts/show/8965

However, changing:
Query query = parser.parse("st.");
to
Query query = parser.parse("t");

returned zero hits.

How do I write a Lucene query that returns all words containing the letter "t"?
(The maximum number of hits to return is 20.)

+1  A: 

I have good news and bad news. The good news is that you can use wildcards to match any text:

parser.parse("st*"); // Will match "st.", "station", "steal", etc.

Unfortunately, the documentation indicates:

Note: You cannot use a * or ? symbol as the first character of a search.

Meaning, you cannot use this syntax:

parser.parse("*t*");

Therefore, you cannot ask Lucene to return terms that contain the letter 't' at an arbitrary location. You can ask Lucene to return terms that begin with a certain letter.

Your only option at this point appears to be iterating through all terms and doing your own matching.
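To make the "iterate and match yourself" idea concrete, here is a minimal sketch in plain Java. It does not use the Lucene API at all: the `TermScan` class and its `termsContaining` method are hypothetical names, and the list of strings is only a stand-in for whatever terms your index would enumerate. The cap of 20 mirrors the limit mentioned in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: scans a collection of terms by hand.
class TermScan {

    // Returns at most maxHits terms that contain `needle` at any position.
    static List<String> termsContaining(Iterable<String> terms, String needle, int maxHits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (hits.size() >= maxHits) {
                break; // stop early once the cap is reached
            }
            if (term.contains(needle)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Stand-in data; a real program would read terms from the index instead.
        List<String> terms = Arrays.asList("station", "steal", "dog", "cat", "apple");
        System.out.println(termsContaining(terms, "t", 20));
    }
}
```

Every term is inspected once, so the cost grows with the number of distinct terms in the index rather than with the number of matches.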

Adam Paynter
Actually, you can use leading wildcards, so the query "*t*" is possible. You just need to enable them. From the Lucene FAQ (http://wiki.apache.org/lucene-java/LuceneFAQ): "Leading wildcards (e.g. *ook) are not supported by the QueryParser by default. As of Lucene 2.1, they can be enabled by calling QueryParser.setAllowLeadingWildcard( true ). Note that this can be an expensive operation: it requires scanning the list of tokens in the index in its entirety to look for those that match the pattern."
Kai Chan
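The cost difference the FAQ describes can be illustrated without Lucene. The sketch below assumes the terms are kept in sorted order (here a plain `TreeSet`, which is only a stand-in and not how Lucene actually stores its term dictionary). The `SortedTermDictionary` class and its method names are hypothetical. A prefix pattern such as "st*" can jump to the first candidate and stop as soon as the prefix no longer matches, while a pattern such as "*t*" has to look at every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a sorted term dictionary; not a Lucene class.
class SortedTermDictionary {
    private final TreeSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term);
    }

    // Prefix match ("st*"): seek to the first term >= prefix, then stop at the
    // first term that no longer starts with it. Only the matching range is visited.
    List<String> withPrefix(String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    // Infix match ("*t*"): sorted order does not help, so every term is inspected.
    List<String> containing(String infix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.contains(infix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedTermDictionary dict = new SortedTermDictionary();
        for (String t : new String[] {"station", "steal", "cat", "dog", "tea"}) {
            dict.add(t);
        }
        System.out.println(dict.withPrefix("st"));
        System.out.println(dict.containing("t"));
    }
}
```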
@Kai: Neat! Wow, I really have been away from the Lucene library for a while!
Adam Paynter
+2  A: 

You need a different Analyzer. The example uses StandardAnalyzer, which removes punctuation and breaks words according to white space and some other more elaborate rules. It does not, however, break words into characters. You will probably need to build your own custom analyzer to do this, and it seems it will be costly in both run time and memory consumption. Another (probably better) option is to use a RegexQuery.
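As a rough picture of what a character-level analyzer would buy you, and why it costs memory, here is a plain-Java sketch that does not use Lucene. The `CharacterIndex` class is a hypothetical illustration: it records, for every character, the set of words containing it, which is roughly what indexing each character as its own token would amount to.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration, not a Lucene Analyzer: maps each character
// to the words that contain it.
class CharacterIndex {
    private final Map<Character, Set<String>> wordsByChar = new HashMap<>();

    // Each word is stored once per distinct character it contains,
    // which is where the extra memory goes.
    void add(String word) {
        for (char c : word.toCharArray()) {
            wordsByChar.computeIfAbsent(c, k -> new TreeSet<>()).add(word);
        }
    }

    // Lookup is a single map access instead of a scan over all words.
    Set<String> wordsContaining(char c) {
        return wordsByChar.getOrDefault(c, Collections.emptySet());
    }

    public static void main(String[] args) {
        CharacterIndex index = new CharacterIndex();
        index.add("station");
        index.add("dog");
        index.add("cat");
        System.out.println(index.wordsContaining('t'));
    }
}
```

The trade-off is the usual one: lookups become cheap, but indexing does more work and the index grows with the number of distinct characters per word.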

Yuval F
This worked:

RegexQuery regexquery = new RegexQuery(new Term("fieldname", ".*t.*"));
isearcher.search(regexquery, collector);
System.out.println("collector.getTotalHits()=" + collector.getTotalHits());

Wow, I had never heard of the `RegexQuery`. When did it get added to the library? I admit, I haven't worked with Lucene for a few years now.
Adam Paynter
I also heard about it incidentally, by reading a colleague's code. From looking at the subversion logs for RegexQuery (one of the joys of open source) it has been in Lucene at least since December 28th, 2005. However, this is part of contrib, and not (yet?) one of Lucene's core queries.
Yuval F
@Yuval: Ahhh, that makes sense.
Adam Paynter
@hjo1620: I see that you were trying to format some code in your comment. All you have to do is surround your code with backticks (the backtick key is just to the left of the '1' key on your keyboard).
Adam Paynter