I'm trying to produce something similar to what Lucene's QueryParser does, but without the parser: run a string through StandardAnalyzer, tokenize it, and combine TermQuerys in a BooleanQuery. My problem is that StandardAnalyzer gives me Tokens, not Terms. I can convert a Token to a Term by extracting its text with Token.term(), but that method is 2.4.x-only, and it seems backwards because I have to supply the field name a second time. What is the proper way of producing a TermQuery from StandardAnalyzer output?

I'm using pylucene, but I guess the answer is the same for Java etc. Here is the code I've come up with:

from lucene import *

def term_match(self, phrase):
    query = BooleanQuery()
    sa = StandardAnalyzer()
    for token in sa.tokenStream("contents", StringReader(phrase)):
        # the field name has to be repeated here, which is what feels backwards
        term_query = TermQuery(Term("contents", token.term()))
        query.add(term_query, BooleanClause.Occur.SHOULD)
    return query
+1  A: 

The established way to get the token text is with token.termText() - that API's been there forever.

And yes, you'll need to specify a field name to both the Analyzer and the Term; I think that's considered normal. 8-)

RichieHindle
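The overall pattern can be shown without pylucene at all. Below is a hedged, Lucene-free sketch: a stand-in "analyzer" that just lowercases and splits on whitespace (StandardAnalyzer also strips stop words, punctuation, etc.), with each resulting token paired with the field name to form the (field, text) clauses that the BooleanQuery of TermQuerys would hold. The field name "contents" comes from the question; everything else here is a hypothetical stand-in, not the Lucene API.

```python
def analyze(phrase):
    """Crude stand-in for StandardAnalyzer's token stream: lowercase +
    whitespace split. The real analyzer does considerably more."""
    return [t.lower() for t in phrase.split()]

def term_match(phrase, field="contents"):
    """Build the SHOULD-style (field, term) clauses that the BooleanQuery
    of TermQuerys in the question would contain."""
    return [(field, text) for text in analyze(phrase)]

print(term_match("Hello Lucene World"))
# [('contents', 'hello'), ('contents', 'lucene'), ('contents', 'world')]
```

The point of the sketch is just the shape of the loop: one term clause per token, with the field name supplied alongside each token's text.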
According to the API docs, token.termText() is deprecated; they point instead to something like token.termBuffer()[0:token.termLength()], which works but seems even more awkward.
Joakim Lundborg
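The buffer idiom in that comment can be illustrated in plain Python (a hypothetical stand-in, not the Lucene API): termBuffer() returns a reusable character array that may be longer than the current token, so it has to be sliced to termLength() to recover the token's text.

```python
# Hypothetical mock of Lucene's reusable term buffer: the array is padded
# beyond the current token, so buffer[0:length] is needed to get the text.
buffer = list("lucene\x00\x00")   # 8-slot buffer holding a 6-char token
length = 6                        # what termLength() would report

term_text = "".join(buffer[0:length])
print(term_text)
# lucene
```

This is why the comment calls the idiom awkward: the caller must pair two separate accessors just to read one string.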