I am calling Lucene using the following code (PyLucene, to be precise):

analyzer = StandardAnalyzer(Version.LUCENE_30)
queryparser = QueryParser(Version.LUCENE_30, "text", analyzer)
query = queryparser.parse(queryparser.escape(querytext))

But consider if this is the content of querytext:

querytext = "THE FOOD WAS HONESTLY NOT WORTH THE PRICE. MUCH TOO PRICY WOULD NOT GO BACK AND OR RECOMMEND IT"

In that case, the "AND OR" trips up the queryparser, even though I am using queryparser.escape. How do I avoid the following error message?

    Java stacktrace:
org.apache.lucene.queryParser.ParseException: Cannot parse 'THE FOOD WAS HONESTLY NOT WORTH THE PRICE. MUCH TOO PRICY WOULD NOT GO BACK AND OR RECOMMEND IT': Encountered " <OR> "OR "" at line 1, column 80.
Was expecting one of:
    <NOT> ...
    "+" ...
    "-" ...
    "(" ...
    "*" ...
    <QUOTED> ...
    <TERM> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...
    <TERM> ...
    "*" ...

 at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:187)
     ....
 at org.apache.lucene.queryParser.QueryParser.generateParseException(QueryParser.java:1759)
 at org.apache.lucene.queryParser.QueryParser.jj_consume_token(QueryParser.java:1641)
 at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1268)
 at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1207)
 at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1167)
 at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:182)
A: 

It's not just OR, it's AND OR.

I use the following workaround:

query = queryparser.parse(queryparser.escape(querytext.replace("AND OR", "AND or")))
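
A slightly more general form of the same workaround, as a rough sketch: instead of replacing only the literal string "AND OR", lowercase every standalone uppercase AND, OR, or NOT, since those are the keywords the classic query syntax reads as boolean operators. The helper name lowercase_operators is made up for illustration, and the PyLucene call is left as a comment so the snippet runs without Lucene installed.

```python
import re

# Uppercase keywords that the classic Lucene query syntax treats as operators.
_OPERATORS = re.compile(r"\b(AND|OR|NOT)\b")

def lowercase_operators(querytext):
    """Lowercase standalone AND/OR/NOT so they are parsed as ordinary terms."""
    return _OPERATORS.sub(lambda m: m.group(1).lower(), querytext)

querytext = "WOULD NOT GO BACK AND OR RECOMMEND IT"
safe_text = lowercase_operators(querytext)

# With PyLucene set up as in the question, the cleaned text would then go
# through the usual escape + parse step:
# query = queryparser.parse(queryparser.escape(safe_text))
```

The word-boundary match leaves words such as ANDROID or ORANGE alone, so only the bare keywords are changed.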
Joseph Turian
+1  A: 

queryparser.escape only escapes special characters (as shown in this page) and leaves "AND OR" unchanged, so it does not help in your case. Since you presumably also used StandardAnalyzer to analyze your text at index time, the terms in your index are already lowercase. So you can convert the whole query string to lowercase before giving it to the queryparser. Lowercase "and" and "or" are not treated as operators, so "and or" will not trip up the queryparser.
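
In code, this could look roughly like the following. It is a minimal sketch that assumes the index really was built with a lowercasing analyzer; prepare_query_text is a hypothetical helper name, and the PyLucene call is shown only as a comment.

```python
def prepare_query_text(querytext):
    """Lowercase the whole query so AND/OR/NOT lose their operator meaning."""
    return querytext.lower()

querytext = "MUCH TOO PRICY WOULD NOT GO BACK AND OR RECOMMEND IT"
plain_text = prepare_query_text(querytext)

# Then escape and parse as in the question:
# query = queryparser.parse(queryparser.escape(plain_text))
```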

Kai Chan