Whilst building some unit tests for my Lucene queries I noticed some strange behavior related to punctuation, in particular around parentheses.

What are some of the best ways to deal with search fields that contain significant amounts of punctuation?

A: 

It is not just parentheses; other punctuation, such as the colon and the hyphen, will cause issues too. Here is a way to deal with them.

Mikos
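One possible approach (not necessarily the one this answer linked to) is to backslash-escape the characters that the default query parser syntax treats as operators before parsing user input. Lucene ships `QueryParser.escape` for this. The class below is only a hypothetical, standard-library stand-in that illustrates the idea, and its character list is an assumption based on the classic query syntax, so check it against your Lucene version.

```java
// Hypothetical helper, not part of Lucene: backslash-escapes characters that
// the default (classic) query parser syntax treats as operators.
final class QueryEscaper {
    // Assumed list of special characters; verify against your Lucene version.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\'); // prefix each special character with a backslash
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: foo \(bar\)\: baz\-qux
        System.out.println(escape("foo (bar): baz-qux"));
    }
}
```

Escaping only stops the parser from interpreting the punctuation as syntax. Whether the escaped characters then match anything still depends on how the Analyzer tokenized the field at index time.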
A: 

If you haven't customized the query parser, Lucene should behave according to the default query parser syntax. Are you getting something different than that? Do you want punctuation to have a special meaning or just to remove the punctuation from searches? The other usual suspect here is the Analyzer, which determines how your field is indexed and how the query is broken into pieces for searching. Can you post specific examples of bad behavior?

Yuval F
Thanks for your response. I have moved forward with this by having a 'clean' field on my document that is purely for the purposes of searching. This forces me to also 'clean' all search query strings. Seems to work well and I return the full field as the result from the query.
berko
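A minimal sketch of the 'clean' field idea described in the comment above, assuming that "cleaning" means lower-casing and replacing every run of punctuation with a single space. The comment does not give the actual rules, so the class name `SearchTextCleaner` and the regular expression are hypothetical. The key point is that the same function is applied to the value stored in the search-only field and to every incoming query string.

```java
import java.util.Locale;

// Hypothetical helper: one normalization routine shared by indexing and querying.
final class SearchTextCleaner {
    static String clean(String text) {
        return text
            .toLowerCase(Locale.ROOT)
            // Assumed rule: any run of characters that are not letters or
            // digits (punctuation, whitespace) collapses to a single space.
            .replaceAll("[^\\p{L}\\p{N}]+", " ")
            .trim();
    }

    public static void main(String[] args) {
        String original = "Widget (Large): Blue-Green";
        // The cleaned value would go into the search-only field; the original
        // value is stored separately and returned as the result.
        System.out.println(clean(original));   // prints: widget large blue green
        // The user's query goes through the same routine before searching.
        System.out.println(clean("(Large)"));  // prints: large
    }
}
```

Because both sides pass through the same routine, punctuation never reaches the query parser, at the cost of no longer being able to search on the punctuation itself.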