I am having a problem stripping punctuation from the Solr index. When a punctuation mark immediately follows a word, that word is not indexed properly.

For example, if we index "hello, John", the asset is not found by the keyword "hello", but there is no problem if we remove the comma after "hello".

Is there a FilterFactory that is supposed to strip punctuation? Any ideas?

Thanks, Bogdan.

A: 

This is done with the WordDelimiterFilterFactory. Set generateWordParts="1" so that a token such as "hello," is split into the word part "hello" and the comma is dropped.
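A minimal sketch of a field type using this filter in schema.xml. The field type name `text_wdf` is hypothetical, and the other attribute values are assumptions you may want to tune. The WhitespaceTokenizer keeps "hello," as a single token; the WordDelimiterFilterFactory then treats the comma as a delimiter and emits only "hello". Since the same analyzer runs at index and query time, a search for "hello" should then match. Newer Solr releases deprecate this filter in favour of solr.WordDelimiterGraphFilterFactory, so check the documentation for your version.

```xml
<!-- Hypothetical field type name; adapt to your own schema.xml -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on whitespace only, so "hello," is still one token here -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits on non-alphanumeric characters and drops them: "hello," becomes "hello" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="0"/>
    <!-- Lowercase so "Hello" and "hello" match each other -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```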

There is also the PatternTokenizerFactory that could be used, but I have never tried it.
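An untested sketch of that alternative, assuming the PatternTokenizerFactory's default split mode (where `pattern` describes the separators between tokens). The field type name `text_pattern` and the exact pattern are assumptions. Splitting on runs of whitespace and ASCII punctuation turns "hello, John" into "hello" and "John", but it also splits words such as "don't" into "don" and "t".

```xml
<!-- Hypothetical field type name; the pattern is an assumption, not a recommendation -->
<fieldType name="text_pattern" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Treat any run of whitespace and/or ASCII punctuation as a token separator -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\p{Punct}]+"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```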

Pascal Dimassimo
A: 

You can use the solr.PatternReplaceFilterFactory to strip leading and trailing punctuation from each token with this:

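<!-- Group 1 = leading punctuation, group 2 = the word, group 3 = trailing punctuation; keep only group 2 -->
<filter class="solr.PatternReplaceFilterFactory"
    pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
    replacement="$2"/>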
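Since this is a token filter, it has to sit after a tokenizer inside an analyzer. A minimal sketch of the surrounding field type, assuming a WhitespaceTokenizer; the field type name `text_trim_punct` and the LengthFilterFactory bounds are assumptions. With this chain "hello," becomes "hello" and "(John)" becomes "John". A token made only of punctuation (for example "--") would be reduced to an empty string, which the LengthFilterFactory then drops. Note that Java's `\p{Punct}` covers ASCII punctuation only, so characters such as curly quotes are not stripped.

```xml
<!-- Hypothetical field type name; adapt to your own schema.xml -->
<fieldType name="text_trim_punct" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only: "hello, John" gives "hello," and "John" -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Keep group 2 only, i.e. strip leading and trailing ASCII punctuation -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
            replacement="$2"/>
    <!-- Drop tokens that became empty (assumed bounds) -->
    <filter class="solr.LengthFilterFactory" min="1" max="255"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```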
claytron