I'm implementing full-text search on my rap website, and I'm running into some issues with rapper and song names.

For example, someone might want to search for the rapper "Cam'ron" using the query "camron" (leaving out the mid-word apostrophe). Likewise, someone might search for the song "3 Peat" using the query "3peat".

"The Notorious B.I.G." is a bit of a weird case: "The Notorious BIG" and "The Notorious B.I.G." both work (I guess because the solr.StandardFilterFactory removes dots from acronyms?), but "The Notorious B.I.G" (i.e., minus the trailing dot) doesn't.

Ideally all reasonable variations of these names should work. I'm guessing the answer has something to do with the solr.WordDelimiterFilterFactory, but I'm not sure.

Also, I'm using Sunspot with Rails if that's relevant.

+4  A: 

Yes, you are right: you need to configure WordDelimiterFilterFactory properly. Try enabling all of its options, and don't forget preserveOriginal, which keeps the original term alongside the generated ones.

generateWordParts - emits the word parts: B.I.G. yields B, I, G

generateNumberParts - emits the number parts: 3Peat yields 3 (Peat is a word part, so it comes from generateWordParts)

catenateWords - joins adjacent word parts: B.I.G. also yields BIG

catenateNumbers - joins adjacent number parts: the 802.11 in "Rapper 802.11" also yields 80211

catenateAll - joins all parts of a token: Rapper-802.11 also yields Rapper80211

splitOnCaseChange - splits on case changes: GanGsTa yields Gan, Gs, Ta

preserveOriginal - also keeps the original token: Rapper-802.11RuuLlZ is indexed unchanged alongside its parts.

Yurish
Great advice, thanks. I added the following to `schema.xml`: `<filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="1" splitOnNumerics="1" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>`. This seems to solve everything except the "B.I.G" case. Any ideas?
Horace Loeb
It may be because of StandardTokenizer. I would replace it with WhitespaceTokenizer. To see how your analyzers behave, use the "Analysis" page in the Solr admin web interface, if you have it available. It shows which analyzer step transforms your text and what each step outputs.
Yurish
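
Putting the two suggestions together, the field type in `schema.xml` might look roughly like the sketch below. The field type name `text`, the `positionIncrementGap` value, and the trailing `LowerCaseFilterFactory` are assumptions about the existing schema rather than anything confirmed in this thread; the WordDelimiterFilterFactory attributes are the ones from the comment above.

```xml
<!-- Sketch only: the name "text" and positionIncrementGap are assumed,
     adjust them to match the field type your schema already defines. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only, so "B.I.G" reaches the next filter intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Split and re-join tokens on punctuation, digits, and case changes -->
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="1"
            splitOnNumerics="1"
            splitOnCaseChange="1"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="1"
            preserveOriginal="1"/>
    <!-- Assumed: lowercase after splitting, so case changes are still visible above -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain, "B.I.G." and "B.I.G" should both produce the parts B, I, G and the joined form BIG, so the two spellings can match each other. It is worth confirming that on the Analysis page, and existing documents need to be reindexed before a schema change affects them.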