I want to use Solr to search the threads stored in my MySQL database.

However, I want it to match not just the exact words in a thread, but similar words too.

E.g. if a thread title is "dog for sale" and the user searches for "dogs", that title should be in the results.

Also, if a user searches for "mac os x", threads containing "snow leopard" should appear.

I would also like to be able to link words that the application considers related, e.g. "house" and "apartment".

How is this kind of logic done?

I know that Solr can look up words in a dictionary file that you create or add, so Solr would look up "dogs" and see which related words there are (e.g. "dog").

But where do you find such a dictionary?

I have no idea how to implement this.

Please point me in the right direction.

Thanks.

+2  A: 

I think you'll have to build such a dictionary yourself, since it's very application-specific. "House" and "Apartment" might be similar terms for your application but very distant in another application.

Once you have this dictionary you can use it through the SynonymFilterFactory.
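
To make the idea concrete, here is a minimal Python sketch of what a synonym dictionary does at query time. It is not Solr code: the helper names `parse_synonyms` and `expand` and the sample entries are hypothetical, and it only assumes a simple format of one comma-separated group of equivalent terms per line.

```python
def parse_synonyms(text):
    """Build a term -> set-of-equivalents map from comma-separated groups."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        group = {term.strip().lower() for term in line.split(",") if term.strip()}
        for term in group:
            table.setdefault(term, set()).update(group)
    return table


def expand(term, table):
    """Return the term plus every synonym listed for it, sorted."""
    key = term.strip().lower()
    return sorted(table.get(key, {key}))


# Hypothetical, application-specific entries (an assumption for illustration).
SYNONYMS = """
# one group of equivalent terms per line
house, apartment
mac os x, snow leopard
"""

table = parse_synonyms(SYNONYMS)
print(expand("House", table))  # ['apartment', 'house']
print(expand("dog", table))    # ['dog']
```

A term with no entry expands only to itself, which is why the dictionary has to be curated for each application.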

Matching "dog" when the user searches for "dogs" is managed by the stemmer and doesn't require any dictionary.
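
As a rough illustration of why no dictionary is needed, here is a toy suffix stripper in Python. It is only a sketch of the idea and not the algorithm Solr's stemmers use; the function name `strip_suffix` and the suffix lists are assumptions. The point is that indexed words and query words are reduced to the same stem by rule.

```python
def strip_suffix(word, suffixes, min_stem=3):
    """Remove the longest matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# Toy suffix lists, assumed for illustration; real stemmers use many more rules.
ENGLISH = ["s"]
SWEDISH = ["ar", "or", "er"]

print(strip_suffix("dogs", ENGLISH))    # dog
print(strip_suffix("dog", ENGLISH))     # dog
print(strip_suffix("hundar", SWEDISH))  # hund
```

Because "dogs" and "dog" both reduce to "dog", a query for either form matches a document containing the other.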

Mauricio Scheffer
But what about other languages, e.g. Swedish? There the plural is not formed by just adding an "s", but by adding "ar", "or", "er", and sometimes nothing. How could Solr know this? Is the stemmer just for English?
never_had_a_name
@fayer: "Solr includes support for stemming Swedish via solr.SnowballPorterFilterFactory, and Lucene includes an example stopword list." http://wiki.apache.org/solr/LanguageAnalysis#Swedish
Mauricio Scheffer
+1  A: 

You could use the synonyms.txt file and create your own dictionary.

Another option for you could be fuzzy search.
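
Fuzzy matching is generally based on edit distance, i.e. how many single-character changes turn one word into another. The sketch below is a plain Levenshtein implementation in Python, written for illustration and not taken from Solr; it shows which kinds of "similar words" such a search can and cannot reach.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


print(levenshtein("dog", "dogs"))         # 1
print(levenshtein("house", "hose"))       # 1
print(levenshtein("house", "apartment"))  # large: the words share almost nothing
```

Words that differ by a typo or a short ending are close, while words related only by meaning are far apart, so those still need a synonym dictionary.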

Karussell
I don't think you can find "Apartment" with a fuzzy search for "House".
Mauricio Scheffer
That's true. But the initial question is: "I want to search for threads in my mysql database with Solr. But i want it to not just search the thread words, but for similar words." And synonyms are one option ...
Karussell