I have a list of keywords that I need to search against using ThinkingSphinx. Some of them are more important than others, so I need to find a way to weight those words.

So far, the only solution I have come up with is to repeat the same word several times in my query to increase its relevance. E.g., with 3 keywords, each having a level of importance, Blue (1), Recent (2) and Fun (3), I run this query:

MyModel.search "Blue Recent Recent Fun Fun Fun", :match_mode => :any

Not very elegant, and quite limiting. Does anyone have a better idea?
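For reference, the repetition trick above can be generated from a hash of weights rather than typed by hand. This is only a minimal sketch: `weighted_query` is a hypothetical helper name, not part of ThinkingSphinx, and it assumes the weights are small positive integers.

```ruby
# Hypothetical helper (not a ThinkingSphinx API): repeats each keyword
# as many times as its weight and joins everything into one query string.
def weighted_query(weights)
  weights.flat_map { |word, weight| [word] * weight }.join(' ')
end

query = weighted_query('Blue' => 1, 'Recent' => 2, 'Fun' => 3)
# query is now "Blue Recent Recent Fun Fun Fun"
```

The resulting string can then be passed to MyModel.search with :match_mode => :any, exactly as in the query above.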

+1  A: 

If you can get those keywords into a separate field, then you could weight those fields to be more important. That's about the only good approach I can think of, though.

MyModel.search "Blue Recent Fun", :field_weights => {"keywords" => 100}
pat
Wouldn't I still have the same problem? In that case, keywords would have more weight than, let's say, the title field. But what I'm actually trying to do is make keyword1 more important than keyword2 within the query.
A: 

Recently I've been using Sphinx extensively, and since the death of UltraSphinx, I started using Pat's great plugin (Thanks Pat, I'll buy you a coffee in Melbourne soon!)

I see a possible solution based on your original idea, but you need to make changes to the data at "index time" not "run time".

Try this:

  1. Modify your Sphinx SQL query to replace "Blue" with "Blue Blue Blue Blue", "Recent" with "Recent Recent Recent" and "Fun" with "Fun Fun". This will magnify any occurrences of your special keywords.

    *e.g. SELECT REPLACE(my_text_col,"blue","blue blue blue") as my_text_col ...*

    You probably want to do them all at once, so just nest the replace calls.

    *e.g. SELECT REPLACE(REPLACE(my_text_col,"fun","fun fun"),"blue","blue blue blue") as my_text_col ...*

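  2. Next, change your ranking mode to SPH_RANK_WORDCOUNT. This way relevancy is driven by how often the keywords occur, so the repeated words count for more. In Thinking Sphinx this should correspond to passing :rank_mode => :wordcount to search, but check that against the version you are running.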

  3. (Optional) Imagine you have a list of keywords related to your special keywords. For example "pale blue" relates to "blue" and "pleasant" relates to "fun". At run time, rewrite the query text to look for the target word instead. You can store these words easily in a hash, and then loop through it to make the replacements.

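# Add trigger words as the key,
# and the related special keyword as the value
trigger_words = {}
trigger_words['pale blue'] = 'blue'
trigger_words['pleasant'] = 'fun'

# Example input only; in practice this is the user's search text
query = "pale blue and pleasant"

# Now loop through the hash and replace each trigger with its keyword.
# Substituting on the whole string (rather than word by word) lets
# multi-word triggers such as 'pale blue' match as well.
new_query = query.dup
trigger_words.each do |trigger, keyword|
  new_query = new_query.gsub(/\b#{Regexp.escape(trigger)}\b/i, keyword)
end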

Now you have quasi-keyword-clustering too. Sphinx is really a fantastic technology, enjoy!
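If the list of special keywords grows, the nested REPLACE calls from step 1 get tedious to write by hand. Here is a rough sketch that builds the expression from a hash. `boosted_column_sql` is a made-up helper name, and it assumes the keywords are trusted strings you control (there is no SQL escaping here), so treat it as a starting point rather than finished code.

```ruby
# Hypothetical helper: wraps the column in one REPLACE call per keyword,
# repeating the keyword `count` times in the replacement text.
# Assumes keywords are trusted; nothing is escaped.
def boosted_column_sql(column, boosts)
  boosts.reduce(column) do |expr, (word, count)|
    %(REPLACE(#{expr},"#{word}","#{([word] * count).join(' ')}"))
  end
end

sql = boosted_column_sql('my_text_col', 'fun' => 2, 'blue' => 3)
# sql is now:
# REPLACE(REPLACE(my_text_col,"fun","fun fun"),"blue","blue blue blue")
```

You would then use that string followed by "as my_text_col" in the SELECT of your Sphinx source query, as in the examples in step 1.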

crunchyt