views:

165

answers:

3

I'm trying to return results more like the search

My curren algorithm is this

def search_conditions(column, q)
  vars  = []
  vars2 = []

  vars << q

  if q.size > 3
    (q.size-2).times do |i|
      vars2 << q[i..(i+2)]
      next if i == 0
      vars << q[i..-1]
      vars << q[0..(q.size-1-i)]
      vars << q[i % 2 == 0 ? (i/2)..(q.size-(i/2)) : (i/2)..(q.size-1-(i/2))] if i > 1
    end
  end

  query = "#{column} ILIKE ?"
  vars = (vars+vars2).uniq

  return [vars.map { query }.join(' OR ')] + vars.map { |x| "%#{x}%" }
end

If I search for "Ruby on Rails" it will make 4 search ways.

1) Removing the left letters "uby on Rails".."ils"

2) Removing the right letters "Ruby on Rail".."Rub"

3) Removing left and right letters "uby on Rails", "uby on Rail" ... "on "

4) Using only 3 letters "Rub", "uby", "by ", "y o", " on" ... "ils"

Is good to use these 4 ways? There any more?

+3  A: 

Why are you removing these letters? Are you trying to make sure that if someone searches for 'widgets', you will also match 'widget'?

If so, what you are trying to do is called 'stemming', and it is really much more complicated than removing leading and trailing letters. You may also be interested in removing 'stop words' from your query. These are those extremely common words that are necessary to form grammatically-correct sentences, but are not very useful for search, such as 'a', 'the', etc.

Getting search right is an immensely complex and difficult problem. I would suggest that you don't try to solve it yourself, and instead focus on the core purpose of your site. Perhaps you can leverage the search functionality from the Lucene project in your code. This link may also be helpful for using Lucene in Ruby on Rails.

I hope that helps; I realize that I sort of side-stepped your original question, but I really would not recommend trying to tackle this yourself.

pkaeding
This is what we call good advice.
jonnii
solr looks really good, I think I'll use it. :)
ShadowBelmolve
Cool, I'm glad it helped!
pkaeding
+2  A: 

As pkaeding says, stemming is far too complicated to try to implement yourself. However, if you want to search for similar (not exact) strings in MySQL, and your user search terms are very close to the full value of a database field (ie, you're not searching a large body of text for a word or phrase), you might want to try using the Levenshtein distance. Here is a MySQL implementation.

The Levenshtein algorithm will allow you to do "fuzzy" matching, give you a similarity score, and help you avoid installation and configuration of a search daemon, which is complicated. However, this is really only for a very specific case, not a general site search.

Alex Reisner
That is a good idea, if it fits for your purposes. I will be bookmarking that link :)
pkaeding
This algorithm seems interesting, but I think it would not work for my case. Anyway, bookmarked.
ShadowBelmolve
+1  A: 

While, were all suggesting other possible solutions, check out:

Sphinx - How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.

Thinking Sphinx - A Ruby connector between Sphinx and ActiveRecord.

Evolve
but sphinx will search for similar words?
ShadowBelmolve