Google supports search keywords such as site: and is: (the second example is from Gmail). I'm trying to build a similar system and am wondering how best to identify and handle those terms. For simplicity's sake, assume an object-oriented language is being used (Ruby, Python, Java, C#, etc.).

Currently, my plan is to have a separate class for each keyword. These classes have a precedence value and three methods:

  1. isRelevant(String searchPhrase): Returns true if the search phrase matches the class's filter.
  2. getResults(String searchPhrase): Returns a list of results based on the search phrase.
  3. reviseSearch(String searchPhrase): Returns a modified version of the search phrase. Normally this removes the matched term so that a lower-precedence filter does not process it again, but it might also add text or clear the string entirely.

The calling method will then go through these keyword filters until the search phrase is empty or there are no more filters (in the latter case it will revert to its normal search behavior).
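To make the plan above concrete, here is a minimal Ruby sketch of the filter chain. All names in it (KeywordFilter, run_filters, relevant?, results, revise) are hypothetical stand-ins for the three methods described, the results method is only a placeholder for a real lookup, and it assumes a higher precedence value means the filter runs earlier.

```ruby
# Hypothetical filter for one keyword such as `site:` (names are illustrative).
class KeywordFilter
  attr_reader :precedence

  def initialize(keyword, precedence)
    @keyword = keyword
    @precedence = precedence
    # Matches `keyword:value` at a word boundary; the value runs to the next whitespace.
    @pattern = /(?<!\S)#{Regexp.escape(keyword)}:(\S+)/
  end

  # Counterpart of isRelevant: does the phrase contain this keyword?
  def relevant?(phrase)
    phrase.match?(@pattern)
  end

  # Counterpart of getResults. Placeholder only: a real filter would query an index.
  def results(phrase)
    phrase.scan(@pattern).flatten.map { |value| "#{@keyword} => #{value}" }
  end

  # Counterpart of reviseSearch: drop the matched terms from the phrase.
  def revise(phrase)
    phrase.gsub(@pattern, '').squeeze(' ').strip
  end
end

# Runs the filters in descending precedence (an assumption) until the
# phrase is empty or the filters run out. Returns the collected results
# and whatever text is left for the normal search.
def run_filters(phrase, filters)
  results = []
  filters.sort_by { |f| -f.precedence }.each do |filter|
    break if phrase.empty?
    next unless filter.relevant?(phrase)

    results.concat(filter.results(phrase))
    phrase = filter.revise(phrase)
  end
  [results, phrase]
end

filters = [KeywordFilter.new('is', 1), KeywordFilter.new('site', 2)]
results, remaining = run_filters('ruby parsing site:example.com is:unread', filters)
```

Here remaining holds the text that no filter claimed, which is what would be handed to the normal search behavior.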

Thus, the question: Is this the most efficient way of doing this, or is there some more appropriate method? Some of the details still need to be figured out, but is this a step in the right direction?

A:

Basics

sample string:

foo:(hello world) bar:(-{bad things}) email:[email protected] another:weird characters +=2{-52!%#^ final:end

split with regex:

/\s+(?=\w+:)/

return array:

[
  'foo:(hello world)',
  'bar:(-{bad things})',
  'email:[email protected]',
  'another:weird characters +=2{-52!%#^',
  'final:end'
]

regex explanation:

\s+     one or more whitespace characters
(?=     followed by (positive lookahead)
  \w+   one or more word characters
  :     literal `:' (colon character)
)

usage:

Iterate through the array and split each element at the first : (colon), so that any colons inside the value are preserved. The left-side key can be used to look up a function, and the right-side value can be passed to it as the parameter. That should put you on track for whatever you want to do from here.
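As a rough illustration of that step, the sketch below splits a query into keyword terms and leftover free text. The parse_query helper and its return shape are my own assumptions, not part of any library, and a repeated keyword simply overwrites the earlier value.

```ruby
# Hypothetical helper: separate `key:value` terms from plain search text.
def parse_query(query)
  keywords = {}
  free_text = []

  # Split on whitespace that is immediately followed by `word:`.
  query.strip.split(/\s+(?=\w+:)/).each do |token|
    # A limit of 2 splits at the first colon only.
    key, value = token.split(':', 2)
    if value && key.match?(/\A\w+\z/)
      keywords[key] = value
    else
      # No keyword prefix: treat the token as ordinary search text.
      free_text << token
    end
  end

  [keywords, free_text.join(' ')]
end

keywords, free_text = parse_query('ruby parsing site:example.com is:unread from:a:b')
```

The keywords hash can then drive whatever dispatch you prefer, while free_text falls through to the regular search.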

Example Ruby usage

search.rb

# Search class
class Search

  def initialize(query)
    @query = query
  end

  def foo(input)
    "foo has #{input}"
  end

  def bar(input)
    "bar has #{input}"
  end

  def email(input)
    "email has #{input}"
  end

  def another(input)
    "another has #{input}"
  end

  def final(input)
    "final has #{input}"
  end

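  def exec
    # Only methods defined on this class (other than exec itself) are keyword
    # handlers, so a query cannot reach inherited methods such as instance_eval.
    handlers = self.class.public_instance_methods(false) - [:exec]
    @query.split(/\s+(?=\w+:)/).each do |e|
      # A limit of 2 keeps any colons inside the value intact.
      method, arg = e.split(':', 2)
      puts send(method, arg) if handlers.include?(method.to_sym)
    end
  end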

end

use search.rb

q = "foo:(hello world) bar:(-{bad things}) email:[email protected] another:weird characters +=2{-52!%#^ final:end"
s = Search.new(q)
s.exec

output

foo has (hello world)
bar has (-{bad things})
email has [email protected]
another has weird characters +=2{-52!%#^
final has end
macek