views:

196

answers:

2

I want to write a function that allows users to match data based on a regexp, but I am concerned about sanitation of the user strings. I know with SQL queries you can use bind variables to avoid SQL injection attacks, but I am not sure if there's such a mechanism for regexps. I see that there's Regexp.escape, but I want to allow valid regexps.

Here is is the sample function:

  def tagged?(text)
    tags.each do |tag|
      return true if text =~ /#{tag.name}/i
    end
    return false
  end

Since I am just matching directly on tag.name is there a chance that someone could insert a Proc call or something to break out of the regexp and cause havoc?

Any advice on best practice would be appreciated.

+1  A: 

You should probably create an instance of the Regexp class instead.

def tagged?(text)
  return tags.any? { |tag| text =~ Regexp.new(tag.name, Regexp::IGNORECASE) }
end
Ciarán Walsh
+3  A: 

Interpolated strings in a Regexp are not executed, but do generate annoying warnings:

/#{exit -3}/.match('test')
# => exits

foo = '#{exit -3}'
/#{foo}/.match('test')
# => warning: regexp has invalid interval
# => warning: regexp has `}' without escape

The two warnings seem to pertain to the opening #{ and the closing } respectively, and are independent.

As a strategy that's more efficient, you might want to sanitize the list of tags into a combined regexp you can run once. It is generally far less efficient to construct and test against N regular expressions than 1 with N parts.

Perhaps something along the lines of this:

class Taggable
  def tags
    @tags
  end

  def tags=(value)
    @tags = value

    @tag_regexp = Regexp.new(
      [
        '^(?:',
        @tags.collect do |tag|
          '(?:' + tag.sub(/\#\{/, '\\#\\{').sub(/([^\\])\}/, '\1\\}') + ')'
        end.join('|'),
        ')$'
      ].to_s,
      Regexp::IGNORECASE
    )
  end

  def tagged?(text)
    !!text.match(@tag_regexp)
  end
end

This can be used like this:

e = Taggable.new
e.tags = %w[ #{exit-3} .*\.gif .*\.png .*\.jpe?g ]

puts e.tagged?('foo.gif').inspect

If the exit call was executed, the program would halt there, but it just interprets that as a literal string. To avoid warnings it is escaped with backslashes.

tadman
I was testing with trying to insert a proc and got the same errors. I looks like it's being escaped when assigned to an activerecord object:#<Tag id: 27, user_id: 1, name: "\#{eval 'Proc.new { puts \"yeah\"}.call'}", created_at: "2009-12-31 17:24:54", updated_at: "2009-12-31 17:24:54">And I get the same errors you got when actually trying to use it as part of a regex. I was just not sure if there was something I was missing.
Dan McNevin