I currently use the following code to sanitize a string before storing it:

ERB::Util::h(string)

My problem occurs when the string has been sanitized already like this:

string = "Watching baseball &amp; football"

The sanitized string will look like:

sanitized_string = "Watching baseball &amp;amp; football"

Can I sanitize by just turning < into &lt; and > into &gt; via substitution?

+4  A: 

Unescape first, then escape again:

require 'cgi'
string = "Watching baseball &amp; football"

CGI.escapeHTML(CGI.unescapeHTML(string))

=> "Watching baseball &amp; football"
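The unescape-then-escape idea can be wrapped in a small helper so the same call works on both raw and already-escaped input. This is only a sketch: the name escape_once is made up for illustration, and it assumes nobody needs a literal "&amp;" to survive as visible text, since any existing entity is decoded first.

```ruby
require 'cgi'

# Hypothetical helper (not part of ERB or CGI): decode any existing
# entities first, then encode exactly once.
def escape_once(value)
  CGI.escapeHTML(CGI.unescapeHTML(value.to_s))
end

raw     = "Watching baseball & football"
escaped = "Watching baseball &amp; football"

puts escape_once(raw)      # Watching baseball &amp; football
puts escape_once(escaped)  # Watching baseball &amp; football
```

Running the helper on its own output gives the same string again, which is the property the question is after.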
maxm
Thanks for the help, everyone! I will try the unescape-first answer.
sbtodd
A: 

A fast approach, based on a snippet from Erubis:

# Only angle brackets are escaped; ampersands and quotes are left untouched.
ESCAPE_TABLE = { '<' => '&lt;', '>' => '&gt;' }
def custom_h(value)
  value.to_s.gsub(/[<>]/) { |s| ESCAPE_TABLE[s] }
end
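If bare ampersands should be escaped as well, without double-escaping existing entities, one possible variation is to match an "&" only when it does not already start an entity. This is a rough sketch, not the Erubis code: ESCAPES, UNESCAPED, and escape_unescaped are names invented here, and the entity pattern is a simplification.

```ruby
# Hypothetical names; this is an illustration, not part of Erubis or ERB.
ESCAPES = { '<' => '&lt;', '>' => '&gt;', '&' => '&amp;' }.freeze

# Matches <, >, or an & that is NOT already the start of an entity
# such as &amp; (named), &#39; (decimal) or &#x27; (hex).
UNESCAPED = /[<>]|&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\d+|#x\h+);)/

def escape_unescaped(value)
  # gsub with a Hash replaces each match with the value stored under it.
  value.to_s.gsub(UNESCAPED, ESCAPES)
end

puts escape_unescaped("Watching baseball & football")      # Watching baseball &amp; football
puts escape_unescaped("Watching baseball &amp; football")  # Watching baseball &amp; football
```

The trade-off is that anything that merely looks like an entity is passed through unchanged, so this is only reasonable when the input is known to be either raw text or the output of a previous escape.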
Marcel J.
A: 

Yes, you can. Or, taking it further, you can delete entire tags with a basic regex like this:

mystring.gsub( /<(.|\n)*?>/, '' )
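To see what that regex does in practice, here is a small sketch. The wrapper name strip_tags_naive is made up for illustration. Note the last case: the pattern cannot tell a tag from ordinary text that happens to contain "<" and ">", so it removes everything in between.

```ruby
# Lazy match from "<" to the nearest ">", newlines included
# (the same regex as in the one-liner above).
TAG_PATTERN = /<(.|\n)*?>/

# Hypothetical wrapper, named here only for the example.
def strip_tags_naive(text)
  text.to_s.gsub(TAG_PATTERN, '')
end

puts strip_tags_naive("<b>Watching</b> baseball")  # Watching baseball
puts strip_tags_naive("<a\nhref='x'>link</a>")     # link
puts strip_tags_naive("1 < 2 and 3 > 2")           # 1  2
```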
JRL
A: 

You could write your own sanitizer, but there are lots of corner cases and tricky edges in sanitization.

A better approach might be to unescape your string before sanitizing it. Does h() have an inverse you could put your strings through first?

DDaviesBrackett