views:

183

answers:

0

I have backed up a bunch of markdown formatted comments into an XML document. This of course meant I needed to HTMLescape them. When I try to use CGI.unescapeHTML it adds a bunch of strange characters into the markup that do not render well in all browsers.

Specifically, it replaces two spaces with "\302\240 ", but not consistently. How do I get it to stop this behavior?

eg:

s = "I am seeing more and more <a href="http://github.com/aslakhellesoy/cucumber /tree/master">Cucumber</a> usage.  This is a good thing!  But I'm also seeing people who are not using regular expressions to their fullest.  Here are some quick regex tips to keep you features readable:

* `(?:a|an)` -- using a this construct you can group things wihout actually matching them.  I'm seeing a lot of steps that have unused params because someone needed a group but didn't know how to avoid capturing it&#x000A"
CGI.unescapeHTML s
# => "I am seeing more and more <a href=\"http://github.com/aslakhellesoy/cucumber/tree/master\"&gt;Cucumber&lt;/a&gt; usage.\302\240 This is a good thing!\302\240 But I'm..."