I am trying to parse a text file and convert its curly quotes, “ and ”, into normal straight quotes like "

I tried this:

text.gsub!("“",'"')
text.gsub!("”",'"')

but when it's done, they are still there and show up as

\x93 and \x94

so I tried adding that too with no luck:

text.gsub!('\\x93', '"')
text.gsub!('\\x94', '"')

The problem is, when I try to show those weird quotes on a webpage, it makes that weird diamond with a question mark symbol: �

+1  A: 

It seems to work:

irb(main):001:0> text = "“foo”"
=> "\342\200\234foo\342\200\235"
irb(main):002:0> text.gsub!("“",'"')
=> "\"foo\342\200\235"
irb(main):003:0> text.gsub!("”",'"')
=> "\"foo\""

You need to use a hex editor to figure out all the character codes involved.
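
If no hex editor is at hand, Ruby itself can show the bytes. This is only a minimal sketch, assuming Ruby 1.9 or later and a UTF-8 source file; the `sample` string is made up for illustration and stands in for your real text:

```ruby
# Hypothetical sample: curly quotes around "foo", written as Unicode escapes.
sample = "\u201Cfoo\u201D"

# Show every byte as two hex digits to see what is really stored.
hex = sample.bytes.map { |b| format("%02X", b) }.join(" ")
puts hex              # E2 80 9C 66 6F 6F E2 80 9D
puts sample.encoding  # the encoding Ruby believes the string has
```

If your own text shows a lone 93 or 94 instead of the three-byte sequences E2 80 9C and E2 80 9D, the data is probably Windows-1252 rather than UTF-8.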

Matthew Flaschen
A: 

Re: the second question of why the weird quotes show on a web page as the � symbol:

Your problem is that your web page is not in UTF-8 mode. To get it there, see http://www.w3.org/International/O-HTTP-charset

If you can't change your web server, add a meta line in the head section of your web pages: http://www.utf-8.com/

Larry

Larry K
A: 

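Your first gsubs should work. The second set doesn't work because of the single quotes and the double backslash: '\\x93' is the four literal characters \, x, 9 and 3, not the byte 0x93. Escapes like \x93 are only interpreted inside double quotes, so try it the other way around: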

text.gsub!("\x93", '"')
text.gsub!("\x94", '"')
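
To see the quoting difference for yourself, here is a small sketch (the variable names `literal` and `byte` are just for illustration):

```ruby
literal = '\\x93'  # single quotes: backslash, "x", "9", "3"
byte    = "\x93"   # double quotes: the single byte 0x93

p literal.bytes.to_a  # [92, 120, 57, 51]
p byte.bytes.to_a     # [147]
```

The first string can never match the byte in your file, which is why the gsub! calls in the question did nothing.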

You can also do this in one line:

# chain gsub rather than gsub!, because gsub! returns nil when nothing was replaced
text = text.gsub("\x93", '"').gsub("\x94", '"')
# or
text.gsub!(/(\x93|\x94)/, '"')

Are you sure the encoding of the string is correct?
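
If it turns out the file is Windows-1252 (where 0x93 and 0x94 are the curly quotes), one option is to transcode to UTF-8 first and then replace the real characters. This is a rough sketch under that assumption, for Ruby 1.9 or later; `raw` is a made-up stand-in for the text you read from the file:

```ruby
# Assumption: the data is Windows-1252; `raw` stands in for text read from the file.
raw = "\x93foo\x94".dup.force_encoding("Windows-1252")

# Transcode to UTF-8 so bytes 0x93/0x94 become U+201C/U+201D.
utf8 = raw.encode("UTF-8")

# Replace both curly quotes with a plain double quote.
plain = utf8.gsub(/[\u201C\u201D]/, '"')
puts plain
```

After this the string is valid UTF-8, so it should also display correctly on a page served as UTF-8.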

Lars Haugseth