ansaurus

Question

Answer 1

+4 A:

"I try to e.g. convert that stuff to ISO-8859-1"

Close. You actually want to do it the other way around: you've got ISO-8859-1(*), you want UTF-8(**). So str.encode('utf-8', 'iso-8859-1'), with the destination encoding first and the source encoding second, would be more likely to do the trick.

*: actually you might well have Windows code page 1252, which is like ISO-8859-1, but with smart quotes and other printable characters in the range 0x80-0x9F, which ISO-8859-1 uses for control codes. If so, use 'cp1252' instead.

**: well, you probably do. Working with UTF-8 is the best way forward, since it can store all possible characters. If you really want to keep working in ISO-8859-1/cp1252, then presumably the problem is just that Ruby has mis-guessed the character set in use, and you can fix it by calling str.force_encoding('iso-8859-1'). That call only changes the encoding label on the string; the bytes themselves are left untouched.
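To make the difference between the two source encodings and between encode and force_encoding concrete, here is a minimal sketch. It assumes Ruby 2.0 or later (for String#b), and the sample bytes and variable names are made up for illustration; they are not from the question.

```ruby
# Hypothetical sample: curly quotes around "caf" + e-acute,
# as the bytes would be stored in Windows-1252.
raw = "\x93caf\xE9\x94".b            # .b returns a binary-tagged copy

# Transcode: destination encoding first, source encoding second.
from_cp1252 = raw.encode('utf-8', 'cp1252')
from_latin1 = raw.encode('utf-8', 'iso-8859-1')

# cp1252 maps 0x93/0x94 to curly quotes (U+201C/U+201D);
# ISO-8859-1 maps them to C1 control codes (U+0093/U+0094).
p from_cp1252.codepoints.map { |c| format('U+%04X', c) }
p from_latin1.codepoints.map { |c| format('U+%04X', c) }

# force_encoding only relabels the string; the bytes stay the same.
relabeled = raw.dup.force_encoding('iso-8859-1')
p relabeled.bytes == raw.bytes
p relabeled.valid_encoding?
```

If the output of the ISO-8859-1 conversion contains invisible control characters where you expected quotes or dashes, that is a hint the data was really cp1252.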

bobince 2009-12-12 11:23:26

Thanks! I always mix the encoding stuff up :( This probably was ISO-8859-1, but somehow along the way it got declared UTF-8. This helped: post.body_html.force_encoding('iso-8859-1').encode("utf-8")

Marc Seeger 2009-12-12 11:35:05

Cool! Yep, that would do the same thing.

bobince 2009-12-12 11:39:24
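A quick way to check that the two spellings agree, sketched with a hypothetical stand-in for post.body_html (the real value isn't shown in the thread):

```ruby
# Hypothetical stand-in for post.body_html: ISO-8859-1 bytes for
# "gr" + u-umlaut + sharp-s + "e", wrongly tagged as UTF-8.
body_html = "gr\xFC\xDFe".dup.force_encoding('utf-8')

# Two-step version from the comment: relabel as ISO-8859-1, then
# transcode to UTF-8. dup first, because force_encoding modifies
# its receiver in place.
two_step = body_html.dup.force_encoding('iso-8859-1').encode('utf-8')

# One-step version from the answer: name the source encoding
# directly in the encode call.
one_step = body_html.encode('utf-8', 'iso-8859-1')

p body_html.valid_encoding?   # the mislabeled input is not valid UTF-8
p two_step == one_step
p two_step.encoding
```

Note that force_encoding changes the string it is called on, so without the dup the original string would be relabeled as a side effect; the two-argument encode leaves its receiver alone.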


clean up strange encoding in ruby
