ansaurus

Question

'ASCII' to Unicode error in python when attempting to read a latin-1 encoded string

Answer 1

+2 A:

Try this instead:

print(repr(unicode(data, 'iso-8859-1')))

By printing a unicode object, you're implicitly trying to convert it to the default encoding, which is ASCII. Using repr() escapes it into an ASCII-safe form, and it also makes it easier to see what's going on while debugging.
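For comparison, here is a minimal Python 3 sketch of the same idea. The thread itself is about Python 2.6, where `unicode(data, 'iso-8859-1')` and `repr()` do this job. The byte string is a made-up example containing an e-acute (0xE9) and a no-break space (0xA0). The explicit `.encode("ascii")` reproduces the failure that Python 2 hits when print falls back to the ASCII codec. `ascii()` plays the role that `repr()` plays on a Python 2 unicode object.

```python
# Made-up Latin-1 input: "caf" + 0xE9 (e-acute) + 0xA0 (no-break space)
data = b"caf\xe9\xa0"

# Python 3 spelling of unicode(data, 'iso-8859-1'): decode explicitly
text = data.decode("iso-8859-1")

# Encoding to ASCII explicitly shows the error an implicit ASCII conversion would raise
try:
    text.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# ascii() escapes every non-ASCII character, so the result is safe to print anywhere
safe = ascii(text)
print(safe)  # prints 'caf\xe9\xa0'
```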

Laurence Gonsalves 2010-02-19 06:43:02

Thanks, that was helpful!

Simon 2010-02-19 07:07:30

+1, but note that this fixes the immediate problem, not the general one. You'd better learn a bit about encodings if you want to avoid these problems.

e-satis 2010-02-19 08:23:45

I'm not sure I understand your point: the helpful part was understanding that print() would implicitly convert to the default encoding.

Simon 2010-02-19 17:44:45

Answer 2

+1 A:

Are you using Python 3.X or 2.X? It makes a difference. It actually looks like 2.X, but you confused me by using print(blahblah) :-)

Answer to your last question: yes, ASCII by default when you do print(). On 3.X, use print(ascii(foo)) for debugging, not print(foo). On 2.X, use repr(), not ascii().
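The difference between `repr()` and `ascii()` on 3.X can be seen in a short sketch. The sample string is a made-up example. On Python 3, `repr()` keeps printable non-ASCII characters such as `é` and escapes only non-printable ones like the no-break space. `ascii()` escapes everything outside ASCII, which matches what `repr()` gives on a Python 2 unicode object.

```python
text = "caf\u00e9\u00a0"  # e-acute plus a no-break space (made-up sample)

as_repr = repr(text)    # keeps the printable e-acute, escapes the no-break space
as_ascii = ascii(text)  # escapes every non-ASCII code point

print(as_ascii)  # prints 'caf\xe9\xa0'
```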

Your original problem with the no-break space should go away if (a) the data is unicode and (b) you use the re.UNICODE flag with re.compile().
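Here is a minimal Python 3 sketch of why the flag matters for the no-break space, assuming a made-up input and a whitespace-splitting pattern (the asker's actual regex isn't shown). In Python 3, str patterns already use Unicode matching, so `re.UNICODE` is redundant but accepted. `re.ASCII` restricts `\s` to ASCII whitespace, which approximates Python 2's behaviour without `re.UNICODE`.

```python
import re

# Made-up Latin-1 input with a no-break space (0xA0) between two words
text = b"caf\xe9\xa0bar".decode("iso-8859-1")

# Unicode matching: \s also matches U+00A0 NO-BREAK SPACE
unicode_ws = re.compile(r"\s+", re.UNICODE)
# ASCII-only matching: \s is limited to [ \t\n\r\f\v]
ascii_ws = re.compile(r"\s+", re.ASCII)

unicode_parts = unicode_ws.split(text)  # ['café', 'bar']
ascii_parts = ascii_ws.split(text)      # no split: the no-break space isn't matched

print(ascii(unicode_parts), ascii(ascii_parts))
```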

John Machin 2010-02-19 06:46:02

Yep, 2.6. Thanks for the `repr()` technique. Re: the original problem, `re.UNICODE` was the trick. THANKS!

Simon 2010-02-19 07:06:44
