These errors have been plaguing me endlessly. Why? It seems that my console can't handle the encoding, although my browser and word processor apparently can. I don't have a master list of all the characters it chokes on. What is the best way to fix this without modifying my data?

'charmap' codec can't encode character u'\xca'
+2  A: 

You need to find out the encoding of your console (which system, OS, etc...?) -- 'charmap' is unfortunately a somewhat-ambiguous identification for a codec, as the docs explain:

There’s another group of encodings (the so called charmap encodings) that choose a different subset of all unicode code points and how these codepoints are mapped to the bytes 0x0-0xff. To see how this is done simply open e.g. encodings/cp1252.py (which is an encoding that is used primarily on Windows). There’s a string constant with 256 characters that shows you which character is mapped to which byte value.

All of these encodings can only encode 256 of the 65536 (or 1114111) codepoints defined in unicode.

i.e., it identifies a set of possible codecs, not a specific one.
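To make the ambiguity concrete, here is a minimal sketch that tries the character from the error message against two charmap codecs that ship with Python. The helper name `can_encode` is made up for this illustration, and picking cp1252 and cp437 as the candidates is an assumption, not something the error message tells you.

```python
# Hypothetical helper: report whether a codec can represent every character.
def can_encode(text, codec):
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

char = u"\xca"  # the character named in the error message

# Both codecs are charmap encodings, but they cover different 256-character subsets.
for name in ("cp1252", "cp437"):
    print(name, can_encode(char, name))
```

Both codecs report themselves as 'charmap' in the error text, so the message alone does not say which one the console is using.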

Once you know your console supports a codec named 'foobar', change your statements that are now

print(someunicode)

into

print(someunicode.encode('foobar'))
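One way to get a candidate for 'foobar' is to ask Python what it believes the output stream uses. This is only a sketch: `console_encoding` is a hypothetical helper, the "ascii" fallback is an assumption, and the reported value may not match what the console actually renders (on Windows, running `chcp` in the console shows the active code page).

```python
import locale
import sys

# Hypothetical helper: best guess at the encoding of standard output.
def console_encoding(default="ascii"):
    # sys.stdout.encoding can be None (for example when output is piped),
    # so fall back to a conservative default. The default is an assumption.
    return getattr(sys.stdout, "encoding", None) or default

print("stdout encoding: %s" % console_encoding())
print("locale preferred encoding: %s" % locale.getpreferredencoding())
```

Note that on Python 3, `print()` already encodes text with `sys.stdout.encoding`, and printing the result of `.encode(...)` there shows a `b'...'` representation rather than the characters.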
Alex Martelli
I'm on Windows 7, using IPython running through the standard Windows console. How do I find out which encodings are supported?
Rhubarb
@Rhubarb, CP 1252 is probably what's supported (I'm not familiar with Windows 7, and you should open a specific question for that, but older versions of Windows invariably had CP 1252 support -- and CP 1252 is a charmap encoding, so that's a hint in this direction).
Alex Martelli
Don't try to input/output non-ASCII characters from/to the console under IPython. It's just buggy. It's hard enough trying to get Unicode console IO to behave at the best of times; don't add the extra confusion of grappling with lower-level bugs. See e.g. https://bugs.launchpad.net/ipython/+bug/339642
bobince
@bobince, sure, but once you call `.encode` you're putting out a stream of bytes, not unicode _qua_ unicode -- that works, if your encoding matches your console _and_ the unicode codepoints you're emitting (you might want an 'ignore' as the 2nd arg to `encode` to cover for the latter issue, perhaps).
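A short sketch of what the second argument to `encode` does when the codec cannot represent a character. The sample string and the choice of cp1252 are assumptions for illustration only.

```python
text = u"caf\xe9 \u0416"  # U+00E9 is in cp1252, U+0416 is not

# 'ignore' drops the character, 'replace' substitutes '?',
# 'backslashreplace' emits an escape sequence instead.
for errors in ("ignore", "replace", "backslashreplace"):
    print(errors, repr(text.encode("cp1252", errors)))
```

All three avoid the exception, at the cost of losing or altering the characters the console cannot show; the data itself is unchanged.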
Alex Martelli
@bobince, thanks. That seems to work.
Rhubarb