ansaurus

Question

Answer 1

+2 A:

Depending on how you encoded those "suit symbols" into a byte string, you'll need to turn that byte string back into a unicode string by naming the appropriate codec (for example, thebytestr.decode('latin-1') if latin-1 is how you encoded it!) before encoding that unicode string as utf-8. A bare unicode(something) uses the default encoding, which is ASCII and therefore totally ignorant of any "suit symbols"!-)
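As a minimal sketch of that decode-then-encode round trip: the byte string and the source codec below are assumptions made up for illustration (the spade symbol stored as UTF-16-LE), not data from the question. The snippet is written to run on Python 2.6+ as well as Python 3.

```python
# Hypothetical input: U+2660 (black spade suit) stored as UTF-16-LE bytes.
# Both the bytes and the codec name are assumptions for this example.
source_codec = 'utf-16-le'
the_byte_str = b'\x60\x26'

# Step 1: decode with the codec the bytes were actually written in.
as_unicode = the_byte_str.decode(source_codec)

# Step 2: encode the resulting unicode string as UTF-8.
utf8_bytes = as_unicode.encode('utf-8')

# The ASCII codec cannot decode non-ASCII bytes, which is what a bare
# unicode(something) attempts in Python 2.
try:
    utf8_bytes.decode('ascii')
    ascii_decodes = True
except UnicodeDecodeError:
    ascii_decodes = False
```

If the codec named in step 1 is not the one that produced the bytes, the decode either fails or silently yields the wrong characters, so it has to be known rather than guessed.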

As I said back then (3 months ago), I'd go for implementing __unicode__ instead of __str__, but that's just a minor issue of simplicity. The core point is, rather: if your byte string includes anything outside of the limited ASCII encoding, you must know what encoding your byte string uses, and decode it back into Unicode by explicitly using that codec!
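A rough sketch of the __unicode__ approach, assuming a hypothetical Card class (the real class from the question is not shown here). In Python 2, __str__ must return a byte string, so it encodes explicitly; the version check is only there so the same snippet also runs on Python 3, where __unicode__ has no special meaning and __str__ returns text.

```python
import sys

PY2 = sys.version_info[0] == 2


class Card(object):  # hypothetical class, for illustration only
    def __init__(self, suit):
        # suit is expected to be a unicode string such as u'\u2660'
        self.suit = suit

    def __unicode__(self):
        # Always return a unicode object, never an encoded byte string.
        return self.suit

    def __str__(self):
        text = self.__unicode__()
        # Python 2: str() must yield bytes, so pick the encoding explicitly.
        # Python 3: str() must yield text, so return it unchanged.
        return text.encode('utf-8') if PY2 else text


card = Card(u'\u2660')
as_text = card.__unicode__()
as_str = str(card)
```

Keeping the text logic in __unicode__ and making __str__ a thin encoding wrapper means there is only one place where the encoding is chosen.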

Alex Martelli 2009-11-23 03:19:07

Answer 2

+3 A:

Where exactly does that UnicodeEncodeError occur? I can think of two possible issues here:

The UnicodeEncodeError occurs in your __unicode__ method.
Your __unicode__ method returns a byte string instead of a unicode object, and that byte string contains non-ASCII characters.

Do you have a __unicode__ method in your class?

I tried this on the Python console according to the actual data from your comment:

>>> u'\u2660'.encode('utf-8')
'\xe2\x99\xa0'
>>> print '\xe2\x99\xa0'
♠

It seems to work. Could you please try to print the same on your console? Maybe your console encoding is the problem.
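One way to check the console side without printing the symbol itself is to ask which encoding stdout reports and whether the spade can be encoded with it. This is only a sketch: can_encode is a helper name invented here, and the fallback to 'ascii' when no encoding is reported is an assumption.

```python
import sys


def can_encode(text, encoding):
    """Return True if text can be encoded with the named codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


spade = u'\u2660'

# sys.stdout.encoding can be None (for example when output is piped),
# so fall back to ASCII in that case; the fallback is an assumption.
console_encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'

# Report which interpreter is running and what the console claims to support.
print(sys.version)
print(console_encoding)
print(can_encode(spade, console_encoding))
```

Printing sys.version alongside the encoding also shows which interpreter is actually running the code, which can differ between a terminal and an editor.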

fviktor 2009-11-23 03:22:31

Yes, it does; it simply returns the value u'\u2660'.

sharvey 2009-11-24 01:47:21

Thanks for quoting some actual data here; it helps a lot. I've just updated my answer to use the actual data from your message. See above.

fviktor 2009-11-24 11:33:43

Thanks for the follow-up. It does work as expected in the console. However, when I simply do "return u'\u2660'.encode('utf-8')" from the __str__ function in my class, I still get the error.

sharvey 2009-11-26 01:37:17

The code you provided works, but only when I run it with Python 2.6, not with the version that TextMate calls by default (2.5.1). Thanks for the help; I'll remember to post the Python version next time.

sharvey 2009-11-26 02:45:32
