ansaurus

Question

Answer 1

A:

There are two different things that Python 2 treats as strings - byte strings (str, loosely called 'raw' strings here) and unicode strings. Only the latter actually represent text. If you have a byte string and you want to treat it as text, you first need to convert it to a unicode string. To do this, you need to know the encoding of the string - the way unicode codepoints are represented as bytes in it - and call .decode(encoding) on the byte string.
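A minimal sketch of that decoding step, assuming the bytes happen to be UTF-8 (the sample value is made up for illustration). It uses b'' and u'' literals so that it runs under both Python 2 and Python 3:

```python
# Hypothetical sample data: the UTF-8 bytes for the word "café".
raw = b'caf\xc3\xa9'

# Decoding turns bytes into text. The encoding passed here must match
# the one that was used to produce the bytes in the first place.
text = raw.decode('utf-8')

print(len(raw))   # number of bytes
print(len(text))  # number of characters
```

The byte string is one element longer than the text because the last character takes two bytes in UTF-8.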

When you call str() on a unicode string, the opposite transformation takes place - Python encodes the unicode string as bytes. Since str() gives you no way to specify a character set, Python 2 falls back to its default encoding, ascii, which is only capable of representing the first 128 codepoints. Any character outside that range raises a UnicodeEncodeError.
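The failure can be reproduced with a small sketch like the one below. It assumes Python 2's default encoding of ascii and spells out the implicit step as an explicit .encode('ascii') call, so the same error also shows up under Python 3 (where str() itself would not fail). The sample text is hypothetical:

```python
text = u'caf\xe9'  # hypothetical sample: "café", one non-ASCII character

# Under Python 2, str(text) effectively does text.encode('ascii').
try:
    text.encode('ascii')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)  # the message names the offending character and position
```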

Instead, you should do one of two things:

Represent 'imageAltTags' as a list of unicode strings, and thus drop the str() call entirely - this is probably the best approach.
Instead of str(x), call x.encode(encoding). The encoding to use will depend on what you're doing with the result, but the most likely choice is utf-8 - e.g., x.encode('utf-8').
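Both options might look roughly like this. The contents of imageAltTags are invented here, since the original values are not shown, and utf-8 is assumed as the target encoding:

```python
# Hypothetical stand-in for the question's imageAltTags list.
imageAltTags = [u'caf\xe9', u'na\xefve', u'plain']

# Option 1: keep everything as unicode strings and never call str() on them.
joined = u', '.join(imageAltTags)

# Option 2: encode explicitly, and only at the point where bytes are
# actually required (for example when writing to a file or a socket).
encoded = [tag.encode('utf-8') for tag in imageAltTags]
```

Keeping text as unicode internally and encoding only at the output boundary tends to avoid this class of error altogether.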

Nick Johnson 2010-07-01 09:01:46

This is a very common issue that Python 2 users run into every day. It happens so often that I ended up blogging about it: http://wesc.livejournal.com/1743.html

wescpy 2010-07-03 08:44:20

I dumped str() and now things work fine. I am dealing with everything as a unicode string. Thanks!

demos 2010-07-03 14:48:11


UnicodeEncodeError Google App Engine
