I have a simple Google App Engine app that I wrote using ordinary strings. I now realize I want it to handle Unicode. Are there any gotchas with this? I'm thinking of all the strings I already have in the live database (from real users who I don't want to upset).

+1  A: 

The datastore internally keeps all strings in unicode.

Alexander Kojevnikov
+2  A: 

Alexander Kojevnikov said: "The datastore internally keeps all strings in unicode."

In other words, your application is already using Unicode everywhere. Thank the Google folks for a sensible API. No further work required.

ddaa
Thanks, I'll check this out. I thought I saw a problem when entering accented characters, but that may just be in my code.
interstar
+1  A: 

When storing to a db.TextProperty(), you need to wrap the value in db.Text(), like this:

instance.xml = db.Text(xml_string, encoding="utf_8")

Specify the correct encoding if the string doesn't have a BOM on it, for example if you get unexpected Unicode data from an XML stream.
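As a rough illustration of the BOM point above, here is a minimal sketch of decoding raw bytes before handing them to the datastore. It is written for Python 3 (where bytes and str take the place of the old str and unicode types), the helper name decode_bytes and the fallback parameter are hypothetical, and it only covers the common UTF-8/16/32 BOMs:

```python
import codecs

# (BOM bytes, codec name) pairs. The UTF-32 BOMs come first because the
# UTF-32-LE BOM starts with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_bytes(raw, fallback="utf-8"):
    """Decode raw bytes to text (hypothetical helper, not part of any API).

    If the data starts with a known BOM, strip it and use the matching codec.
    Otherwise the caller has to say which encoding to assume via `fallback`.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(name)
    return raw.decode(fallback)
```

With no BOM present, the caller has to know the encoding from somewhere else (an HTTP header, the XML declaration, the API documentation), which is why the encoding argument in the db.Text() call above matters.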

This happened to me when using Amazon.com's product API.

Google's urlfetch also had Unicode problems dealing with that stream. I ended up calling minidom's parse() instead of parseString() on the return value of urllib.urlopen(), which acts like a stream. This fixed the problem:

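import urllib
from xml.dom import minidom
response = urllib.urlopen(url)  # file-like object yielding the raw bytes
xml = minidom.parse(response)   # parse() reads directly from the stream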
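To see why parsing the stream directly can help, here is a small self-contained sketch, assuming Python 3 and using io.BytesIO as a stand-in for the network response (the sample document is made up). Given raw bytes, the parser reads the encoding from the XML declaration itself, so no manual decoding step is needed:

```python
import io
from xml.dom import minidom

# Made-up sample document, encoded as ISO-8859-1 and declaring that encoding.
raw = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    '<title>caf\u00e9</title>'
).encode("iso-8859-1")

# BytesIO stands in for the file-like object returned by a URL fetch.
stream = io.BytesIO(raw)

# parse() accepts a file-like object and decodes it using the declared encoding.
doc = minidom.parse(stream)
title = doc.documentElement.firstChild.data
```

After this runs, title is an ordinary text string containing the accented character, regardless of the byte encoding the feed was served in.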
Robert