I have a simple Google App Engine app that I wrote using ordinary strings. I now realize I want it to handle Unicode. Are there any gotchas with this? I'm thinking especially of all the strings that are already in the live database (from real users whom I don't want to upset).
The datastore internally keeps all strings in Unicode.
Alexander Kojevnikov said: "The datastore internally keeps all strings in unicode."
In other words, your application is already using Unicode everywhere. Thank the Google folks for a sensible API. No further work is required.
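A minimal sketch of what this means for the app's own code, assuming that any byte strings it still receives (form data, fetched pages) are UTF-8: decode them once at the boundary and pass text through unchanged. `to_text` is a hypothetical helper, not part of the App Engine API, and the code is written for Python 3, where `str` is Unicode text and `bytes` is raw data (the Python 2 runtime of that era calls these `unicode` and `str`).

```python
def to_text(value, encoding="utf-8"):
    """Return value as Unicode text.

    Hypothetical helper: assumes byte strings are in `encoding`
    (UTF-8 by default) and passes values that are already text through.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


# Plain ASCII bytes decode to an equal string, so existing ASCII data
# is unaffected; non-ASCII UTF-8 bytes become proper characters.
assert to_text(b"hello") == "hello"
assert to_text(b"caf\xc3\xa9") == u"caf\u00e9"
```

Because ASCII is a subset of UTF-8, strings that users have already saved in plain ASCII compare equal before and after the switch.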
When storing to a db.TextProperty(), you need to wrap the value in db.Text(), like this:
instance.xml = db.Text(xml_string, encoding="utf_8")
Also specify the correct encoding if the string doesn't start with a BOM, for example when you get unexpected non-ASCII data from an XML stream.
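The rule of thumb above (trust a BOM if there is one, otherwise name the encoding yourself) can be sketched in plain Python 3. This is not how `db.Text` is implemented; `decode_with_bom` is a hypothetical helper, and UTF-8 as the default fallback is an assumption. The decoded text could then be stored via `db.Text()` without an `encoding` argument.

```python
import codecs

# Map each byte order mark to a codec that understands and strips it.
# UTF-32 BOMs are checked before UTF-16 ones because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOM_CODECS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_with_bom(data, fallback="utf-8"):
    """Decode bytes using the BOM if present, else the fallback encoding."""
    for bom, codec in _BOM_CODECS:
        if data.startswith(bom):
            # These codecs detect byte order from the BOM and drop it.
            return data.decode(codec)
    # No BOM: the caller has to know (or assume) the encoding.
    return data.decode(fallback)
```

For example, `decode_with_bom(b"caf\xe9", fallback="latin-1")` returns `"café"`, whereas the UTF-8 default would raise `UnicodeDecodeError` on the same bytes.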
This happened to me when using Amazon.com's product API.
Google's urlfetch also had Unicode problems with that stream, so I ended up calling minidom's parse() instead of parseString(), passing it the file-like object returned by urllib.urlopen(). This fixed the problem:
import urllib
from xml.dom import minidom
response = urllib.urlopen(url)
xml = minidom.parse(response)
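A self-contained way to try the stream approach without a network call, written for Python 3: `io.BytesIO` stands in for the file-like response object, and the sample document (declared as ISO-8859-1) is hypothetical. Handing `minidom.parse()` raw bytes lets the XML parser apply the encoding named in the `<?xml ... ?>` declaration itself, which is one plausible reason this approach avoids the decoding problems described above.

```python
import io
from xml.dom import minidom

# Hypothetical sample: raw bytes whose encoding is declared in the prolog.
raw = u'<?xml version="1.0" encoding="ISO-8859-1"?><title>Caf\u00e9</title>'.encode("iso-8859-1")

# BytesIO plays the role of the urlopen() response: a stream of bytes.
response = io.BytesIO(raw)

# minidom.parse() accepts a file-like object; the parser reads the bytes
# and decodes them according to the XML declaration.
dom = minidom.parse(response)
title = dom.getElementsByTagName("title")[0].firstChild.data
```

After parsing, `title` is the Unicode string `"Café"`. In Python 3 the counterpart of `urllib.urlopen()` is `urllib.request.urlopen()`, and its response object should be usable with `minidom.parse()` in the same way.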