I have a db.StringProperty() mRegion that is set to some Korean text. I see in my Dashboard that the value is visibly in Korean like this:

한국 : 충청남도

However, when I take this field and add it into a string list property (db.StringListProperty()) I end up with something like this:

\ud55c\uad6d : \ucda9\uccad\ub0a8\ub3c4

I am having issues displaying this text on my client when I have this string list property value output to the client, so it makes me wonder if something is wrong on the server end when the value is stored (as I would expect it to be readable Korean like the StringProperty).

Does anyone know where I might be going wrong with this or if this second display is simply normal in string list objects and the problem is likely on my client end?

Thanks.

Update with more detail on the issue: My client is an iPhone app. Basically, I use the iPhone to get the user's GPS location info using the reverse geocoder API. I send this to App Engine and save it. This part appears to be working because for Korea, I see the Korean characters. The region name is obtained, in summary, like this:

region = self.request.get('region')
entry.init(region)
...
self.mRegion = region

pretty straightforward (and it works).

Where it breaks down is when I retrieve that data and send it back to the client. To summarize:

query = db.GqlQuery("SELECT * FROM RegionData WHERE mLatitudeCenter >= :1 and mLatitudeCenter <= :2", latmin, latmax)
for entry in query:
    output += entry.mRegion + ','
self.response.out.write(output)

When I take this and put it on a UILabel in the client, it's garbled. Also, when I take the garbled value in the client and send it back to the server to look up a region, it fails, so that suggests to me that instead of sending the Korean text maybe it's transmitting the repr() characters or something. If, as you say, it's just a matter of presentation and not the inherent data itself, then perhaps it's something to do with the system font I'm using to try to display this data? I had thought that somewhere I was missing the right call to encode() or decode(), but not sure.

+1  A: 

It's quite possible that the admin interface displays the two differently, yes. In the latter case it's clearly doing a repr(s), while in the former it's just printing the string.
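
To see that the escaped form is only a different rendering of the same characters, here is a minimal sketch. It assumes the dashboard produces something close to Python's unicode_escape rendering (repr() of a unicode string in Python 2 looks similar); the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# The value as stored: a Unicode string containing Korean text.
region = u"한국 : 충청남도"

# An ASCII-only rendering of the same string, similar to what repr() shows
# for a unicode string in Python 2. Nothing about the data itself changes.
escaped = region.encode("unicode_escape").decode("ascii")
print(escaped)

# Reversing the escape gives back the identical string.
restored = escaped.encode("ascii").decode("unicode_escape")
print(restored == region)
```

If the escaped form round-trips to the original like this, the stored data is fine and only the display differs.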

How the admin interface displays a value doesn't affect how your code works, though - both Strings and StringLists are stored the same way in the datastore, and will come back as Unicode strings for you to deal with as you wish.

I highly recommend reading this Joel on Software post about Unicode. In short, you're dealing with two kinds of things: binary data and Unicode characters. To confuse you, Python exposes both as strings - 'raw strings' (byte strings) and 'unicode strings', respectively - but you should only treat the latter as actual strings.

The datastore, with its StringListProperty and StringProperty, stores and returns Unicode strings. Your framework should also be giving you Unicode strings, and accepting Unicode strings back, but some poorly designed frameworks don't.

What you need to do is check four things: that you use Unicode strings everywhere you deal with text; that you explicitly call .encode() to convert a Unicode string to a raw string and .decode() to convert a raw string to a Unicode string; that the character encoding declared on the returned response is set correctly; and that you encode your strings using that same encoding. How you do that will depend on your framework.
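
As a framework-independent sketch of that checklist: the helpers below (build_response and parse_response are hypothetical names, not part of any App Engine API, and the second region is just a sample value) keep text as Unicode internally, encode to UTF-8 only at the boundary, and declare that same encoding in the Content-Type header. The last lines show what a client gets if it decodes those bytes with a different encoding.

```python
# -*- coding: utf-8 -*-

ENCODING = "utf-8"  # assumption: server and client both agree on UTF-8


def build_response(regions):
    """Join Unicode region names and encode them once, at the boundary."""
    text = u",".join(regions)
    headers = {"Content-Type": "text/plain; charset=" + ENCODING}
    return headers, text.encode(ENCODING)


def parse_response(body, encoding=ENCODING):
    """What a well-behaved client does: decode with the declared encoding."""
    return body.decode(encoding).split(u",")


headers, body = build_response([u"한국 : 충청남도", u"한국 : 서울"])

# Decoding with the matching encoding restores the original text.
round_tripped = parse_response(body)

# Decoding the same bytes as ASCII fails outright ...
try:
    body.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# ... while a single-byte encoding such as Latin-1 "succeeds" but yields garbage.
garbled = body.decode("latin-1")
```

A client that shows garbled text, or sends garbled text back, is usually in the last situation: the bytes were correct, but they were decoded with the wrong encoding.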

Once you've done that, if you still have trouble, I would suggest writing some simple unit tests - storing data to the datastore and retrieving it and manipulating it, then checking it equals what you expect - to pin down where the issue is.
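
A minimal sketch of such a test follows. It assumes a stand-in FakeStore class (hypothetical, just an in-memory dict) in place of the real datastore so that it runs anywhere; in a real project the put and get calls would go through your own model class instead.

```python
# -*- coding: utf-8 -*-
import unittest


class FakeStore(object):
    """Hypothetical in-memory stand-in for the datastore (not an App Engine API)."""

    def __init__(self):
        self._rows = {}

    def put(self, key, regions):
        self._rows[key] = list(regions)

    def get(self, key):
        return list(self._rows[key])


class RegionRoundTripTest(unittest.TestCase):
    def test_korean_text_survives_store_and_transport(self):
        region = u"한국 : 충청남도"
        store = FakeStore()
        store.put("entry-1", [region])

        # Step 1: what comes back from storage equals what went in.
        fetched = store.get("entry-1")
        self.assertEqual(fetched, [region])

        # Step 2: encoding for the wire and decoding on the client is lossless.
        wire_bytes = u",".join(fetched).encode("utf-8")
        self.assertEqual(wire_bytes.decode("utf-8"), region)


def run_tests():
    """Run the test case and report whether every test passed."""
    suite = unittest.TestLoader().loadTestsFromTestCase(RegionRoundTripTest)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

Checking each step separately makes it clear whether a mismatch comes from storage or from the encoding used on the way out.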

Nick Johnson
Thanks, I added more details.
Joey
Updated my response.
Nick Johnson
Thanks for the clarification and that great link. Your second paragraph addresses the original question of this post, so that's good enough for me to consider it answered. I did find the cause of my problem in the end: my client was improperly decoding the server response as ASCII instead of UTF-8. Despite how they appeared in the admin interface, the strings were indeed the same in the datastore and still correct.
Joey