I'm using Python (Python 2.5.2 on Ubuntu 8.10) to parse JSON from ASCII-encoded text files. When loading these files with json (simplejson), all my string values are cast to Unicode objects instead of string objects. The problem is that I have to use the data with some libraries that only accept string objects.
Is it possible to get string objects instead of Unicode ones from simplejson? Any hints on how I can achieve this automatically?
Edit: I can't change or update the libraries. One of them - the csv module - is even in the Python standard library (the documentation says it will support Unicode in the future). I could write wrappers, of course, but maybe there is a more convenient way?
The actual data I parse from the JSON files is rather nested and complex, so it would be a pain to look for every Unicode object therein and cast it manually...
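For reference, here is the kind of manual cast I mean: a recursive post-processing helper that walks the nested structure and encodes every text string. This is only a sketch (the name byteify is mine, not from any library), written so it also runs on Python 3, where it produces bytes instead of Python 2 str:

```python
import json

def byteify(obj):
    """Recursively encode every text string in a decoded JSON structure."""
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    if isinstance(obj, dict):
        # dict((...) for ...) instead of a dict comprehension, for Python 2.5
        return dict((byteify(k), byteify(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return [byteify(x) for x in obj]
    if isinstance(obj, text_type):
        return obj.encode('utf-8')
    return obj

# On Python 2, every text value in the result is now a byte string (str)
data = byteify(json.loads('["a", {"b": "c"}]'))
```

This works, but it is exactly the sort of extra pass over the whole structure I was hoping to avoid.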
Here's a small example:
>>> import simplejson as json
>>> l = ['a', 'b']
>>> l
['a', 'b']
>>> js = json.dumps(l)
>>> js
'["a", "b"]'
>>> nl = json.loads(js)
>>> nl
[u'a', u'b']
Update: I completely agree with Jarret Hardie and nosklo: since the JSON spec specifically defines strings as Unicode, simplejson should return Unicode objects.
But while searching the net, I came across some posts where people complain about simplejson actually returning string objects... I couldn't reproduce this behavior, but it seems to be possible. Any hints?
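One semi-automatic route I've looked at is the object_hook parameter that both simplejson and the stdlib json module accept: it post-processes every decoded dict. A sketch (the helper name encode_dict is mine; note that the hook only sees dicts, so bare lists and top-level strings are not converted):

```python
import json  # simplejson exposes the same object_hook parameter

def encode_dict(d):
    # Hypothetical helper: encode the text keys and values of one decoded
    # dict. Nested dicts are already converted, since the decoder calls
    # the hook innermost-first.
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    def enc(x):
        return x.encode('utf-8') if isinstance(x, text_type) else x
    return dict((enc(k), enc(v)) for k, v in d.items())

result = json.loads('{"a": "b"}', object_hook=encode_dict)
```

It covers the dict case without an extra pass, but it's not a complete answer for data where the top level is a list or a string.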
Workaround
Right now I use PyYAML to parse the files; it gives me string objects. Since JSON is a subset of YAML, it works nicely.