Is this a bug?

>>> import json
>>> import cPickle
>>> json.dumps(cPickle.dumps(u'å'))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/json/encoder.py", line 361, in encode
    return encode_basestring_ascii(o)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data
+1  A: 

The json module is expecting strings to encode text. Pickled data isn't text, it's 8-bit binary.

One simple workaround, if you really need to send pickled data over JSON, is to use base64:

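import base64
import cPickle
import json

# Base64 turns the binary pickle into plain ASCII, which json can carry safely.
j = json.dumps(base64.b64encode(cPickle.dumps(u'å')))
# Reverse the steps to get the original object back.
cPickle.loads(base64.b64decode(json.loads(j)))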
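For anyone reading this on Python 3, where cPickle is gone and json.dumps refuses bytes outright, the same round trip can be sketched roughly like this. The helper names to_json and from_json are made up for illustration, and pinning protocol=0 is only there to mirror the question; any protocol should survive base64.

```python
import base64
import json
import pickle


def to_json(obj):
    """Pickle obj, base64-encode the bytes, and wrap the result as a JSON string."""
    raw = pickle.dumps(obj, protocol=0)
    # b64encode returns bytes; decode to str so json.dumps accepts it.
    return json.dumps(base64.b64encode(raw).decode('ascii'))


def from_json(text):
    """Reverse to_json: parse the JSON string, base64-decode, then unpickle."""
    raw = base64.b64decode(json.loads(text))
    return pickle.loads(raw)


restored = from_json(to_json(u'\xe5'))
print(restored == u'\xe5')
```

The usual caveat applies: only unpickle data you trust, since pickle.loads can execute arbitrary code.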

Note that this is very clearly a Python bug. Protocol version 0 is explicitly documented as ASCII, yet å is sent as the non-ASCII byte \xe5 instead of encoding it as "\u00E5". This bug was reported upstream--and the ticket was closed without the bug being fixed. http://bugs.python.org/issue2980
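A quick way to check this on your own interpreter is to try decoding the pickle as ASCII. This is only a sketch, written for Python 3 (where, as far as I know, protocol 0 still writes the raw byte); is_ascii is a hypothetical helper, not part of any library.

```python
import pickle


def is_ascii(data):
    """Return True if every byte in data is in the ASCII range."""
    try:
        data.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False


# Protocol 0 is the one documented as the "ASCII" protocol.
payload = pickle.dumps(u'\xe5', protocol=0)
print(repr(payload))
print(is_ascii(payload))
```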

Glenn Maynard
+1  A: 

Could be a bug in pickle. My Python documentation says this about the pickle format being used: "Protocol version 0 is the original ASCII protocol and is backwards compatible with earlier versions of Python. [...] If a protocol is not specified, protocol 0 is used."


>>> cPickle.dumps(u'å').decode('ascii')
Traceback (most recent call last):
  File "", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 1: ordinal not in range(128)

That ain't ASCII.

Also, I don't know whether it's relevant, or even a problem:

 
>>> cPickle.dumps(u'å') == pickle.dumps(u'å')
False
knitti
A: 

I'm using Python 2.6 and your code runs without any error.

In [1]: import json

In [2]: import cPickle

In [3]: json.dumps(cPickle.dumps(u'å'))
Out[3]: '"V\\u00e5\\np1\\n."'

BTW, what's your system default encoding? In my case, it's:

In [6]: sys.getdefaultencoding()
Out[6]: 'ascii'
Satoru.Logic
The error does happen for me in 2.6.4. What patch version? Maybe the closed-and-pretend-it's-not-a-bug bug was fixed anyway in a later 2.6 release.
Glenn Maynard
@Glenn Maynard: I'm using 2.6.5 :P
Satoru.Logic
Also happens in 2.6.6, regardless of sys.getdefaultencoding. `cPickle.dumps(u'å')` returns `'V\xe5\n.'`, not `'V\\u00e5\n.'`. I'm curious why yours is returning the latter (which is the correct output: entirely ASCII).
Glenn Maynard
Oh, it's a difference in your json.dumps behavior, not pickle, which is less mysterious; your cPickle module has the same bug as everyone else's. Note that I can't decode your data above; `json.loads()` on that string results in a Unicode string, which `cPickle.loads` throws an error on (`argument 1 must be string, not unicode`).
Glenn Maynard