Did defaultdicts stop being marshalable as of Python 2.6? The following works under 2.5 but fails under 2.6 with "ValueError: unmarshallable object" on OS X 10.5.6, using python-2.6.1-macosx2008-12-06.dmg from python.org:

from collections import defaultdict
import marshal
dd = defaultdict(list)
marshal.dump(dd, file('/tmp/junk.bin','wb') )
+9  A: 

Marshal was deliberately changed to not support subclasses of built-in types. Marshal was never supposed to handle defaultdicts, but happened to since they are a subclass of dict. Marshal is not a general "persistence" module; only None, integers, long integers, floating point numbers, strings, Unicode objects, tuples, lists, sets, dictionaries, and code objects are supported.

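In Python 2.5, a defaultdict was marshaled exactly like a plain dict, so its default factory was silently dropped: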

>>> marshal.dumps(defaultdict(list))
'{0'
>>> marshal.dumps(dict())
'{0'

If for some reason you really want to marshal a defaultdict you can convert it to a dict first, but odds are you should be using a different serialization mechanism, like pickling.
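The convert-to-dict workaround can be sketched as below. This is a minimal sketch rather than a definitive recipe: it assumes the defaultdict is not nested (only the top level is converted), and it is written so that it also runs on current Python versions. The variable names are illustrative only.

```python
from collections import defaultdict
import marshal

dd = defaultdict(list)
dd["a"].append(1)
dd["b"].extend([2, 3])

# marshal only accepts a plain dict, so make a shallow copy first.
# The default factory (list) is not stored and must be supplied again on load.
payload = marshal.dumps(dict(dd))

# Rebuild a defaultdict from the plain dict that marshal returns.
restored = defaultdict(list, marshal.loads(payload))
```

If the values themselves are defaultdicts, each of them would need the same conversion before marshaling.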

Miles
Thanks Miles. The problem is that there is a very significant performance difference between pickling and marshaling: at the data size I'm working with, it amounts to a few hours for each run. I guess I'll stick with 2.5 or convert to a dict before marshaling.
Parand
Are you using cPickle, with HIGHEST_PROTOCOL?
Miles
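For reference, a minimal sketch of what pickling with the highest protocol looks like. It assumes a modern Python, where the plain `pickle` module already uses the C implementation that `cPickle` provided in Python 2; on Python 2 you would import `cPickle` instead. The sample data is made up for illustration.

```python
import pickle

# Hypothetical sample data, shaped loosely like a list of small dicts.
data = [{"id": i, "name": "item%d" % i, "tags": [i, i + 1]} for i in range(1000)]

# The binary protocols are generally faster and more compact than protocol 0.
blob = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
roundtrip = pickle.loads(blob)
```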
+2  A: 

Regarding the performance issue: here are timings for encoding a list of ~600000 dicts, each with 4 key/value pairs, where one of the values is a list (of length 1 to 3) of dicts with 2 key/value pairs each:

In [27]: timeit(cjson.encode, data)
4.93589496613

In [28]: timeit(cPickle.dumps, data, -1)
141.412974119

In [30]: timeit(marshal.dumps, data, marshal.version)
1.13546991348
dsvensson
gc.disable(); timeit(cPickle.dumps, ...); gc.enable() gets the time down to around 14 seconds, which might be a good enough improvement
Henrik Gustafsson
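The gc.disable() trick from the comment above can be wrapped in a small helper so the collector is always restored afterwards. This is only a sketch: `dumps_without_gc` is a hypothetical name, it uses `pickle` so that it runs on current Python (substitute `cPickle` on Python 2), and whether it helps at all depends on the data and the interpreter version.

```python
import gc
import pickle

def dumps_without_gc(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj with the cyclic garbage collector paused (hypothetical helper)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return pickle.dumps(obj, protocol)
    finally:
        # Restore the collector only if it was running before the call.
        if was_enabled:
            gc.enable()

# Made-up sample data for illustration.
sample = [{"k": i, "items": [{"a": i, "b": str(i)}]} for i in range(1000)]
encoded = dumps_without_gc(sample)
```

The try/finally matters here: if pickling raises, the collector would otherwise stay disabled for the rest of the process.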