
From this question and my own benchmarks it seems that the marshal module is about 20-30x faster than cPickle. Why is this so? What functionality does cPickle offer over marshal that justifies this? (Another way of putting it - why not always use marshal? Why do both of these modules exist?)
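The question's own benchmarks aren't shown. What follows is a minimal sketch of such a measurement. It assumes Python 3, where cPickle was folded into pickle, which uses its C accelerator automatically. The payload DATA and the helper bench are hypothetical. The ratio you get depends on the data and on the pickle protocol chosen, so the 20-30x figure may not reproduce exactly.

```python
import marshal
import pickle
import timeit

# Hypothetical sample payload: only builtin types, so both modules accept it.
DATA = {"ints": list(range(1000)), "text": ["spam"] * 100, "nested": {"a": (1.5, 2.5)}}


def bench(number=200):
    """Return (marshal_seconds, pickle_seconds) for `number` dumps+loads round trips."""
    t_marshal = timeit.timeit(lambda: marshal.loads(marshal.dumps(DATA)), number=number)
    t_pickle = timeit.timeit(
        lambda: pickle.loads(pickle.dumps(DATA, protocol=pickle.HIGHEST_PROTOCOL)),
        number=number,
    )
    return t_marshal, t_pickle


if __name__ == "__main__":
    m, p = bench()
    print(f"marshal: {m:.4f}s  pickle: {p:.4f}s  ratio: {p / m:.1f}x")
```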

+7  A: 

I'm assuming you don't have access to the documentation.

Key points.

  1. marshal exists to read and write Python values in a binary format.

  2. marshal is not a general persistence module; for general persistence and transfer of Python objects through RPC calls, see the modules pickle and shelve.
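To make point 2 concrete, here is a small sketch assuming Python 3. The class Point is hypothetical. pickle restores an instance of a user-defined class, while marshal only understands a fixed set of core types and rejects the object with ValueError.

```python
import marshal
import pickle


class Point:
    """Hypothetical user-defined class used only for illustration."""

    def __init__(self, x, y):
        self.x, self.y = x, y


p = Point(1, 2)

# pickle records the class by reference (module + qualified name) and rebuilds the instance.
restored = pickle.loads(pickle.dumps(p))
print(type(restored).__name__, restored.x, restored.y)  # Point 1 2

# marshal has no way to describe an arbitrary class instance and refuses it.
try:
    marshal.dumps(p)
except ValueError as exc:
    print("marshal refused:", exc)
```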

So, it appears to me that (a) marshal is binary in nature, and (b) pickle, at least with protocol 0 (the default in Python 2), is character (ASCII) in nature. Probably that's why marshal is faster.
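The following sketch illustrates point (b), assuming Python 3 and that "character" refers to pickle protocol 0. Protocol 0 emits a printable, newline-delimited opcode stream, but pickle also has binary protocols, which start with the PROTO opcode (0x80). It may therefore be worth comparing protocols before concluding that the text format alone explains the gap. The sample value is arbitrary.

```python
import marshal
import pickle

value = {"name": "spam", "count": 3}

text_pickle = pickle.dumps(value, protocol=0)    # the Python 2 default protocol
binary_pickle = pickle.dumps(value, protocol=2)  # one of pickle's binary protocols
marshalled = marshal.dumps(value)

# Protocol 0 output is plain ASCII text...
print(text_pickle.decode("ascii"))
# ...while protocols >= 2 begin with the PROTO opcode (0x80) and the protocol number.
print(binary_pickle[:2])  # b'\x80\x02'
print(len(text_pickle), len(binary_pickle), len(marshalled))
```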

Oh, here's a nugget: "Details of the format are undocumented on purpose". So (c) marshal is allowed to cut corners or optimize in obscure ways.
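One likely reason the format can stay private is that marshal mainly serves the interpreter itself, which uses it to store compiled code objects in .pyc files. The format is versioned (marshal.version) rather than promised to be stable. Below is a minimal sketch assuming Python 3; the compiled source string is arbitrary. It shows marshal round-tripping a code object that pickle refuses.

```python
import marshal
import pickle

# Compile a tiny module body: the kind of object the interpreter marshals into .pyc files.
code = compile("x = 6 * 7", "<example>", "exec")

data = marshal.dumps(code)
namespace = {}
exec(marshal.loads(data), namespace)
print(namespace["x"])  # 42

# pickle has no support for code objects.
try:
    pickle.dumps(code)
except (TypeError, pickle.PicklingError) as exc:
    print("pickle refused:", exc)

print("marshal format version:", marshal.version)
```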

S.Lott
+7  A: 

I think it's explained in the documentation (section 13.5, "marshal -- Internal Python object serialization"). Notably:

Warning: The marshal module is not intended to be secure against erroneous or maliciously constructed data. Never unmarshal data received from an untrusted or unauthenticated source.

and

Warning: Some unsupported types such as subclasses of builtins will appear to marshal and unmarshal correctly, but in fact, their type will change and the additional subclass functionality and instance attributes will be lost.
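The second warning contrasts with pickle's behaviour, as this sketch shows (assuming Python 3). The class TaggedList is hypothetical. pickle keeps both the subclass and its instance attribute. marshal has no notion of the subclass, but whether it silently returns a plain list (as the quoted warning describes) or rejects the object may depend on the Python version, so the sketch handles both outcomes.

```python
import marshal
import pickle


class TaggedList(list):
    """Hypothetical list subclass carrying an extra attribute."""

    tag = None


items = TaggedList([1, 2, 3])
items.tag = "important"

# pickle preserves the subclass type and the instance attribute.
restored_items = pickle.loads(pickle.dumps(items))
print(type(restored_items).__name__, list(restored_items), restored_items.tag)  # TaggedList [1, 2, 3] important

# marshal either loses the subclass (plain list back) or refuses it, depending on version.
try:
    result = marshal.loads(marshal.dumps(items))
    print(type(result).__name__, result)
except ValueError as exc:
    print("marshal refused:", exc)
```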

atzz
Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
Brandon Thomson
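The reason for pickle's warning is that unpickling can call any importable callable named in the data, through the __reduce__ protocol. The class Payload below is hypothetical, and it uses a harmless eval to show the mechanism. A real attacker would name something like os.system instead, and could craft the bytes directly without any class at all.

```python
import pickle


class Payload:
    """Hypothetical malicious object: __reduce__ tells the unpickler what to call."""

    def __reduce__(self):
        # The unpickler will call eval("6 * 7") while loading.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
print(pickle.loads(blob))  # 42: eval ran during unpickling
```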