"Or we can store the dict string directly to DB without serializing."
There is no such thing as "the dict string". There are many ways to serialize a dict into a string; you may be thinking of repr, possibly with eval as the way to get the dict back (you mention exec, but that's simply absurd: what statement would you execute...?! I think you probably mean eval). They're different serialization methods with their own tradeoffs, and in many cases the tradeoffs tend to favor pickling (cPickle, for speed, with protocol -1, meaning "the highest protocol available", which is usually the best you can do).
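For readers on Python 3, where the standard pickle module automatically uses the C accelerator that cPickle used to provide, a minimal sketch of the two round-trips being compared might look like this. The sample dict is illustrative only. Note that pickle.dumps returns bytes in Python 3, so the database column would need to be a binary type rather than a text one.

```python
import pickle

# Illustrative sample dict (an assumption, shaped like the examples below).
d = dict.fromkeys(range(99), "banana")

# repr/eval: the "dict string" approach; eval parses AND executes the text.
as_text = repr(d)
from_text = eval(as_text)

# pickle: protocol -1 selects the highest protocol this interpreter supports.
as_bytes = pickle.dumps(d, -1)
from_bytes = pickle.loads(as_bytes)

print(type(as_text).__name__, type(as_bytes).__name__)  # str bytes
print(from_text == d, from_bytes == d)                   # True True
```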
Performance is surely an issue, e.g., in terms of the size of what you're storing:
$ python -c 'import cPickle; d=dict.fromkeys(range(99), "banana"); print len(repr(d))'
1376
$ python -c 'import cPickle; d=dict.fromkeys(range(99), "banana"); print len(cPickle.dumps(d,-1))'
412
...why would you want to store 1.4 KB rather than 0.4 KB each time you serialize a dict like this one...?-)
Edit: since some suggest JSON, it's worth pointing out that json takes 1574 bytes here, even bulkier than bulky repr!
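A rough Python 3 sketch of the same size comparison, covering all three formats at once, might look like the following; the helper name serialized_sizes is hypothetical. repr and json produce str while pickle produces bytes, but for this ASCII-only sample the character count equals the byte count. JSON is bulkier than repr here because it must quote every integer key as a string, and pickle is compact partly because dict.fromkeys shares one "banana" object, which pickle's memo stores once and then references.

```python
import json
import pickle


def serialized_sizes(obj):
    """Hypothetical helper: length of obj's serialized form per approach."""
    return {
        "repr": len(repr(obj)),
        "pickle": len(pickle.dumps(obj, -1)),  # highest available protocol
        "json": len(json.dumps(obj)),          # int keys become quoted strings
    }


d = dict.fromkeys(range(99), "banana")
sizes = serialized_sizes(d)
for name, size in sizes.items():
    print(f"{name:>6}: {size} bytes")
```

The exact pickle size depends on the protocol version, so it may differ slightly from the Python 2 figure above.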
As for speed...
$ python -mtimeit -s'import cPickle; d=dict.fromkeys(range(99), "chocolate")' 'eval(repr(d))'
1000 loops, best of 3: 706 usec per loop
$ python -mtimeit -s'import cPickle; d=dict.fromkeys(range(99), "chocolate")' 'cPickle.loads(cPickle.dumps(d, -1))'
10000 loops, best of 3: 70.2 usec per loop
...why take 10 times longer? What's the upside that would justify paying such a hefty price?
Edit: json takes 2.7 milliseconds -- almost forty times slower than cPickle.
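To reproduce the speed comparison on Python 3, a sketch along these lines times one full serialize-then-deserialize round-trip per approach, reporting the best of 3 runs like the timeit command line does. The round_trips name is hypothetical, and absolute timings depend entirely on machine and interpreter version, so no particular numbers are implied. Note that the json round-trip does not give back an equal dict, because integer keys come back as strings.

```python
import json
import pickle
import timeit

d = dict.fromkeys(range(99), "chocolate")

# Each callable performs one complete round-trip: serialize, then deserialize.
round_trips = {
    "repr/eval": lambda: eval(repr(d)),
    "pickle": lambda: pickle.loads(pickle.dumps(d, -1)),
    "json": lambda: json.loads(json.dumps(d)),  # keys come back as str
}

for name, fn in round_trips.items():
    # Best of 3 repeats of 1000 calls each, converted to microseconds per call.
    best = min(timeit.repeat(fn, number=1000, repeat=3))
    print(f"{name:>9}: {best / 1000 * 1e6:.1f} usec per loop")
```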
Then there's generality: not every object can properly round-trip through repr and eval, while pickling is much more general. E.g.:
$ python -c'def f(): pass
d={23:f}
print d == eval(repr(d))'
Traceback (most recent call last):
  File "<string>", line 3, in <module>
  File "<string>", line 1
    {23: <function f at 0x241970>}
         ^
SyntaxError: invalid syntax
vs
$ python -c'import cPickle
def f(): pass
d={"x":f}
print d == cPickle.loads(cPickle.dumps(d, -1))'
True
Edit: json is even less general than repr in terms of round-trips.
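A small Python 3 sketch of what "less general" means in practice for json: a dict with an integer key and a tuple value does not survive a json round-trip, while pickle returns it intact, and some types (sets, for instance) have no JSON encoding at all. The sample data is made up for illustration.

```python
import json
import pickle

original = {23: ("spam", "eggs")}  # illustrative data: int key, tuple value

via_json = json.loads(json.dumps(original))
via_pickle = pickle.loads(pickle.dumps(original, -1))

print(via_json)    # {'23': ['spam', 'eggs']}  (key is now str, tuple is now list)
print(via_pickle)  # {23: ('spam', 'eggs')}
print(via_json == original, via_pickle == original)  # False True

# Some types have no JSON representation at all.
try:
    json.dumps({"s": {1, 2}})
except TypeError as exc:
    print("json refuses:", exc)
```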
So, comparing the two serialization approaches (pickling vs repr/eval), we see that pickling is far more general, can be about 10 times faster, and takes up about a third of the space in your database.
What compensating advantages do you envisage for repr/eval...?
BTW, I see some answers mention security, but that's not a real point in favor of repr/eval: pickling is insecure too (the security issue with calling eval on untrusted strings may be more obvious, but unpickling an untrusted string is also insecure, though in subtler and darker ways).
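One of those subtler ways, sketched in Python 3: an object's __reduce__ method tells pickle which callable and arguments to store, and pickle.loads simply calls them. The Payload class below is a hypothetical, deliberately harmless stand-in for attacker-crafted data; a real attack would name something like os.system instead of evaluating arithmetic.

```python
import pickle


class Payload:
    """Hypothetical attacker-crafted object (illustration only)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair; loads() will CALL it.
        return (eval, ("6 * 7",))


malicious = pickle.dumps(Payload(), -1)

# The receiving side never needs the Payload class: the pickle only refers
# to builtins.eval, so loads just calls eval("6 * 7") and returns the result.
print(pickle.loads(malicious))  # 42
```

In other words, unpickling untrusted bytes can end up running eval (or anything else importable) anyway.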
Edit: json is more secure. Whether that's worth the huge cost in size, speed, and generality is a tradeoff worth pondering; in most cases it won't be.