I don't know of any theoretical way to decide which method is faster, and even if I did, I'm not sure I would trust it. So let's write some code and test it.
If we package our pickle/shelve managers in classes with a common interface, it will be easy to swap them in and out of your code. If at some future point you discover that one is better than the other (or discover an even better way), all you have to do is write a class with the same interface, and you'll be able to plug the new class into your code with very little modification to anything else.
test.py:
try:
    import cPickle as pickle  # Python 2: the faster C implementation
except ImportError:
    import pickle  # Python 3: the C accelerator is used automatically
import shelve


class PickleManager(object):
    """Store each value in its own pickle file; `name` is the file path."""

    def store(self, name, value):
        # Pickle data is binary, so open the file in binary mode.
        with open(name, 'wb') as f:
            pickle.dump(value, f)

    def load(self, name):
        with open(name, 'rb') as f:
            return pickle.load(f)


class ShelveManager(object):
    """Store all values in one shelf; use it as a context manager."""

    def __init__(self, fname):
        self.fname = fname

    def __enter__(self):
        # The default flag 'c' opens the shelf, creating it if necessary.
        self.shelf = shelve.open(self.fname)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.shelf.close()

    def store(self, name, value):
        self.shelf[name] = value

    def load(self, name):
        return self.shelf[name]


def write(manager):
    for i in range(100):
        fname = '/tmp/{i}.dat'.format(i=i)
        data = 'The sky is so blue' * 100
        manager.store(fname, data)


def read(manager):
    for i in range(100):
        fname = '/tmp/{i}.dat'.format(i=i)
        manager.load(fname)
Normally, you'd use PickleManager like this:
manager=PickleManager()
manager.load(...)
manager.store(...)
while you'd use the ShelveManager like this:
with ShelveManager('/tmp/shelve.dat') as manager:
    manager.load(...)
    manager.store(...)
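To make the swapping idea concrete, here is a minimal sketch, written for Python 3 (so it uses pickle rather than cPickle). The DictManager class and the save_and_reload helper are hypothetical names invented for this illustration. The point is only that any object with store(name, value) and load(name) can be handed to the same calling code.

```python
import os
import pickle
import tempfile


class PickleManager(object):
    """Same interface as in test.py: one pickle file per name."""

    def store(self, name, value):
        with open(name, 'wb') as f:
            pickle.dump(value, f)

    def load(self, name):
        with open(name, 'rb') as f:
            return pickle.load(f)


class DictManager(object):
    """Hypothetical in-memory stand-in with the same store/load interface."""

    def __init__(self):
        self._data = {}

    def store(self, name, value):
        self._data[name] = value

    def load(self, name):
        return self._data[name]


def save_and_reload(manager, name, value):
    """Calling code that relies only on the shared interface."""
    manager.store(name, value)
    return manager.load(name)


# Assumption: a scratch directory is used here instead of /tmp.
tmpdir = tempfile.mkdtemp()
name = os.path.join(tmpdir, 'example.dat')
for manager in (PickleManager(), DictManager()):
    result = save_and_reload(manager, name, {'sky': 'blue'})
    print(type(manager).__name__, result)
```

Because save_and_reload never checks which manager it was given, replacing one manager with another is a one-line change at the call site.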
To test performance, you could do something like this. Run the write tests first, so that the data exists before the read tests try to load it:
python -mtimeit -s'import test' 'with test.ShelveManager("/tmp/shelve.dat") as s: test.write(s)'
python -mtimeit -s'import test' 'test.write(test.PickleManager())'
python -mtimeit -s'import test' 'with test.ShelveManager("/tmp/shelve.dat") as s: test.read(s)'
python -mtimeit -s'import test' 'test.read(test.PickleManager())'
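If you would rather drive the timing from a script than from the shell, the standard timeit module can do the same job. The following is only a sketch under assumptions: it targets Python 3 (pickle instead of cPickle), it writes into a temporary directory rather than /tmp, and the time_it helper and its repeat counts are made up for this example.

```python
import os
import pickle
import shelve
import tempfile
import timeit

TMPDIR = tempfile.mkdtemp()  # assumption: a scratch directory instead of /tmp
SHELF = os.path.join(TMPDIR, 'shelve.dat')
DATA = 'The sky is so blue' * 100
NAMES = [os.path.join(TMPDIR, '{i}.dat'.format(i=i)) for i in range(100)]


def pickle_write():
    for name in NAMES:
        with open(name, 'wb') as f:
            pickle.dump(DATA, f)


def pickle_read():
    for name in NAMES:
        with open(name, 'rb') as f:
            pickle.load(f)


def shelve_write():
    shelf = shelve.open(SHELF)
    try:
        for name in NAMES:
            shelf[name] = DATA
    finally:
        shelf.close()


def shelve_read():
    shelf = shelve.open(SHELF)
    try:
        for name in NAMES:
            shelf[name]
    finally:
        shelf.close()


def time_it(func, number=10, repeat=3):
    """Best time per call in milliseconds (hypothetical helper)."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1000.0


# Writes run before reads so that the data exists when it is read back.
results = {}
for func in (pickle_write, pickle_read, shelve_write, shelve_read):
    results[func.__name__] = time_it(func)
    print('{0:<14} {1:.3f} ms'.format(func.__name__, results[func.__name__]))
```

Taking the minimum over several repeats follows the usual timeit advice: the fastest run is the one least disturbed by other activity on the machine.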
At least on my machine, the results came out like this:
                 read (ms)    write (ms)
PickleManager    9.26         7.92
ShelveManager    5.32         30.9
So it looks like ShelveManager may be faster at reading, but PickleManager may be faster at writing.
Be sure to run these tests yourself. Timeit results can vary with the Python version, OS, filesystem type, hardware, etc.
Also, note that my write and read functions generate very small files. You'll want to test this on data more similar to your use case.
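One way to move the test data closer to your use case is to parameterize the payload. The sketch below is a Python 3 example built on assumptions: make_payload and pickled_size are hypothetical helpers, and the item counts are arbitrary. Substitute objects that resemble your real data.

```python
import pickle


def make_payload(n_items):
    """Build a list of small dicts; n_items controls how big the data is."""
    return [{'id': i, 'text': 'The sky is so blue'} for i in range(n_items)]


def pickled_size(value):
    """Number of bytes the value occupies once pickled."""
    return len(pickle.dumps(value))


for n_items in (10, 1000, 10000):
    payload = make_payload(n_items)
    print('{0:>6} items -> {1} bytes pickled'.format(
        n_items, pickled_size(payload)))
```

A payload built this way can be passed to manager.store in place of the fixed string used in write, so the same timing commands can be rerun at several sizes.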