It's due to an imperfection in the pseudofile object implemented by the zipfile module (for the .open method of the ZipFile class introduced in Python 2.6). Consider:
>>> f = zf.open('data.pkl')
>>> f.read(1)
'('
>>> f.readline()
'dp1\n'
>>> f.read(1)
''
>>>
The sequence of .read(1) followed by .readline() is what .load internally does (on a protocol-0 pickle, the default in Python 2, which is what you're using here). Unfortunately, zipfile's imperfection means this particular sequence doesn't work, producing a spurious "end of file" (.read returning an empty string) right after the first read/readline pair.
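For comparison, here is a minimal sketch of the same read/readline sequence on a well-behaved file object. It assumes an in-memory io.BytesIO buffer and a made-up sample dict, neither of which comes from the question; protocol 0 is requested explicitly so the snippet also behaves the same way on newer Pythons, where it is no longer the default.

```python
import io
import pickle

# Hypothetical sample data; protocol 0 is requested explicitly.
payload = pickle.dumps({'a': 1}, 0)

f = io.BytesIO(payload)   # a well-behaved in-memory file object
first = f.read(1)         # the first byte of the pickle
rest = f.readline()       # the rest of the first line, newline included
nxt = f.read(1)           # the next byte; a correct file object still has data here

print(repr(first), repr(rest), repr(nxt))
```

On a correct file object the third call returns data rather than an empty string, which is exactly what the ZipFile.open pseudofile fails to do in 2.6.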
Not sure offhand if this bug in Python's standard library is fixed in Python 2.7 -- I'm going to check.
Edit: just checked -- the bug is fixed in Python 2.7 rc1 (the release candidate that's currently the latest 2.7 version). I don't yet know whether it's fixed in the latest bug-fix release of 2.6 as well.
Edit again: the bug is still there in Python 2.6.5, the latest bug-fix release of Python 2.6 -- so if you can't upgrade to 2.7 and need better-behaving pseudofile objects from ZipFile.open, a backport of the 2.7 fix seems the only viable solution.
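If the pickle is small enough to hold in memory, one possible way around the problem (a sketch under that assumption, not something verified on 2.6.5 here) is to fetch the whole member with ZipFile.read and unpickle from the resulting string, so that read and readline calls are never mixed on the pseudofile. The sample data and the in-memory archive below are hypothetical, chosen only to keep the example self-contained.

```python
import io
import pickle   # on Python 2, cPickle offers the same dumps/loads calls
import zipfile

some_data = {'answer': 42, 'items': [1, 2, 3]}  # hypothetical sample payload

# Build the archive in memory so the sketch leaves no files behind.
buf = io.BytesIO()
zf = zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED)
zf.writestr('data.pkl', pickle.dumps(some_data, 0))  # protocol 0 on purpose
zf.close()

zf = zipfile.ZipFile(buf)
raw = zf.read('data.pkl')   # the whole decompressed member, in one call
sd2 = pickle.loads(raw)     # unpickle from the string, not from the pseudofile
zf.close()
```

The trade-off is memory: the entire decompressed member is held at once, which may not suit very large pickles.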
Note that it's not certain you do need better-behaving pseudofile objects; if you control the dump calls and can use the latest-and-greatest protocol, everything will be fine:
>>> zf = zipfile.ZipFile('zipped_pickle.zip', 'w', zipfile.ZIP_DEFLATED)
>>> zf.writestr('data.pkl', cPickle.dumps(some_data, -1))
>>> sd2 = cPickle.load(zf.open('data.pkl'))
>>>
It's only the old, crufty, backwards-compatible "protocol 0" (the default) that requires proper pseudofile object behavior when mixing read and readline calls in the load (protocol 0 is also slower and results in larger pickles, so it's definitely not recommended unless backwards compatibility with old Python versions, or the ASCII-only nature of the pickles that protocol 0 produces, is a mandatory constraint in your application).
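To get a rough feel for the size difference, a quick comparison along these lines may help. The sample data is hypothetical, and the exact byte counts depend on the data and on the Python version, so treat the numbers as indicative only.

```python
import pickle

some_data = list(range(1000))  # hypothetical sample data

p0 = pickle.dumps(some_data, 0)    # ASCII-only protocol 0
phi = pickle.dumps(some_data, -1)  # highest protocol available

# Both pickles round-trip to the same object; only the encoding differs.
assert pickle.loads(p0) == pickle.loads(phi) == some_data

print(len(p0), len(phi))
```

For data like this the binary protocol comes out noticeably smaller, since small integers are stored in a few bytes instead of as decimal text followed by a newline.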