I know this isn't exactly how the pickle module was intended to be used, but I would have thought this would work. I'm using Python 3.1.2.

Here's the background code:

import pickle

FILEPATH='/tmp/tempfile'

class HistoryFile():
    """
    Persistent store of a history file  
    Each line should be a separate Python object
    Usually, pickle is used to make a file for each object,
        but here, I'm trying to use the append mode of writing a file to store a sequence
    """

    def validate(self, obj):
        """
        Returns whether or not obj is the right Pythonic object
        """
        return True

    def add(self, obj):
        if self.validate(obj):
            with open(FILEPATH, mode='ba') as f:    # appending, not writing
                f.write(pickle.dumps(obj))
        else:
            raise ValueError("Did not validate")   # strings cannot be raised in Python 3

    def unpack(self):
        """
        Go through each line in the file and put each python object
        into a list, which is returned
        """
        lst = []
        with open(FILEPATH, mode='br') as f:
            # problem must be here, does it not step through the file?
            for l in f:
                lst.append(pickle.loads(l))
        return lst

Now, when I run it, it only prints out the first object that is passed to the class.

if __name__ == '__main__':

    L = HistoryFile()
    L.add('a')
    L.add('dfsdfs')
    L.add(['dfdkfjdf', 'errree', 'cvcvcxvx'])

    print(L.unpack())       # only prints the first item, 'a'!

Is this because it's seeing an early EOF? Maybe appending is intended only for ascii? (in which case, why is it letting me do mode='ba'?) Is there a much simpler duh way to do this?

+2  A: 

Why would you think appending binary pickles would produce a single pickle?! Pickling lets you put (and get back) several items one after the other, so obviously it must be a "self-terminating" serialization format. Forget lines and just get them back! For example:

>>> import pickle
>>> import io
>>> s = io.BytesIO()
>>> pickle.dump(23, s)
>>> pickle.dump(45, s)
>>> s.seek(0)
0
>>> pickle.load(s)
23
>>> pickle.load(s)
45
>>> pickle.load(s)
Traceback (most recent call last):
   ...
EOFError
>>> 

Just catch the EOFError to know when you're done unpickling.
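
As a minimal Python 3 sketch of the same idea (the helper name `load_all` and the in-memory buffer are illustrative assumptions, not part of the question's code), the loop below keeps calling `pickle.load` until `EOFError` is raised. It also shows why the question's version returned only 'a': `pickle.loads` on bytes that hold several pickles decodes the first one and ignores the rest.

```python
import io
import pickle


def load_all(stream):
    """Yield every pickled object in a binary stream, in order."""
    while True:
        try:
            yield pickle.load(stream)
        except EOFError:
            # pickle.load raises EOFError once no data is left
            return


buf = io.BytesIO()
for item in ('a', 'dfsdfs', ['dfdkfjdf', 'errree', 'cvcvcxvx']):
    pickle.dump(item, buf)  # each dump appends one self-terminating pickle

# loads() on the concatenated bytes stops after the first pickle
first_only = pickle.loads(buf.getvalue())

buf.seek(0)
items = list(load_all(buf))
```

The same helper should work unchanged on a real file opened with mode 'rb'.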

Alex Martelli
Yes, but the class has to open the file with write permission, which erases the file. I want to keep it, which is why I thought of appending to the file. So is it standard practice to read in the contents before opening for writing?
Adam Morris
@Adam, just open with `'r+'` (or better `'r+b'`, so you can use a protocol of `pickle.HIGHEST_PROTOCOL` for your pickling!), see http://docs.python.org/library/functions.html?highlight=open#open and http://docs.python.org/library/pickle.html?highlight=pickle#pickle.HIGHEST_PROTOCOL .
Alex Martelli
@Alex got it; thanks. I hadn't paid close enough attention to the + in the open routine. Really quite simple. Still not sure what use the HIGHEST_PROTOCOL is though, in python3 they recommend protocol 3 which is the default anyway. ...
Adam Morris
@Adam, `HIGHEST_PROTOCOL`, aka `-1`, means "the best you can do" and is what you should always use unless you need your pickles to be loadable by older versions of Python. In Python 2.*, the default is 0 (the ascii protocol), and so it must stay for backwards compatibility -- as it must stay 3 in Python 3.* forevermore, even if somebody tomorrow invents a new format that takes half the time and space (unlikely, I know;-). So, always use -1 if you don't care about making pickles that are readable by older Python versions!-)
Alex Martelli
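
A small sketch of the protocol choice discussed above, using only the standard `pickle` API; the sample `data` value is made up for illustration. Note that later Python 3 releases did raise the default protocol above 3, so the default should not be assumed to be fixed.

```python
import pickle

data = {'history': ['a', 'dfsdfs'], 'count': 2}  # illustrative sample value

default_bytes = pickle.dumps(data)               # whatever this Python's default is
text_bytes = pickle.dumps(data, protocol=0)      # oldest, ASCII-based protocol
best_bytes = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
negative_bytes = pickle.dumps(data, protocol=-1)  # negative selects the highest protocol

# loads() detects the protocol on its own, so no protocol argument is needed
restored = [
    pickle.loads(b)
    for b in (default_bytes, text_bytes, best_bytes, negative_bytes)
]
```

Whichever protocol is used for writing, reading stays the same; the trade-off is only whether older Python versions can load the result.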
Just tried it out, mode='r+b' didn't seem to work. Looking over the docs, understanding better now, I tried 'a+b' and it worked just as expected. I know pickle wasn't conceived this way, but it turns out you can write to the file a sequence of pickled items!
Adam Morris
@Adam, `'r+b'` gives you more control but by the same token requires explicit seeking -- glad to hear that `'a+b'` serves you better. And: pickle _was_ conceived exactly to dump and then restore N items one after the other, that's why each item is "self-terminating", as I explained in my answer.
Alex Martelli
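
To make the difference between the two modes concrete, here is a hedged sketch; the scratch path is a hypothetical stand-in for FILEPATH. With 'a+b' every write goes to the end of the file and a seek is only needed before reading, while with 'r+b' the position starts at 0, so you have to seek to the end yourself before writing. 'r+b' also fails if the file does not exist yet.

```python
import os
import pickle
import tempfile

# Hypothetical scratch file, standing in for FILEPATH
path = os.path.join(tempfile.mkdtemp(), 'history.pkl')

with open(path, 'wb') as f:
    pickle.dump('first', f)

# 'a+b': writes always land at the end; seek(0) is only needed to read
with open(path, 'a+b') as f:
    pickle.dump('second', f)
    f.seek(0)
    via_append = [pickle.load(f), pickle.load(f)]

# 'r+b': the position starts at 0, so seek to the end before writing,
# otherwise the new pickle would overwrite the existing data
with open(path, 'r+b') as f:
    f.seek(0, os.SEEK_END)
    pickle.dump('third', f)
    f.seek(0)
    via_update = [pickle.load(f) for _ in range(3)]
```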
Got it! Thanks!
Adam Morris
A: 

The answer is that it DOES work: appending pickles to a file opened in binary append mode is fine. Append mode adds no newlines, and the '+' in the mode only adds read access, so 'ab' and 'a+b' both write correctly. It is tidier, though, to let pickle write to the file directly. Change this line:

with open(FILEPATH, mode='ab') as f:    # appending, not writing
    f.write(pickle.dumps(obj))

to

with open(FILEPATH, mode='a+b') as f:    # appending, not writing
    pickle.dump(obj, f)

Alex also points out that for more flexibility use mode='r+b', but this requires the appropriate seeking. Since I wanted to make a history file that behaved like a first-in, last-out sort of sequence of pythonic objects, it actually made sense for me to try appending objects in a file. I just wasn't doing it correctly :)

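There is no need to step through the file line by line because (duh!) each pickle is self-terminating; the line-based loop was the real problem, since pickle.loads on a chunk holding several pickles returns only the first. So replace: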

for l in f:
    lst.append(pickle.loads(l))

with

while True:
    try:
        lst.append(pickle.load(f))
    except EOFError:    # pickle.load raises EOFError at end of file
        break
Adam Morris
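
Putting the pieces together, the class from the question might end up looking like the sketch below. Passing the path to the constructor and writing to a temporary directory are assumptions made so the example is self-contained; the original used a module-level FILEPATH.

```python
import os
import pickle
import tempfile


class HistoryFile:
    """Append-only store of pickled objects in a single file."""

    def __init__(self, path):
        self.path = path  # assumption: path passed in rather than a global

    def validate(self, obj):
        """Return whether obj is acceptable (placeholder, as in the question)."""
        return True

    def add(self, obj):
        if not self.validate(obj):
            raise ValueError("Did not validate")
        # 'ab' is enough for appending; 'a+b' works too, '+' only adds reading
        with open(self.path, mode='ab') as f:
            pickle.dump(obj, f)

    def unpack(self):
        """Return every stored object, oldest first."""
        items = []
        with open(self.path, mode='rb') as f:
            while True:
                try:
                    items.append(pickle.load(f))
                except EOFError:
                    break
        return items


history = HistoryFile(os.path.join(tempfile.mkdtemp(), 'tempfile'))
history.add('a')
history.add('dfsdfs')
history.add(['dfdkfjdf', 'errree', 'cvcvcxvx'])
result = history.unpack()
print(result)
```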