I want to make (and decode) a single string composed of several Python pickles.

Is there a character or sequence that is safe to use as a separator in this string?

I should be able to make the string like so:

s = pickle.dumps(o1) + PICKLE_SEPARATOR + pickle.dumps(o2) + PICKLE_SEPARATOR + pickle.dumps(o3) ...

I should be able to take this string and reconstruct the objects like so:

[pickle.loads(s) for s in input.split(PICKLE_SEPARATOR)]

What should PICKLE_SEPARATOR be?


For the curious, I want to send pickled objects to Redis using APPEND (though perhaps I'll just use RPUSH).

A: 

One solution would be to prepend your string of pickles with data about how many characters each constituent element contains.
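A minimal sketch of that idea in Python 3, using a variant where each pickle carries its own length prefix rather than one header at the front. The helper names and the 4-byte big-endian length format are assumptions made for illustration, not anything pickle defines.

```python
import pickle
import struct

# Hypothetical framing format: each pickle is preceded by its length,
# stored as a 4-byte big-endian unsigned integer.
_HEADER = struct.Struct(">I")


def pack_pickles(objs):
    """Join the pickles of objs into one length-prefixed byte string."""
    parts = []
    for obj in objs:
        data = pickle.dumps(obj)
        parts.append(_HEADER.pack(len(data)))
        parts.append(data)
    return b"".join(parts)


def unpack_pickles(blob):
    """Recover the objects from a byte string built by pack_pickles."""
    objs = []
    offset = 0
    while offset < len(blob):
        # Read the length prefix, then slice out exactly that many bytes.
        (size,) = _HEADER.unpack_from(blob, offset)
        offset += _HEADER.size
        objs.append(pickle.loads(blob[offset:offset + size]))
        offset += size
    return objs
```

Because every record is self-contained, two packed strings can be concatenated and still decode, which should suit an append-only store.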

Paul McMillan
+3  A: 

I don't use Python much, but is there a reason you couldn't just pickle an array instead? So pickling becomes

s = pickle.dumps([o1,o2,o3])

and reconstruction becomes

objs = pickle.loads(s)

Edit 1: Also, according to this answer, pickled output is self-terminating; thus, you could pickle with

s = ''.join(map(pickle.dumps,[o1,o2,o3]))

and restore with

import pickle
import StringIO

sio = StringIO.StringIO(s)
objs = []
try:
    # keep loading until the stream runs out of pickles
    while True:
        objs.append(pickle.load(sio))
except EOFError:
    pass

I'm not sure there's a benefit to this, though. (Though I didn't see one, there may well be a better way than that nasty loop/exception combo; like I said, I don't use Python much.)
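For what it's worth, one way to tidy up the loop is to hide it in a generator. This is only a sketch for Python 3, where pickles are bytes and io.BytesIO takes the place of StringIO; iter_pickles is a made-up name.

```python
import io
import pickle


def iter_pickles(blob):
    """Yield each object from a byte string of back-to-back pickles."""
    stream = io.BytesIO(blob)
    while True:
        try:
            # Each load consumes exactly one pickle from the stream.
            yield pickle.load(stream)
        except EOFError:
            # Nothing left to read: stop the generator.
            return
```

Usage would then be something like objs = list(iter_pickles(s)).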

Antal S-Z
+1  A: 

EDIT: First consider gnibbler's answer, which is obviously much simpler. The only reason to prefer the one below is if you want to be able to split a sequence of pickles without parsing them.

A fairly safe bet is to use a brand new UUID that you never reuse anywhere else. Evaluate uuid.uuid4().bytes once and store the result in your code as the separator. E.g.:

>>> import uuid
>>> uuid.uuid4().bytes
'\xae\x9fW\xff\x19cG\x0c\xb1\xe1\x1aV%P\xb7\xa8'

Then copy-paste the resulting string literal into your code as the separator (or even just use the one above, if you want). Since it is a random 16-byte value, the chance of the same sequence occurring in anything you ever want to store is negligible.
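A rough sketch of how the join and split could look in Python 3, reusing the value from the example above as a bytes literal. The function names are hypothetical, and this assumes none of the pickled data happens to contain the separator.

```python
import pickle

# The separator from the example above, written as a Python 3 bytes literal.
PICKLE_SEPARATOR = b"\xae\x9fW\xff\x19cG\x0c\xb1\xe1\x1aV%P\xb7\xa8"


def join_pickles(objs):
    """Pickle each object and join the results with the separator."""
    return PICKLE_SEPARATOR.join(pickle.dumps(o) for o in objs)


def split_pickles(blob):
    """Split on the separator and unpickle each chunk."""
    if not blob:
        # An empty string holds no pickles at all.
        return []
    return [pickle.loads(chunk) for chunk in blob.split(PICKLE_SEPARATOR)]
```

The one obvious way to break this is to pickle an object that itself contains the separator bytes, so the constant should not appear in any data you store.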

Marcelo Cantos
+7  A: 

It's fine to just concatenate the pickles together. Python knows where each one ends:

>>> import cStringIO as stringio
>>> import cPickle as pickle
>>> o1 = {}
>>> o2 = []
>>> o3 = ()
>>> p = pickle.dumps(o1)+pickle.dumps(o2)+pickle.dumps(o3)
>>> s = stringio.StringIO(p)
>>> pickle.load(s)
{}
>>> pickle.load(s)
[]
>>> pickle.load(s)
()
gnibbler
One potential gotcha: this doesn't work on the string itself, only on file-like objects: try `pickle.loads(p)` three times, and only the `dict` is returned each time.
Tim McNamara