views:

195

answers:

2

Does Pickle always produce the same output for a certain input value? I suppose there could be a gotcha when pickling dictionaries that have the same contents but different insert/delete histories. My goal is to create a "signature" of function arguments, using Pickle and SHA1, for a memoize implementation.

A: 

What do you mean by same output ? You should normally always get the same output for a roundtrip (pickling -> unpickling), but I don't think the serialized format itself is guaranteed to be the same in every condition. Certainly, it may change between platforms and all that.

Within one run of your program, using pickling for memoization should be fine - I have used this scheme several times without trouble, but that was for quite simple problems. One problem is that this does not cover every useful case (function come to mind: you cannot pickle them, so if your function takes a callable argument, that won't work).

David Cournapeau
+5  A: 

I suppose there could be a gotcha when pickling dictionaries that have the same contents but different insert/delete histories.

Right:

>>> pickle.dumps({1: 0, 9: 0}) == pickle.dumps({9: 0, 1: 0})
False

See also: pickle.dumps not suitable for hashing

My goal is to create a "signature" of function arguments, using Pickle and SHA1, for a memoize implementation.

There's a number of fundamental problems with this. It's impossible to come up with an object-to-string transformation that maps equality correctly—think of the problem of object identity:

>>> a = object()
>>> b = object()
>>> a == b
False
>>> pickle.dumps(b) == pickle.dumps(a)
True

Depending on your exact requirements, you may be able to transform object hierarchies into ones that you could then hash:

def hashablize(obj):
    """Convert a container hierarchy into one that can be hashed.

    Don't use this with recursive structures!
    Also, this won't be useful if you pass dictionaries with
    keys that don't have a total order.
    Actually, maybe you're best off not using this function at all."""
    try:
        hash(obj)
    except TypeError:
        if isinstance(obj, dict):
            return tuple((k, hashablize(v)) for (k, v) in sorted(obj.iteritems()))
        elif hasattr(obj, '__iter__'):
            return tuple(hashablize(o) for o in obj)
        else:
            raise TypeError("Can't hashablize object of type %r" % type(obj))
    else:
        return obj
Miles