What is a correct way to pickle an instance of a class with __slots__ when that instance references itself through one of its attributes? Here is a simple example with my current implementation, which I'm not sure is 100% correct:

import weakref
import pickle

class my_class(object):

    __slots__ = ('an_int', 'ref_to_self', '__weakref__')

    def __init__(self):
        self.an_int = 42
        self.ref_to_self = weakref.WeakKeyDictionary({self: 1})

    # How to best write __getstate__ and __setstate__?
    def __getstate__(self):

        obj_slot_values = dict((k, getattr(self, k)) for k in self.__slots__)
        # Conversion to a usual dictionary:
        obj_slot_values['ref_to_self'] = dict(obj_slot_values['ref_to_self'])
        # Unpicklable weakref object:
        del obj_slot_values['__weakref__']
        return obj_slot_values

    def __setstate__(self, data_dict):
        # items() works on both Python 2 and Python 3 (iteritems() is Python 2 only):
        for (name, value) in data_dict.items():
            setattr(self, name, value)
        # Conversion of the dict back to a WeakKeyDictionary:
        self.ref_to_self = weakref.WeakKeyDictionary(self.ref_to_self)

This can be tested for instance with:

def test_pickling(obj):
    "Pickles obj and unpickles it.  Returns the unpickled object"

    obj_pickled = pickle.dumps(obj)
    obj_unpickled = pickle.loads(obj_pickled)

    # Self-references should be kept (single-argument print works on Python 2 and 3):
    print("OK? %s" % (obj_unpickled == list(obj_unpickled.ref_to_self.keys())[0]))
    print("OK? %s" % isinstance(obj_unpickled.ref_to_self,
                                weakref.WeakKeyDictionary))

    return obj_unpickled

if __name__ == '__main__':
    obj = my_class()
    obj_unpickled = test_pickling(obj)
    obj_unpickled2 = test_pickling(obj_unpickled)

Is this a correct and robust implementation? How should __getstate__ and __setstate__ be written if my_class inherited from a class with __slots__? Is there a memory leak inside __setstate__ because of the "circular" dict?

There is a remark in PEP 307 that makes me wonder whether pickling my_class objects is at all possible in a robust way:

The __getstate__ method should return a picklable value representing the object's state without referencing the object itself.

Does this clash with the fact that a reference to the object itself is pickled?

That's a lot of questions: any remark, comment, or advice would be much appreciated!

A: 

Obviously, you can't unpickle an object that refers to itself, because you need to restore that reference before the object is restored. The traditional way to handle this is by symbolic forward references.

I suppose that:

  • You define some magic SELF_REFERENCE object (e.g. a string), a value that cannot be mistaken for anything meaningful. Its value is known on both sides.
  • Before pickling, you check suspicious fields for self-references; every such reference is replaced by the SELF_REFERENCE dummy value.
  • After unpickling, you check the suspicious fields again and replace values equal to SELF_REFERENCE with references to the newly unpickled object.
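
The three steps above could be sketched roughly as follows. This is only an illustration: the class Marked, its attribute ref, and the marker string are made-up names, and the class uses a plain __dict__ rather than __slots__ to keep the example short.

```python
import pickle

# Hypothetical marker; any value that cannot collide with real data would do.
SELF_REFERENCE = "__SELF_REFERENCE__"


class Marked(object):
    """Illustrative class (not from the question) holding a self-reference."""

    def __init__(self):
        self.an_int = 42
        self.ref = self  # attribute that points back at the object

    def __getstate__(self):
        state = dict(self.__dict__)
        # Before pickling: replace the self-reference by the marker.
        if state.get("ref") is self:
            state["ref"] = SELF_REFERENCE
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # After unpickling: turn the marker back into a real reference.
        if state.get("ref") == SELF_REFERENCE:
            self.ref = self


restored = pickle.loads(pickle.dumps(Marked()))
print(restored.ref is restored)
```

The "suspicious fields" here are hard-coded to a single attribute; a real class would need to check every attribute that may point back at the object.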
9000
Actually, the pickle module is smart enough to pickle and unpickle objects that refer to themselves. Python is great. :)
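
A quick way to check this is to round-trip two self-referencing values. The sketch below assumes nothing beyond the standard pickle module; the Node class and its attribute me are made-up names for the illustration.

```python
import pickle


class Node(object):
    """Illustrative class (not from the question)."""

    def __init__(self):
        self.me = self  # direct self-reference


loop = []
loop.append(loop)  # a list that contains itself

loop_copy = pickle.loads(pickle.dumps(loop))
node_copy = pickle.loads(pickle.dumps(Node()))

# In both cases the copy refers to itself, not to the original object.
print(loop_copy[0] is loop_copy)
print(node_copy.me is node_copy)
```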
EOL
A: 

It looks like what the original post suggests works well enough.

As for what PEP 307 says:

The __getstate__ method should return a picklable value representing the object's state without referencing the object itself.

I understand it to mean only that the __getstate__ method must return a representation that does not point to the (unpicklable) original object. Thus, returning a state that references the object being pickled is fine, as long as nothing unpicklable (here, the WeakKeyDictionary) is included in it.
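
For the part of the question about inheriting from a class with __slots__, one possible approach is to collect the slot names from every class in the MRO instead of reading self.__slots__, which only holds the names declared by the most derived class. The following is a sketch under that assumption, not a definitive implementation; all_slots, Base and Derived are made-up names, and slot names that start with two underscores (which Python mangles) are not handled.

```python
import pickle
import weakref


def all_slots(obj):
    """Collect slot names declared by every class in the MRO (hypothetical helper)."""
    names = []
    for cls in type(obj).__mro__:
        slots = cls.__dict__.get("__slots__", ())
        if isinstance(slots, str):  # a lone string declares a single slot
            slots = (slots,)
        names.extend(n for n in slots if n not in ("__weakref__", "__dict__"))
    return names


class Base(object):
    __slots__ = ("an_int", "__weakref__")

    def __getstate__(self):
        # Slots that were never assigned are skipped.
        return {n: getattr(self, n) for n in all_slots(self) if hasattr(self, n)}

    def __setstate__(self, state):
        for name, value in state.items():
            setattr(self, name, value)


class Derived(Base):
    __slots__ = ("ref_to_self",)

    def __init__(self):
        self.an_int = 42
        self.ref_to_self = weakref.WeakKeyDictionary({self: 1})

    def __getstate__(self):
        state = Base.__getstate__(self)
        # A plain dict can be pickled; a WeakKeyDictionary cannot.
        state["ref_to_self"] = dict(state["ref_to_self"])
        return state

    def __setstate__(self, state):
        Base.__setstate__(self, state)
        # Convert the plain dict back to a WeakKeyDictionary.
        self.ref_to_self = weakref.WeakKeyDictionary(self.ref_to_self)


clone = pickle.loads(pickle.dumps(Derived()))
print(list(clone.ref_to_self.keys())[0] is clone)
print(all_slots(clone))
```

The state returned by __getstate__ still contains the object itself as a dictionary key, and the round trip preserves that self-reference.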

EOL