I'm trying to write a class for a read-only object that the copy module will never actually copy. When the object is pickled to be transferred between processes, each process should hold no more than one copy of it, no matter how many times it is passed around as a "new" object. Is there already something like that?

A: 

I don't know of any such functionality already implemented. The interesting problem is as follows, and needs precise specs as to what's to happen in this case...:

  • process A makes the obj and sends it to B which unpickles it, so far so good
  • A makes change X to the obj, meanwhile B makes change Y to ITS copy of the obj
  • now either process sends its obj to the other, which unpickles it: what changes to the object need to be visible at this time in each process? does it matter whether A's sending to B or vice versa, i.e. does A "own" the object? or what?

If you don't care, say because only A OWNS the obj -- only A is ever allowed to make changes and send the obj to others, others can't and won't change -- then the problems boil down to identifying obj uniquely -- a GUID will do. The class can maintain a class attribute dict mapping GUIDs to existing instances (probably as a weak-value dict to avoid keeping instances needlessly alive, but that's a side issue) and ensure the existing instance is returned when appropriate.
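The GUID-to-instance mapping described above might look roughly like the sketch below. The names `Tracked`, `_instances`, and `lookup` are made up for illustration, and the sketch covers only the bookkeeping within one process, not pickling.

```python
import uuid
import weakref


class Tracked(object):
    """Hypothetical class that remembers its live instances by GUID."""

    # Weak values: an entry disappears once nothing else references the
    # instance, so the registry never keeps objects alive by itself.
    _instances = weakref.WeakValueDictionary()

    def __init__(self):
        self.guid = uuid.uuid4()
        Tracked._instances[self.guid] = self

    @classmethod
    def lookup(cls, guid):
        """Return the live instance with this GUID, or None if there is none."""
        return cls._instances.get(guid)


obj = Tracked()
assert Tracked.lookup(obj.guid) is obj        # known GUID: same instance
assert Tracked.lookup(uuid.uuid4()) is None   # unknown GUID: nothing found
```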

But if changes need to be synchronized to any finer granularity, then suddenly it's a REALLY difficult problem of distributed computing and the specs of what happens in what cases really need to be nailed down with the utmost care (and more paranoia than is present in most of us -- distributed programming is VERY tricky unless a few simple and provably correct patterns and idioms are followed fanatically!-).

If you can nail down the specs for us, I can offer a sketch of how I would go about trying to meet them. But I won't presume to guess the specs on your behalf;-).

Edit: the OP has clarified, and it seems all he needs is a better understanding of how to control __new__. That's easy: see __getnewargs__ -- you'll need a new-style class and pickling with protocol 2 or better (but those are advisable anyway for other reasons!-), then __getnewargs__ in an existing object can simply return a one-item tuple holding the object's GUID (which __new__ must receive as an optional parameter). So __new__ can check if the GUID is present in the class's memo dict (a weak-value one;-) and, if so, return the corresponding object. If not (or if no GUID is passed, implying it's not an unpickling, so a fresh GUID must be generated), it makes a truly new object (setting its GUID;-) and also records it in the class-level memo.
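A minimal sketch of that `__new__` / `__getnewargs__` interplay follows. The class name `Shared` and the attribute names `_memo` and `guid` are illustrative assumptions, and the object is assumed to be read-only, so restoring its pickled state onto the existing instance is harmless.

```python
import pickle
import uuid
import weakref


class Shared(object):
    """Hypothetical read-only object: one instance per GUID per process."""

    _memo = weakref.WeakValueDictionary()

    def __new__(cls, guid=None):
        if guid is not None:
            # Unpickling path: reuse the instance this process already holds.
            existing = cls._memo.get(guid)
            if existing is not None:
                return existing
        else:
            # Normal creation path: no GUID passed, so make a fresh one.
            guid = uuid.uuid4()
        obj = super(Shared, cls).__new__(cls)
        obj.guid = guid
        cls._memo[guid] = obj
        return obj

    def __getnewargs__(self):
        # With protocol 2 or higher, pickle passes this tuple to __new__.
        return (self.guid,)


a = Shared()
b = pickle.loads(pickle.dumps(a, protocol=2))
assert b is a   # unpickling in the same process hands back the original
```

In a different process the memo lookup would miss, so `__new__` would build a new object carrying the received GUID, and pickle would then restore its state as usual.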

BTW, to make GUIDs, consider using the uuid module in the standard library.
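For instance, a random GUID from `uuid.uuid4()` is hashable and comparable, which is all a memo dict key needs (a small sketch, not tied to any particular class design):

```python
import uuid

guid = uuid.uuid4()                  # random 128-bit identifier
rebuilt = uuid.UUID(str(guid))       # round-trips through its string form

assert rebuilt == guid               # equal by value
assert hash(rebuilt) == hash(guid)   # so either one works as a dict key
```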

Alex Martelli
Apologies, @Alex Martelli, I should have mentioned that the object is read-only.
cool-RR
@cool-RR, then the approach I mention ("A OWNS") should work (A has no reason to ever send it more than once if it's read-only, so you could dispense with every part of my suggestion, but maybe what **you** mean by "read-only" is incredibly peculiar and DOES include **changes**... which would be totally contradictory to "read-only" in any SENSIBLE interpretation I can think of, but...!-). So what, if anything, do you find missing in my GUID-based suggestion, oh @cool-rr?
Alex Martelli
@Alex Martelli: Yes, that's pretty much the approach I had in mind. I'm trying to implement it, and it seems that the `__new__` method is where the action's at, but I'm having some trouble because documentation is scarce and I don't understand how the `__new__` method knows whether it's unpickling time or normal creation time.
cool-RR
@cool-RR, see my edits for how to control `__new__` and how to make UUIDs, including pointers to the documentation.
Alex Martelli
@Alex Martelli: I intend to let users subclass this class. Will that prevent them from using certain keyword arguments in the constructors of their objects?
cool-RR
@cool-RR, you can use attribute/method identifiers starting with double underscores (the Python compiler mangles them by inserting the class name) to avoid most possibilities of accidental clashes with subclasses -- for this problem just like for any other subclassing.
Alex Martelli
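To illustrate the double-underscore mangling mentioned in the comment above, here is a small sketch with hypothetical classes `Base` and `Child`; both use the same private-looking name without clashing:

```python
class Base(object):
    def __init__(self):
        self.__token = "base"      # stored as _Base__token

    def base_token(self):
        return self.__token        # reads _Base__token


class Child(Base):
    def __init__(self):
        super(Child, self).__init__()
        self.__token = "child"     # stored as _Child__token, no clash


c = Child()
assert c.base_token() == "base"
assert sorted(vars(c)) == ["_Base__token", "_Child__token"]
```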
A: 

You could simply use a dictionary in the receiver whose keys and values are the same objects. To avoid a memory leak, use a WeakKeyDictionary.

Xavier Combelle
You'd have to explain more thoroughly.
cool-RR
A: 

I made an attempt to implement this. @Alex Martelli and anyone else, please give me comments/improvements. I think this will eventually end up on GitHub.

"""
todo: need to lock library to avoid thread trouble?

todo: need to raise an exception if we're getting pickled with
an old protocol?

todo: make it polite to other classes that use __new__. Therefore, should
probably work not only when there is only one item in the *args passed to new.

"""

import uuid
import weakref

library = weakref.WeakValueDictionary()

class UuidToken(object):
    def __init__(self, uuid):
        self.uuid = uuid


class PersistentReadOnlyObject(object):
    def __new__(cls, *args, **kwargs):
        if len(args)==1 and len(kwargs)==0 and isinstance(args[0], UuidToken):
            received_uuid = args[0].uuid
        else:
            received_uuid = None

        if received_uuid is not None:
            # This section is for when we are called at unpickling time.
            # Look up with get(), not pop(): the instance must stay in the
            # library so that later unpicklings of the same UUID find it too.
            thing = library.get(received_uuid)
            if thing is not None:
                thing._PersistentReadOnlyObject__skip_setstate = True
                return thing
            else: # This object does not exist in our library yet; let's add it
                # The UuidToken was the only argument, so nothing is forwarded.
                thing = super(PersistentReadOnlyObject, cls).__new__(cls)
                thing._PersistentReadOnlyObject__uuid = received_uuid
                library[received_uuid] = thing
                return thing

        else:
            # This section is for when we are called at normal creation time.
            # Constructor arguments are left for __init__ to handle, because
            # object.__new__ rejects extra arguments on Python 3.
            thing = super(PersistentReadOnlyObject, cls).__new__(cls)
            new_uuid = uuid.uuid4()
            thing._PersistentReadOnlyObject__uuid = new_uuid
            library[new_uuid] = thing
            return thing

    def __getstate__(self):
        my_dict = dict(self.__dict__)
        del my_dict["_PersistentReadOnlyObject__uuid"]
        return my_dict

    def __getnewargs__(self):
        return (UuidToken(self._PersistentReadOnlyObject__uuid),)

    def __setstate__(self, state):
        if self.__dict__.pop("_PersistentReadOnlyObject__skip_setstate", None):
            return
        else:
            self.__dict__.update(state)

    def __deepcopy__(self, memo):
        return self

    def __copy__(self):
        return self

# --------------------------------------------------------------
"""
From here on it's just testing stuff; will be moved to another file.
"""


# Imported at module level so the helpers below work however they are called.
import copy
import random


def play_around(queue, thing):
    queue.put((thing, copy.deepcopy(thing),))

class Booboo(PersistentReadOnlyObject):
    def __init__(self):
        self.number = random.random()

if __name__ == "__main__":

    import multiprocessing
    import random
    import copy

    def same(a, b):
        return (a is b) and (a == b) and (id(a) == id(b)) and \
               (a.number == b.number)

    a = Booboo()
    b = copy.copy(a)
    c = copy.deepcopy(a)
    assert same(a, b) and same(b, c)

    my_queue = multiprocessing.Queue()
    process = multiprocessing.Process(target = play_around,
                                      args=(my_queue, a,))
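    process.start()
    # Read from the queue before joining: joining a process that still has
    # unflushed queue data can deadlock.
    things = my_queue.get()
    process.join()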
    for thing in things:
        assert same(thing, a) and same(thing, b) and same(thing, c)
    print("all cool!")
cool-RR