views:

88

answers:

2

I want to create a polymorphic structure that can be created on the fly with minimum typing effort and be very readable. For example:

a.b = 1
a.c.d = 2
a.c.e = 3
a.f.g.a.b.c.d = cucu
a.aaa = bau

I do not want to create an intermediate container such as:

a.c = subobject()
a.c.d = 2
a.c.e = 3

My question is similar to this one:

http://stackoverflow.com/questions/635483/what-is-the-best-way-to-implement-nested-dictionaries-in-python

But I am not happy with the solution there because I think there is a bug:
Items will be created even when you don't want: suppose you want to compare 2 polymorphic structures: it will create in the 2nd structure any attribute that exists in the first and is just checked in the other. e.g:

a = {1:2, 3: 4}
b = {5:6}

# now compare them:

if b[1] == a[1]
    # whoops, we just created b[1] = {} !

I also want to get the simplest possible notation

a.b.c.d = 1
    # neat
a[b][c][d] = 1
    # yuck

I did try to derive from the object class... but I couldn't avoid to leave the same bug as above where attributes were born just by trying to read them: a simple dir() would try to create attributes like "methods"... like in this example, which is obviously broken:

class KeyList(object):
    def __setattr__(self, name, value):
        print "__setattr__ Name:", name, "value:", value
        object.__setattr__(self, name, value)
    def __getattribute__(self, name):
        print "__getattribute__ called for:", name
        return object.__getattribute__(self, name)
    def __getattr__(self, name):
        print "__getattr__ Name:", name
        try:
            ret = object.__getattribute__(self, name)
        except AttributeError:
            print "__getattr__ not found, creating..."
            object.__setattr__(self, name, KeyList())
            ret = object.__getattribute__(self, name)
        return ret

>>> cucu = KeyList()
>>> dir(cucu)
__getattribute__ called for: __dict__
__getattribute__ called for: __members__
__getattr__ Name: __members__
__getattr__ not found, creating...
__getattribute__ called for: __methods__
__getattr__ Name: __methods__
__getattr__ not found, creating...
__getattribute__ called for: __class__

Thanks, really!

p.s.: the best solution I found so far is:

class KeyList(dict):
    def keylset(self, path, value):
        attr = self
        path_elements = path.split('.')
        for i in path_elements[:-1]:
            try:
                attr = attr[i]
            except KeyError:
                attr[i] = KeyList()
                attr = attr[i]
        attr[path_elements[-1]] = value

# test
>>> a = KeyList()
>>> a.keylset("a.b.d.e", "ferfr")
>>> a.keylset("a.b.d", {})
>>> a
{'a': {'b': {'d': {}}}}

# shallow copy
>>> b = copy.copy(a)
>>> b
{'a': {'b': {'d': {}}}}
>>> b.keylset("a.b.d", 3)
>>> b
{'a': {'b': {'d': 3}}}
>>> a
{'a': {'b': {'d': 3}}}

# complete copy
>>> a.keylset("a.b.d", 2)
>>> a
{'a': {'b': {'d': 2}}}
>>> b
{'a': {'b': {'d': 2}}}
>>> b = copy.deepcopy(a)
>>> b.keylset("a.b.d", 4)
>>> b
{'a': {'b': {'d': 4}}}
>>> a
{'a': {'b': {'d': 2}}}
+1  A: 

I think at a minimum you need to do a check in __getattr__ that the requested attrib doesn't start and end with __. Attributes which match that description implement established Python APIs, so you shouldn't be instantiating those attributes. Even so you'll still end up implementing some API attribs, like for example next. In that case you would end up with an exception being thrown if you pass the object to some function that uses duck typing to see if it's an iterator.

It would really be better to create a "whitelist" of valid attrib names, either as a literal set, or with a simple formula: e.g. name.isalpha() and len(name) == 1 would work for the one-letter attribs you're using in the example. For a more realistic implementation you'd probably want to define a set of names appropriate to the domain your code is working in.

I guess the alternative is to make sure that you're not dynamically creating any of the various attribute names that are part of some protocol, as next is part of the iteration protocol. The methods of the ABCs in the collections module comprise a partial list, but I don't know where to find a full one.

You are also going to have to keep track of whether or not the object has created any such child nodes, so that you will know how to do comparisons against other such objects.

If you want comparisons to avoid autovivification, you will have to implement a __cmp__ method, or rich comparison methods, in the class that checks the __dict__s of the objects being compared.

I have a sneaking feeling that there's a few complications I didn't think of, which wouldn't be surprising since this is not really how Python is supposed to work. Tread carefully, and think about whether the added complexity of this approach is worth what it will get you.

intuited
thanks for advice, however you need to be able to define avery name you want and instantiate none of the referenced ones :-)
RumburaK
Yes, well I think that's the alternative to some otherwise risky business, where you're potentially "implementing" methods that might cause incorrect interpretations on the part of certain python functions. If you're just going to be using it as flat data though, and never passing it to any functions, just filtering out `__name__`s would probably be good enough.
intuited
+1  A: 

If you're looking for something that's not as dynamic as your original post, but more like your best solution so far, you might see if Ian Bicking's formencode's variabledecode would meet your needs. The package itself is intended for web forms and validation, but a few of the methods seem pretty close to what you're looking for.
If nothing else, it might serve as an example for your own implementation.

A small example:

>>> from formencode.variabledecode import variable_decode, variable_encode
>>>
>>> d={'a.b.c.d.e': 1}
>>> variable_decode(d)
{'a': {'b': {'c': {'d': {'e': 1}}}}}
>>>
>>> d['a.b.x'] = 3
>>> variable_decode(d)
{'a': {'b': {'c': {'d': {'e': 1}}, 'x': 3}}}
>>>
>>> d2 = variable_decode(d)
>>> variable_encode(d2) == d
True
ma3