tags:

views:

58

answers:

3

Hi all,

I have code that statically registers (type, handler_function) pairs at module load time, resulting in a dict like this:

HANDLERS = {
  str: HandleStr,
  int: HandleInt,
  ParentClass: HandleCustomParent,
  ChildClass: HandleCustomChild
  }

def HandleObject(obj):
  for data_type in sorted(HANDLERS.keys(), ???):
    if isinstance(obj, data_type):
      HANDLERS[data_type](obj)

Where ChildClass inherits from ParentClass. The problem is that, since its a dict, the order isn't defined - but how do I introspect type objects to figure out a sort key?

The resulting order should be child classes follow by super classes (most specific types first). E.g. str comes before basestring, and ChildClass comes before ParentClass. If types are unrelated, it doesn't matter where they go relative to each other.

+1  A: 

Do a topological sort with the __bases__ members of each class.

Ignacio Vazquez-Abrams
+4  A: 

If you know you're always dealing with new-style classes:

def numberofancestors(klass):
    return len(klass.mro())

or, if you worry there may be old-style classes in the mix:

import inspect

def numberofancestors(klass):
    return len(inspect.getmro(klass))

and then, in either case,

sorted(HANDLERS, key=numberofancestors, reversed=True)

will give you what you require (you don't need the .keys() part).

@Ignacio's suggestion of a topological sort is theoretically correct, but since, given a class, you can easily and rapidly get the number of its precursors (AKA "ancestors"... in a weird sense of the word whereby you're one of your ancestors;-), with these numberofancestors functions, my approach is much more practical: it relies on the obvious fact that any derived class has at least one more "ancestor" than any of its bases classes, and therefore, with this key=, it will always sort before any of its bases.

Unrelated classes may end up in arbitrary order (just like they might in a topological sort), but you've made it clear you don't care about this.

Edit: the OP, musing in the following comments thread about optimal support for multiple inheritance cases, came up with a drastically different idea than the original one of "pre-sorting" embedded in the question, but his suggestion on how to implement that drastically idea is not optimal:

[h for h in [HANDLERS.get(c) for c in type(obj).mro()] if h is not None][0]

the idea is good (if multiple-inheritance support is of any interest) but the best implementation thereof would probably be (Python 2.6 or better):

next(Handlers[c] for c in type(obj).mro() if c in Handlers)

Normally, adict.get(k) and check for not None is faster than if k in adict: adict[k], but this is not a particularly normal case because to use get requires building a "fake" one-item list and "looping" on it to simulate assignment.

More generally, building a whole list via comprehension just to take its [0]th item is excess work -- the next builtin function called on a genexp acts more like a first, as in, "give me the first item of the genexp" and does no extra work beyond that. It raises StopIteration instead of IndexError if the listcomp/genexp is empty, but that's not normally an issue; you could also have a second argument to next, to be uses as the "default value" if the genexp is empty.

In 2.5 and earlier you'd have to use (thegenexp).next() instead (and there's no way to give it a default argument), but while syntactically a tad less shiny it's more or less equivalent to the 2.6-and-better construct in semantics and speed.

I'm sure glad the discussion continued in the comments because I think this resulting conclusion is worthwhile and potentially useful (though maybe not in the exact environment of the OP's application, where multiple inheritance may not in fact be an issue).

Alex Martelli
Very clever! This won't exactly work for multiple inheritance situations, but I don't plan to handle that use-case.
Richard Levasseur
@Richard, can you please show me a failing case with multiple inheritance? I think I have just proven that there are no failing cases (multiple or not, the derived class will *always* have at least one more ancestor -- itself!) and I'd like to understand where my logic failed me (or "I failed my logic";-) -- but wrack my brain as I may I can't think of _any_ failing case (given that the inheritance graph is always a DAG, of course)... it would seem to counteract the most fundamental basics of graph theory. So thanks in advance for providing a counterexample!
Alex Martelli
I didn't think about it too deeply, but what I had in mind was something like this:<< class A; [a]ncestors=1 // class B(A); a=2 // class One; a=1//class Two(One); a=2 // class Alice(B, One); a=3 // class Bob(A, Two); a=3 // Register(A, HandleA) // Register(One, HandleOne) >>A and One have the same sort order, so their position, relative to each other, is undefined. When we go to match an instance of Alice() or Bob(), we'll randomly get HandleA or HandleOne (since both are subtypes of A and One). Admittedly, I'm hard pressed to think of when this situation would occur.
Richard Levasseur
@Richard, not really: the sorted handlers may have A or One first, so which one you get is arbitrary, but you'll always get the same one (as long as you don't alter HANDLERS dynamically, and I don't see that in your code) -- just as you would if you had done a topological sort of other kinds. Both are equally appropriate anyway, aren't they? What criterion could you use to pick one vs the other otherwise? If you do want to ensure that even with dynamically altered handlers you **always** keep getting the same one, use as key the #ancestor comma some arbitrary unique value, e.g. `id(klass)`.
Alex Martelli
You'll get the same handler during that run of the program, yes, but if you restart it, it can be different. Personally, I'd say they should try to match the MRO, or some other fixed quality about the class's definition.
Richard Levasseur
@Richard, for "why the hey would you ever want it!" repeatability across runs, `klass.__name__` should be the simplest "fixed quality about the class's definition". "Matching the MRO" is not about **sorting** at all, it's about your check `if isinstance(obj, data_type):` -- if that's not what you want then it's the wrong think to check for;-).
Alex Martelli
By "matching the mro" i mean, given a Bob() object, when iterating through HANDLERS, the order of `isinstance(bob, data_type)` checks should match the MRO, so that, if you have complex inheritance hierarchies, you can resuse the existing MRO definition of "order". I think that'd require each `type(obj)` to have its own order in `HANDLERS`. An easier way might be to use `type(m).mro()` as the ordering and look in handlers for matching types: `[h for h in [HANDLERS.get(c) for c in type(obj).mro()] if h is not None][0]` - basically letting `obj` specify the ordering.
Richard Levasseur
Nice idea, and sounder than any predefined ordering, though nowhere as fast as a fixed sort -- +1! -- but it can be made faster (without changing the basic idea deeply), let me edit the answer to show how.
Alex Martelli
A: 

Use collections.OrderedDict from Python 2.7 or 3.1. It's written in pure Python so you could easily plug it into or adapt it to an earlier version if necessary.

An OrderedDict will maintain the order of insertion.

Don O'Donnell