(I'm developing in Python 3.1, so if there's some shiny new 3.x feature I should know about for this, please let me know!)

I've got a class (we'll just call it "Packet") that serves as the parent for a bunch of child classes representing each of a few dozen packet types in a legacy client-server protocol over which I have no control. (The packets often have wildly differing behavior, so I just gave them each their own class to make my life easier.)

When receiving a packet, I'll have a simple "dispatch" function that checks the packet header to determine the type, then hands it off to the class that knows how to deal with it.

I do not want to maintain the lookup table by hand -- it's inelegant and just asking for trouble. Instead, I'd like the table built at runtime by examining all of the subclasses of Packet, which will have class variables specifying what packet type they correspond to, e.g.:

class Login(Packet):
    type_id = 0x01

I thought, of course, about iterating through object.__subclasses__(), but I've found slightly varying viewpoints on the safety and propriety of using it for things like this, including implications that it is implementation-specific and may not work in places other than CPython, and that it may disappear from CPython in the future. Am I being too paranoid? Is __subclasses__ considered a "real" part of the language now?

If not, what's a good "pythonic" way to approach this?

+2  A: 

"I'd like the table built at runtime by examining all of the subclasses of Packet,"

This is guaranteed to cause endless problems. This kind of thing puts a strange constraint on your subclasses. You can't use any abstract superclasses to simplify things.

Here's a specific example of what won't work if you "automatically" discover the subclasses.

class MessagePacket( object ):
    """superclass.  Ignore me, I'm abstract."""
class Type_01( MessagePacket ):
    type_id = 0x01
class Type_023( MessagePacket ):
    """superclass with common features for type 2 and 3.  Ignore me, I'm abstract."""
class Type_02( Type_023 ):
    type_id = 0x02
class Type_03( Type_023 ):
    type_id = 0x03
class Type_04( MessagePacket ):
    """superclass for subtypes of 4.  Ignore me, I'm abstract."""
    type_id = 0x04
class Type_04A( Type_04 ):
    discriminator = lambda x: x[23] == 'a' or x[35] == 42
class Type_04B( Type_04 ):
    discriminator = lambda x: True

That should be enough to show that "automatic discovery" is doomed from the outset.

The correct solution is to have a Factory class which embodies the correct subclass hierarchy, exploiting all features based on -- well -- manual design.

class Factory( object ):
    def __init__( self, *subclass_list ):
        self.subclass = dict( (s.type_id,s) for s in subclass_list )
    def parse( self, packet ):
        if packet.type_id == 0x04:  # a bare 04 is a syntax error in Python 3
            # special subclass logic goes here
            pass
        return self.subclass[packet.type_id]( packet )

It doesn't seem too onerous a burden to include the following:

factory= Factory( Subclass1, Subclass2, ... SubclassN )

And when you add subclasses, add to the list of subclasses that are actually being used.
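
For illustration, here is a minimal runnable sketch of the explicit-registration idea described above. The Packet, Login and Logout classes, the assumption that the first byte of a packet holds its type id, and the error handling for duplicate or unknown ids are all hypothetical additions for the example, not part of the original protocol or answer.

```python
class Packet:
    """Hypothetical base class; it just keeps the raw bytes it was built from."""
    type_id = None

    def __init__(self, raw):
        self.raw = raw


class Login(Packet):
    type_id = 0x01


class Logout(Packet):
    type_id = 0x02


class Factory:
    def __init__(self, *subclass_list):
        # Build the lookup table once, from the explicitly listed classes.
        self.subclass = {}
        for cls in subclass_list:
            if cls.type_id in self.subclass:
                raise ValueError("duplicate type_id 0x%02x" % cls.type_id)
            self.subclass[cls.type_id] = cls

    def parse(self, raw):
        # Assumption: the first byte of the packet is its type id.
        type_id = raw[0]
        try:
            cls = self.subclass[type_id]
        except KeyError:
            raise ValueError("unknown packet type 0x%02x" % type_id)
        return cls(raw)


factory = Factory(Login, Logout)
packet = factory.parse(b"\x01payload")
```

Listing the classes by hand means abstract helpers never end up in the table by accident, at the cost of one more place to update when a packet type is added.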

S.Lott
+1. But, for educational purposes, could you provide more examples of when examining subclasses can cause problems?
Denis Otkidach
+1  A: 

I guess you can use Python metaclasses (Python ≥ 2.2) to share such information between classes; that would be quite pythonic. Take a look at the implementation of Google's Protocol Buffers. Here is the tutorial showing metaclasses at work. By the way, the domain of Protocol Buffers is similar to yours.

Andrey Vlasovskikh
+2  A: 

>>> class PacketMeta(type):
...     def __init__(self,*args,**kw):
...         if self.type_id is not None:
...             self.subclasses[self.type_id]=self
...         return type.__init__(self,*args,**kw)
... 
>>> class Packet(metaclass=PacketMeta):  # Python 3 syntax; Python 2 set __metaclass__ in the class body
...     subclasses={}
...     type_id = None
... 
>>> class Login(Packet):
...     type_id = 0x01
... 
>>> class Logout(Packet):
...     type_id = 0x02
... 
>>> 
>>> Packet.subclasses
{1: <class '__main__.Login'>, 2: <class '__main__.Logout'>}
>>> 

If you prefer to use __subclasses__(), you can do something like this:

>>> class Packet(object):
...     pass
... 
>>> class Login(Packet):
...     type_id = 0x01
... 
>>> class Logout(Packet):
...     type_id = 0x02
... 
>>> def packetfactory(packet_id):
...     for x in Packet.__subclasses__():
...         if x.type_id==packet_id:
...             return x
... 
>>> packetfactory(0x01)
<class '__main__.Login'>
>>> packetfactory(0x02)
<class '__main__.Logout'>
gnibbler
I very much like the metaclass implementation. I've used that also for a very similar application. A few things I would change: (a) move the creation of the subclasses dictionary to the metaclass.__init__ (i.e. if not hasattr(self, 'subclasses'): self.subclasses = {} ). This keeps the functionality encapsulated. (b) Allow Packet classes to omit type_id if it's not necessary, i.e. instead of "if self.type_id is not None", use "if getattr(self, 'type_id', None) is not None". This has the same effect, but leaves fewer places for implementers to make mistakes.
Jason R. Coombs
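
The two changes suggested in the comment above could look roughly like this. It is a sketch written with the Python 3 `metaclass=` keyword; the Chat class is a hypothetical helper added only to show a class without a type_id being skipped.

```python
class PacketMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # (a) Create the registry on the first class that uses this metaclass;
        # later subclasses find the same dict through normal attribute lookup.
        if not hasattr(cls, "subclasses"):
            cls.subclasses = {}
        # (b) Classes may omit type_id entirely; those are simply not registered.
        type_id = getattr(cls, "type_id", None)
        if type_id is not None:
            cls.subclasses[type_id] = cls


class Packet(metaclass=PacketMeta):
    """Base class; has no type_id, so it is not registered."""


class Login(Packet):
    type_id = 0x01


class Logout(Packet):
    type_id = 0x02


class Chat(Packet):
    """Hypothetical abstract helper without a type_id; not registered."""
```

One caveat of using getattr here: a subclass that merely inherits type_id from a registered parent replaces that parent in the table, so deeper hierarchies may need a stricter check.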
+1  A: 

__subclasses__ IS part of the Python language, and implemented by IronPython and Jython as well as CPython (no pypy at hand to test, right now, but I'd be astonished if they have broken that!-). Whatever gave you the impression that __subclasses__ was at all problematic?! I see a comment by @gnibbler in the same vein and I'd like to challenge that: can you post URLs about __subclasses__ not being a crucial part of the Python language?!

Alex Martelli
pypy does indeed have `__subclasses__`
gnibbler
I guess it is just me misunderstanding this http://www.python.org/doc/3.1/library/stdtypes.html#class.__subclasses__
gnibbler
@gnibbler, looks like that to me -- the URL you mention says "Each new-style class keeps a list of weak references to its immediate subclasses. This method returns a list of all those references still alive." (and in Python 3, _only_ new-style classes exist at all, so the docs should be streamlined here of course -- but all other implementations are 2.5 or thereabouts, so that's not germane;-). Looks like a typical part of the Python language to **me**...!
Alex Martelli
The first thing I found was this discussion, which had a very tepid endorsement of __subclasses__: http://groups.google.com/group/comp.lang.python/browse_thread/thread/5c1ccb986c66cdc1/ac8a23acb8854542#ac8a23acb8854542 --- The link it provides to a Tim Peters post is long dead, but I believe it was linking to this, which made me doubly wary: http://groups.google.com/group/comp.lang.python/msg/f46a90ec0bd52786
Nicholas Knight
For the record, __subclasses__ is more than good enough for my purposes (I'm a big fan of non-overengineered self-documenting code), so that's what I'll be wrapping around. Thanks!
Nicholas Knight
@Nicholas, you're welcome! Regarding that thread, beware of trusting assertions made 5-6 years ago -- many become false with time, such as those about `__subclasses__` not being documented (it is now) or being only in one implementation (it's in at least four now). Nevertheless even then Michael Hudson mentioned that _he_ relied on `__subclasses__` in his code -- hardly what I'd call "a tepid endorsement";-).
Alex Martelli
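
Since the documentation quoted above says `__subclasses__()` returns only immediate subclasses, a wrapper that builds the dispatch table probably needs to walk the hierarchy recursively. The following is a sketch under that assumption; the Movement, Walk and Run classes and the helper names iter_subclasses and build_table are hypothetical.

```python
class Packet:
    pass


class Login(Packet):
    type_id = 0x01


class Movement(Packet):
    """Hypothetical abstract intermediate class; no type_id of its own."""


class Walk(Movement):
    type_id = 0x02


class Run(Movement):
    type_id = 0x03


def iter_subclasses(cls):
    # __subclasses__() lists only direct subclasses, so recurse to reach
    # the rest of the hierarchy.
    for sub in cls.__subclasses__():
        yield sub
        for deeper in iter_subclasses(sub):
            yield deeper


def build_table(base):
    table = {}
    for cls in iter_subclasses(base):
        # Register only classes that define type_id in their own body, so
        # abstract helpers and classes that merely inherit an id are skipped.
        type_id = vars(cls).get("type_id")
        if type_id is None:
            continue
        # The same class can be reached twice with multiple inheritance;
        # only two different classes sharing an id is treated as an error.
        if type_id in table and table[type_id] is not cls:
            raise ValueError("type_id 0x%02x used by both %s and %s"
                             % (type_id, table[type_id].__name__, cls.__name__))
        table[type_id] = cls
    return table


dispatch_table = build_table(Packet)
```

Checking `vars(cls)` rather than `cls.type_id` is one way to address the abstract-superclass concern raised in the first answer, though it would not by itself handle cases that need a discriminator beyond the type id.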