Hi Guys,

I am writing a script at the moment that will grab certain information from HTML using dom4j.

Since Python/Jython does not have a native switch statement I decided to use a whole bunch of if statements that call the appropriate method, like below:

if type == 'extractTitle':
    extractTitle(dom)
if type == 'extractMetaTags':
    extractMetaTags(dom)

I will be adding more depending on what information I want to extract from the HTML and thought about taking the dictionary approach which I found elsewhere on this site, example below:

{
    'extractTitle':    extractTitle,
    'extractMetaTags': extractMetaTags
}[type](dom)

I know that the dictionary will be built each time I run the script, but if I were to use the if statements, the script would have to check through them one by one until it hits the correct one. What I am really wondering is: which one performs better, or is generally better practice?

Update: @Brian - Thanks for the great reply. I have a question: what if any of the extract methods requires more than one object, e.g.

def handle_extractTag(self, dom, anotherObject):
    # Do something
    pass

How would you change the handle method to implement this? Hope you know what I mean :)

Cheers

+14  A: 

To avoid specifying the tag and handler in the dict, you could just use a handler class with methods named to match the type. E.g.:

class MyHandler(object):
    def handle_extractTitle(self, dom):
        # do something
        pass

    def handle_extractMetaTags(self, dom):
        # do something
        pass

    def handle(self, type, dom):
        func = getattr(self, 'handle_%s' % type, None)
        if func is None:
            raise Exception("No handler for type %r" % type)
        return func(dom)

Usage:

 handler = MyHandler()
 handler.handle('extractTitle', dom)

Update:

When you have multiple arguments, just change the handle function to take those arguments and pass them through to the function. If you want to make it more generic (so you don't have to change both the handler functions and the handle method when you change the argument signature), you can use the *args and **kwargs syntax to pass through all received arguments. The handle method then becomes:

def handle(self, type, *args, **kwargs):
    func = getattr(self, 'handle_%s' % type, None)
    if func is None:
        raise Exception("No handler for type %r" % type)
    return func(*args, **kwargs)
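
As a rough, self-contained sketch of the generic version with a handler that takes a second argument: the handle_extractLinks method, its base_url parameter, and the plain dict standing in for a dom4j document are all made up for illustration, not part of the original script.

```python
class MyHandler(object):
    def handle_extractTitle(self, dom):
        # Hypothetical one-argument handler; `dom` is a plain dict here.
        return dom['title']

    def handle_extractLinks(self, dom, base_url):
        # Hypothetical two-argument handler.
        return [base_url + href for href in dom['links']]

    def handle(self, type, *args, **kwargs):
        # Look up handle_<type> and forward whatever arguments were given.
        func = getattr(self, 'handle_%s' % type, None)
        if func is None:
            raise Exception("No handler for type %r" % type)
        return func(*args, **kwargs)


handler = MyHandler()
dom = {'title': 'Home', 'links': ['/a', '/b']}  # stand-in for a real DOM
title = handler.handle('extractTitle', dom)
links = handler.handle('extractLinks', dom, base_url='http://example.com')
```

Each handler can then declare whatever signature it needs, and handle itself never has to change.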
Brian
Of course, now the interpreter has to build a class (and thus a dict) and its methods instead of the dict :) Not to worry, though: dicts are the foundation of Python and thus very optimized.
hop
The dispatch dict is part of the class though, which will only get constructed once. You can also avoid constructing an instance each time by reusing the same instance, or by using class or static methods directly (though I doubt the speed difference will ever be noticeable).
Brian
A very nice example of what dynamic languages can do. Who would want a "switch" if she could get it done like this? It also observes the DRY principle, keeps information in one place, and is easily extensible via subclassing. Kudos
Ber
This is a really important Python design pattern.
Robert Rossney
Cheers Brian, excellent works perfectly!!!!
Eef
+1  A: 

It depends on how many if statements we're talking about; if it's a very small number, then it will be more efficient than using a dictionary.

However, as always, I strongly advise you to do whatever makes your code look cleaner until experience and profiling tell you that a specific block of code needs to be optimized.
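
If you do want numbers, a quick measurement with the standard timeit module could look something like the sketch below. The stub extract functions and the dispatch_if / dispatch_dict names are placeholders invented for the comparison; results will vary by machine, interpreter, and number of branches, so treat it as a starting point rather than a benchmark.

```python
import timeit


def extractTitle(dom):
    return 'title'  # stub standing in for real extraction


def extractMetaTags(dom):
    return 'meta'  # stub standing in for real extraction


# Built once at import time, not on every call.
HANDLERS = {
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags,
}


def dispatch_if(type, dom):
    if type == 'extractTitle':
        return extractTitle(dom)
    elif type == 'extractMetaTags':
        return extractMetaTags(dom)
    raise KeyError(type)


def dispatch_dict(type, dom):
    return HANDLERS[type](dom)


# Time 100000 calls of each variant for the same key.
if_time = timeit.timeit(lambda: dispatch_if('extractMetaTags', None), number=100000)
dict_time = timeit.timeit(lambda: dispatch_dict('extractMetaTags', None), number=100000)
print('if chain: %.4fs, dict: %.4fs' % (if_time, dict_time))
```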

Eli Courtwright
I have not written the script yet but if I were to use the if statements I would be writing upwards of 20 if statements.
Eef
+1  A: 

Your use of the dictionary is not quite correct. In your implementation, all methods will be called and all the useless ones discarded. What is usually done is something more like:

switch_dict = {'extractTitle': extractTitle, 
               'extractMetaTags': extractMetaTags}
switch_dict[type](dom)

And that way is faster and more extensible if you have a large (or variable) number of items.

PierreBdR
Methods do not get called (i.e. "run", "executed", "evaluated"...) just by being listed as values in a dictionary expression. Probably what you meant to point out is that the OP's code would rebuild the dictionary once for each call, which *is* wasteful.
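
A small sketch that makes this visible, using stub functions that record when they run (the calls list and the dispatch wrapper are only there for the demonstration):

```python
calls = []  # records which functions actually ran


def extractTitle(dom):
    calls.append('extractTitle')
    return 'title'


def extractMetaTags(dom):
    calls.append('extractMetaTags')
    return 'meta'


def dispatch(type, dom):
    # The dict literal is rebuilt on each call, but it only stores
    # references to the functions; none of them runs while it is built.
    return {
        'extractTitle': extractTitle,
        'extractMetaTags': extractMetaTags,
    }[type](dom)


result = dispatch('extractTitle', None)
print(calls)  # prints ['extractTitle']
```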
bendin
No, the OP changed the post since I answered. In the first version, the methods actually got called.
PierreBdR
+2  A: 

With your code, your functions all get called.

handlers = {
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags
}

handlers[type](dom)

Would work like your original if code.

Marcos Lara
Are you sure? That doesn't sound right to me.
rjmunro
+1  A: 

The efficiency question is barely relevant. The dictionary lookup is done with a simple hashing technique, the if-statements have to be evaluated one at a time. Dictionaries tend to be quicker.

I suggest that you actually have polymorphic objects that do extractions from the DOM.

It's not clear how type gets set, but it sure looks like it might be a family of related objects, not a simple string.

class ExtractTitle(object):
    def process(self, dom):
        return something  # placeholder for the extracted title

class ExtractMetaTags(object):
    def process(self, dom):
        return something  # placeholder for the extracted meta tags

Instead of setting type="extractTitle", you'd do this.

type= ExtractTitle() # or ExtractMetaTags() or ExtractWhatever()
type.process( dom )

Then, you wouldn't be building this particular dictionary or if-statement.
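
A minimal runnable sketch of that idea, assuming a plain dict stands in for the real DOM (the dict keys and the extractors list are made up for illustration):

```python
class ExtractTitle(object):
    def process(self, dom):
        # Hypothetical extraction: read the title from the stand-in DOM.
        return dom.get('title')


class ExtractMetaTags(object):
    def process(self, dom):
        # Hypothetical extraction: read the meta tag names.
        return dom.get('meta', [])


dom = {'title': 'Home', 'meta': ['description', 'keywords']}

# Each object knows how to do its own extraction, so the caller
# needs neither a dict lookup nor an if chain.
extractors = [ExtractTitle(), ExtractMetaTags()]
results = [extractor.process(dom) for extractor in extractors]
```

Adding a new kind of extraction then means adding a new class with a process method, without touching the calling code.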

S.Lott