Hi,

I know this must be a trivial question, but I've tried many different ways and searched quite a bit for a solution: how do I create and reference subfunctions in the current module?

For example, I am writing a program to parse through a text file, and I want to assign each of the 300 different names in it to a category.

I have a list of these names structured to build a dict, in the form lookup[key]=value (bonus question: is there any more efficient or sensible way to do this than a massive dict?).

I would like to keep all of this in the same module, but with the functions (dict initialisation, etc.) at the end of the file, so I don't have to scroll down 300 lines to see the code, i.e. laid out as in the example below.

When I run it as below, I get the error 'initLookups is not defined'. When I structure it as initialisation, then function definition, then function use, there is no problem.

I'm sure there must be an obvious way to initialise the functions and associated dict without keeping the code inline, but have tried quite a few so far without success. I can put it in an external module and import this, but would prefer not to for simplicity.

What should I be doing in terms of module structure? Is there any better way than using a dict to store this lookup table (it is 300 unique text keys mapping on to approx. 10 categories)?

Thanks,

Brendan


import ..... (initialisation code,etc )

initLookups()          # **Should create the dict - How should this be referenced?**
print getlookup(KEY)   # **How should this be referenced?**


def initLookups():
    global lookup
    lookup={}
    lookup["A"]="AA"
    lookup["B"]="BB"
    (etc etc etc....)


def getlookup(name):
    if name in lookup.keys():
        getlookup=lookup[name]
    else:
        getlookup=""

    return getlookup
+5  A: 

A function needs to be defined before it can be called. If you want to have the code that needs to be executed at the top of the file, just define a main function and call it from the bottom:

import sys

def main(args):
    pass

# All your other function definitions here

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))

This way, everything main references will already have been defined by the time main is called. Testing __name__ ensures that the main function runs only when the script is executed directly, not when it is imported by another file.
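As a rough sketch, the layout from the question could look like this under that pattern. It is written for Python 3, unlike the Python 2 snippets elsewhere in this thread, and the names init_lookups and get_lookup, the two-entry table, and the default key "A" are placeholders rather than anything from the original program:

```python
import sys


def main(args):
    # By the time main is called (at the bottom of the file), the whole
    # module has been executed, so the helpers below already exist.
    lookup = init_lookups()
    key = args[0] if args else "A"  # "A" is a placeholder default key
    print(get_lookup(lookup, key))


def init_lookups():
    # Placeholder entries; the real table would hold all 300 names.
    return {"A": "AA", "B": "BB"}


def get_lookup(lookup, name):
    # Unknown names fall back to an empty string.
    return lookup.get(name, "")


if __name__ == "__main__":
    main(sys.argv[1:])
```

The working code stays at the top where it is easy to read, and the bulky table sits at the end of the file.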


Side note: a dict with 300 keys is by no means massive, but you may want to either move the code that fills the dict to a separate module, or (perhaps more fancy) store the key/value pairs in a format like JSON and load it when the program starts.
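A minimal sketch of the JSON idea, assuming Python 3 and the standard json module. The JSON text is inlined here so the example runs on its own; in practice it would more likely sit in a separate file (the name lookups.json in the comment is hypothetical) and be read with json.load:

```python
import json

# Placeholder data. With a real file this would become something like:
#     with open("lookups.json") as f:   # hypothetical file name
#         lookup = json.load(f)
LOOKUP_JSON = """
{
    "A": "AA",
    "B": "BB"
}
"""


def load_lookups(text):
    # json.loads turns a JSON object into an ordinary Python dict.
    return json.loads(text)


lookup = load_lookups(LOOKUP_JSON)
print(lookup.get("A", ""))
```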

Stephan202
Thanks for your help. I thought there might be something C-like (and it's a long time since I've used that), where the function declarations can be listed initially, with the code elsewhere/anywhere in the file. I will probably leave the lookup functions in a separate module; it's a bit more sensible that way. Thanks
Brendan
You're welcome. Indeed in this respect Python is very different from C.
Stephan202
A: 

If your lookup dict is unchanging, the simplest way is to make it a module-scope variable, i.e.:

lookup = {
    'A' : 'AA',
    'B' : 'BB',
    ...
}

If you may need to make changes, and later re-initialise it, you can do this in an initialisation function:

def initLookups():
    global lookup
    lookup = {
        'A' : 'AA',
        'B' : 'BB',
        ...
    }

(Alternatively, lookup.update({'A':'AA', ...}) to change the dict in-place, affecting all callers with access to the old binding.)
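The difference between rebinding the name and updating the dict in place can be sketched as follows (Python 3; the function names, the alias variable, and the sample entries are illustrative only):

```python
lookup = {"A": "AA"}
alias = lookup  # a second reference to the same dict object


def rebind():
    global lookup
    # Points the module-level name at a brand-new dict.
    # Anything still holding the old dict does not see the change.
    lookup = {"A": "AA", "B": "BB"}


def update_in_place():
    # Mutates the existing dict, so every reference to it sees the change.
    lookup.update({"C": "CC"})


rebind()
same_object_after_rebind = alias is lookup  # False: alias is the old dict
alias_sees_b = "B" in alias                 # False

alias = lookup
update_in_place()
same_object_after_update = alias is lookup  # True: still one shared dict
alias_sees_c = "C" in alias                 # True
```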

However, if you've got these lookups in some standard format, it may be simpler to load them from a file and create the dictionary from that.

You can arrange your functions as you wish. The only rule about ordering is that any names a function accesses must exist by the time the function is called. It's fine for a function body to refer to names that don't exist yet, so long as nothing calls the function before they do, i.e.:

def foo():
    print greeting, "World"  # Note that greeting is not yet defined when foo() is created

greeting = "Hello"

foo() # Prints "Hello World"

But:

def foo():
    print greeting, "World"

foo()              # Gives an error - greeting not yet defined.
greeting = "Hello"

One further thing to note: your getlookup function is very inefficient. In Python 2, which this code uses, "if name in lookup.keys()" builds a list of the dict's keys and then scans that list for the item, which loses all the performance benefit the dict gives. "if name in lookup" avoids this; better still, use the fact that .get can be given a default to return if the key is not in the dictionary:

def getlookup(name):
    return lookup.get(name, "")
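A small runnable sketch of these options (Python 3; the two-entry table and the helper getlookup_or_name are illustrative and not part of the question). The default passed to .get can be any value, including the key itself:

```python
lookup = {"A": "AA", "B": "BB"}


def getlookup(name):
    # Returns the mapped value, or "" when the key is missing.
    return lookup.get(name, "")


def getlookup_or_name(name):
    # Hypothetical variant: falls back to the key itself.
    return lookup.get(name, name)


print("A" in lookup)            # direct membership test on the dict
print(getlookup("A"))
print(repr(getlookup("X")))
print(getlookup_or_name("X"))
```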
Brian
Thanks for that point about get, I missed that in the dict documentation. Is return lookup.get(name, name) valid, i.e. does it return the name itself if there is no corresponding lookup?
Brendan
Yes, the default can be whatever value you want - using name will result in 'name="X"; lookup.get(name,name)' returning "X" if "X" is not in the dict.
Brian
+1  A: 

Here's a more Pythonic way to do this. There aren't a lot of choices, BTW.

A function must be defined before it can be used. Period.

However, you don't have to strictly order all functions for the compiler's benefit. You merely have to put your execution of the functions last.

# import ... (initialisation code, etc.)

def initLookups(): # Definitions must come before actual use
    lookup={}
    lookup["A"]="AA"
    lookup["B"]="BB"
    # (etc etc etc....)
    return lookup

# Any functions initLookups uses can be defined here,
# as long as they're findable in the same module.

if __name__ == "__main__": # Use comes last
    lookup= initLookups() 
    print lookup.get("Key","")

Note that you don't need the getlookup function, it's a built-in feature of a dict, named get.

Also, "initialisation code" is suspicious. An import should not "do" anything. It should define functions and classes, but not actually provide any executable code. In the long run, executable code that is processed by an import can become a maintenance nightmare.

The most notable exception is a module-level Singleton object that gets created by default. Even then, be sure that the mystery object which makes a module work is clearly identified in the documentation.

S.Lott
A: 

I think that keeping the names in a flat text file and loading them at runtime would be a good alternative. I try to stick to the lowest level of complexity possible with my data, starting with plain text and working up to an RDBMS (I lifted this idea from The Pragmatic Programmer).

Dictionaries are very efficient in Python. It's essentially what the whole language is built on. 300 items is well within the bounds of sane dict usage.

names.txt:

A = AAA
B = BBB
C = CCC

getname.py:

import sys

FILENAME = "names.txt"

def main(key):
    pairs = (line.split("=") for line in open(FILENAME))
    names = dict((x.strip(), y.strip()) for x,y in pairs)
    return names.get(key, "Not found")

if __name__ == "__main__":
    print main(sys.argv[-1])

If you really want to keep it all in one module for some reason, you could just stick a string at the top of the module. I think that a big swath of text is less distracting than a huge mess of dict initialization code (and easier to edit later):

import sys

LINES = """
A = AAA
B = BBB
C = CCC
D = DDD
E = EEE""".strip().splitlines()

PAIRS = (line.split("=") for line in LINES)
NAMES = dict((x.strip(), y.strip()) for x,y in PAIRS)

def main(key):
    return NAMES.get(key, "Not found")

if __name__ == "__main__":
    print main(sys.argv[-1])
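If the text might contain blank lines or values that themselves include "=", a slightly more tolerant parser could look like the sketch below. It is written for Python 3; parse_pairs, the sample text, and the "#" comment convention are assumptions for illustration, not part of the code above:

```python
RAW = """
A = AAA
B = BBB

# lines starting with "#" are treated as comments (an assumed convention)
C = x=y
"""


def parse_pairs(text):
    names = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # partition splits on the first "=" only, so values may contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no "=" on this line, so there is nothing to store
        names[key.strip()] = value.strip()
    return names


NAMES = parse_pairs(RAW)
print(NAMES.get("C", "Not found"))
```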
Ryan Ginstrom
Cool. Thanks Ryan, that's a much more elegant way of setting up the dict. Especially as initially the lookup table was in Excel, so I was using Excel text functions to generate the dict code:) Thanks...
Brendan