I'm currently in the middle of porting a fairly large Perl project to Python. The problem is that it uses little Perl tricks to make its code available through use statements. I've done about the same with Python, making the codebase one big module for importing. I've had a firm grasp of Python for a long time, but I have no experience with large Python projects whose parts need to access one another while maintaining internal state.

I haven't yet tried simply importing the entire thing in one line (import core), but I know I'm currently not doing things in the best of ways. Here's an example from the master script that sets everything in motion:

self.Benchmark = Benchmark(self)

self.Exceptions = Exceptions

self.Settings = Settings(self)
self.Cache = Cache(self)

self.Deal = Deal(self)
self.Utils = Utils(self)
self.FileParsers = FileParsers(self)
self.Network = Network(self)
self.Plugins = Plugins(self)
self.Misc = Misc(self)

It works, but I'm not happy with it. Right now, the master class script imports each piece of the core module and creates an instance of the contained classes, passing itself as an argument to __init__ in those classes. Like so:

class FileParsers:
    def __init__(self, parent):
        self.parent = parent

Now the code in that class can access the entire rest of the codebase through the parent class.

self.parent.Settings.loadSysConfig()

So my question is this: considering the above, what would be the best way to reorganize the project and refactor the code so that it retains its current ability to access everything else? The code is very noncritical, so I'm not that worried about internal data integrity; I just don't like having to go through the parent class in such an ugly way. Those long attribute chains slow the code down as well.

EDIT: Whoops, forgot these: links to the SVN repos for both projects. Mine is here, and the project I'm porting is here.

+1  A: 

It's really hard to tell without actually being able to see the code, but you should probably just consider importing the items that each module uses, in that module. It's not unusual to have a long list of imports - here's an example from my own website:

# standard
import inspect
import linecache
import neo_cgi
import neo_cs
import neo_util
import os
import random
import sys
import time
from _apache import SERVER_RETURN
from mod_python import apache
from mod_python import util
from mod_python.util import FieldStorage
from os.path import dirname, isfile, join, splitext

# set up path
pydir = dirname(__file__)
if pydir not in sys.path:
    sys.path.append(pydir)

# things I wrote
import auth
import handlers.accounts, handlers.publish, handlers.standard
import logger
import markup
import programs
import summarize
from auth import check_auth
from common import hdf_iterate, load_hdf_cgi_vars, load_hdf_common_vars
from common import hdf_insert_value, hdf_insert_list, hdf_insert_dict
from handlers import chain, farm, opt
from handlers import URIPrefixFilter
from handlers.standard import TabBarHandler

and I'm sure a lot of larger modules have even longer lists.

In your case, maybe have a Settings module with a singleton object (or with the settings as module-level variables) and do

import Settings

or whatever.
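
As a rough sketch of the module-level approach, the whole module could look something like the following. The file name settings.py, the _Settings class, load_sys_config and the "debug" key are all made up for illustration; none of them come from your code.

```python
# settings.py: hypothetical module, all names here are illustrative.

class _Settings(object):
    """Holds configuration values shared by the whole application."""

    def __init__(self):
        self.values = {}

    def load_sys_config(self, overrides=None):
        # A real version would read a config file; this sketch just
        # starts from defaults and applies any overrides passed in.
        self.values = {"debug": False}
        self.values.update(overrides or {})
        return self.values


# Python caches a module after its first import, so every
# "import settings" elsewhere sees this same instance.
settings = _Settings()
```

Other modules could then do "from settings import settings" and call settings.load_sys_config() directly instead of going through self.parent.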

David Zaslavsky
Whoops! Sorry, I forgot to add the SVN links. I've done so, now.
sli
+1  A: 

what would be the best way to reorganize the project and refactor the code so that it retains its current ability to access everything else?

I think you're actually quite close already, and probably better than many Python projects where they just assume that there is only one instance of the application, and store application-specific values in a module global or singleton.

(This is OK for many simple applications, but really it's nicest to be able to bundle everything up into one Application object that owns all inner classes and methods that need to know the application's state.)

The first thing I would do from the looks of the code above would be to factor out any of those modules and classes that aren't a core competency of your application, things that don't necessarily need access to the application's state. Names like “Utils” and “Misc” sound suspiciously like much of their contents aren't really specific to your app; they could perhaps be refactored out into separate standalone modules, or submodules of your package that only have static functions, stuff not relying on application state.
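
To illustrate what I mean by static functions, here is a minimal sketch of a standalone helper module. The functions chunked and strip_comments are invented examples, not taken from your Utils class; the point is only that they take plain arguments and never need a parent object.

```python
# utils.py: hypothetical standalone helpers. They take plain arguments
# and never touch the application object.

def chunked(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def strip_comments(lines, marker="#"):
    """Return the non-empty lines with trailing comments removed."""
    cleaned = []
    for line in lines:
        text = line.split(marker, 1)[0].strip()
        if text:
            cleaned.append(text)
    return cleaned
```

Anything written like this can be imported wherever it is needed and tested without constructing the application at all.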

Next, I would put the main owner Application class in the package's __init__.py rather than a ‘master script’. Then from your run-script or just the interpreter, you can get a complete instance of the application as simply as:

import myapplication

a = myapplication.Application()

You could also consider moving any basic deployment settings from the Settings class into the initialiser:

a = myapplication.Application(basedir='/opt/myapp', site='www.example.com', debug=False)

(If you only have one possible set of settings and every time you instantiate Application() you get the same one, there's little use in having all this ability to encapsulate your whole application; you might as well simply be using module globals.)
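
A minimal sketch of such an initialiser, assuming the same three parameters as the call above. The path method and the second set of values are hypothetical, added only to show two differently configured instances side by side.

```python
class Application(object):
    """Owner object; the parameter names mirror the call above."""

    def __init__(self, basedir, site, debug=False):
        self.basedir = basedir
        self.site = site
        self.debug = debug

    def path(self, *parts):
        # Hypothetical helper: build a path under this instance's basedir.
        return "/".join((self.basedir.rstrip("/"),) + parts)


# Two independently configured instances can coexist in one process,
# which is what module globals cannot give you.
live = Application(basedir="/opt/myapp", site="www.example.com")
test = Application(basedir="/tmp/myapp-test", site="localhost", debug=True)
```

Being able to build a throwaway instance like the second one is most of the payoff of keeping state on the object.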

What I'm doing with some of my apps is making the owned classes monkey-patch themselves into actual members of the owner application object:

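# myapplication/__init__.py

class Application(object):
    def __init__(self, dbfactory, debug):
        # ...
        # self.Mailer and self.Webservice are attached to this class
        # by the submodules imported at the bottom of this file.
        self.mailer = self.Mailer(self)
        self.webservice = self.Webservice(self)
        # ...

# Imported last, so Application already exists when the submodules
# import this package back and patch themselves in.
import myapplication.mailer, myapplication.webservice


# myapplication/mailer.py

import myapplication

class Mailer(object):
    def __init__(self, owner):
        self.owner = owner

    def send(self, message, recipients):
        # ...
        pass

myapplication.Application.Mailer = Mailer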

Then it's possible to extend, change or configure the Application from outside it by replacing/subclassing the inner classes:

import myapplication

class MockApplication(myapplication.Application):
    class Mailer(myapplication.Application.Mailer):
        def send(self, message, recipients):
            self.owner.log('Mail send called (not actually sent)')
            return True
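
For something you can paste into a single file and run, here is a condensed sketch of the same idea. It defines Mailer directly inside Application rather than patching it in from a submodule, and the log method and messages list are stand-ins I've assumed for whatever logging the real application has.

```python
class Application(object):
    """Owner object; builds its parts from classes looked up on itself."""

    class Mailer(object):
        def __init__(self, owner):
            self.owner = owner

        def send(self, message, recipients):
            # Stand-in for real delivery.
            raise NotImplementedError("real mail delivery goes here")

    def __init__(self):
        self.messages = []
        # self.Mailer is looked up on the instance's own class first,
        # so a subclass that defines its own Mailer is picked up here.
        self.mailer = self.Mailer(self)

    def log(self, text):
        # Hypothetical logging method; it just records the text.
        self.messages.append(text)


class MockApplication(Application):
    class Mailer(Application.Mailer):
        def send(self, message, recipients):
            self.owner.log("Mail send called (not actually sent)")
            return True


app = MockApplication()
sent = app.mailer.send("hello", ["someone@example.com"])
```

The key detail is that Application.__init__ says self.Mailer rather than naming the class directly, which is what lets the subclass swap it out.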

I'm not that worried about internal data integrity

Well no, this is Python not Java: we don't worry too much about Evil Programmers using properties and methods they shouldn't, we just put ‘_’ at the start of the name and let that be a suitable warning to all.

And those long chains slow the code down as well.

Not really noticeably. Readability is the important factor; anything else is premature optimisation.
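
If you would rather measure than take my word for it, a rough timing sketch along these lines should do. The classes below are simplified stand-ins for yours, load_sys_config is a do-nothing placeholder, and the numbers will vary by machine and interpreter.

```python
import timeit


class Settings(object):
    def load_sys_config(self):
        # Placeholder that does no real work.
        return None


class Parent(object):
    def __init__(self):
        self.Settings = Settings()


class FileParsers(object):
    def __init__(self, parent):
        self.parent = parent


parsers = FileParsers(Parent())

# Bound method looked up once, outside the timed code.
direct = parsers.parent.Settings.load_sys_config

# Both versions are wrapped in a lambda so the call overhead is the same;
# the only difference is the attribute chain walked on each call.
chain_seconds = timeit.timeit(
    lambda: parsers.parent.Settings.load_sys_config(), number=100000)
direct_seconds = timeit.timeit(lambda: direct(), number=100000)

print("chained: %.4fs  direct: %.4fs" % (chain_seconds, direct_seconds))
```

Any difference you see is spread over 100000 calls, so compare it with the time the real work inside those calls takes before deciding it matters.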

bobince