I often find a need to put paths in my code in order to find data or, in some cases, tool-specific modules. So far I've always used autotools because of this: it's just so easy to call sed to replace a few strings at build time. However, I'd like to find a more Pythonic way of doing this, i.e. use distutils or some other blessed way of building/installing. I've never managed to find anything about this in the distutils documentation, though, so how do other people solve this problem?

A:

For module paths, a common practice is to put them in .pth files, as documented in the site module documentation. The site module provides site-specific configuration hooks; you can use them to tailor your environment.
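A minimal sketch of how a .pth file extends sys.path, using the standard library's site.addsitedir() on temporary directories so it runs anywhere. The directory names and myapp.pth are hypothetical; at interpreter startup the same .pth processing happens automatically for the site-packages directories.

```python
import os
import site
import sys
import tempfile

# Hypothetical layout: a directory holding a .pth file, plus a separate
# directory of application-specific modules living outside site-packages.
base = tempfile.mkdtemp()
site_dir = os.path.join(base, "site-dir")
app_modules = os.path.join(base, "myapp-modules")
os.makedirs(site_dir)
os.makedirs(app_modules)

# Each plain line of a .pth file names a directory to append to sys.path;
# lines naming directories that do not exist are skipped.
with open(os.path.join(site_dir, "myapp.pth"), "w") as f:
    f.write(app_modules + "\n")

# addsitedir() appends site_dir to sys.path and processes its .pth files.
site.addsitedir(site_dir)
print(app_modules in sys.path)  # True
```

A distro packager can then point the .pth line at wherever they installed the modules, without touching the code itself.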

gimel
I always wondered what those .pth files were for.
Ali A
A: 

Well, with distutils (in the standard library) you have "package data": data that lives inside the package itself. The distutils documentation explains how to set it up. This is clearly not ideal, as you will have to use some kind of __file__ hack to look up the location of the data at runtime.
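For illustration, here is roughly what that __file__ lookup looks like. This is a sketch, not taken from the distutils docs: it builds a throwaway package called mypkg (hypothetical) by hand to mimic what package_data would install, then has the package locate its data relative to its own __file__.

```python
import os
import sys
import tempfile

# In setup.py, package_data installs data files inside the package, e.g.
#   setup(name="mypkg", packages=["mypkg"],
#         package_data={"mypkg": ["data/*.txt"]})
# Below, a throwaway "mypkg" (hypothetical) is built by hand to mimic that.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "mypkg")
os.makedirs(os.path.join(pkg_dir, "data"))
with open(os.path.join(pkg_dir, "data", "greeting.txt"), "w") as f:
    f.write("hello\n")

# The package finds its data relative to its own __file__.
INIT_SOURCE = '''\
import os

_HERE = os.path.dirname(os.path.abspath(__file__))

def data_path(*parts):
    """Return the path of a file shipped in mypkg/data/."""
    return os.path.join(_HERE, "data", *parts)
'''
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(INIT_SOURCE)

sys.path.insert(0, root)
import mypkg

with open(mypkg.data_path("greeting.txt")) as f:
    print(f.read().strip())  # hello
```

This works as long as the package lives in a real directory; it is exactly what breaks when the package is imported from a zip file.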

Then comes setuptools (not in the standard library), which additionally has ways of looking up the location of that data at runtime; the setuptools documentation explains how. But again, that has its own set of problems: for example, it may have trouble finding the data files in an uninstalled raw package.

There are also additional third-party tools. The one I have used is kiwi.environ. It offers data directories and runtime lookup, but I wouldn't recommend it for general use, as it is geared towards PyGTK development and Glade file location.

I would imagine there are other third-party tools around, and others will elaborate.

Ali A
I don't think __file__ is that bad. It's a decent way of finding where the modules are, certainly better than the above idea of patching hard-coded path strings in the scripts.
bobince
I agree 100%, but the egg brigade would disagree: __file__ is meaningless in a zipped egg, where it points inside the archive.
Ali A
A: 

"I often find a need to put paths in my code": this isn't very Pythonic to begin with.

Ideally, your code lives in some place like site-packages and that's the end of that.

Often, we have an installed "application" that uses a fairly fixed set of directories for working files. On Linux, we get this information from environment variables and configuration files owned by the specific user ID that's running the application.
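A minimal sketch of that lookup order using only the standard library. The variable name MYAPP_DATA_DIR, the file ~/.myapp.cfg, its [paths] section and the /usr/share/myapp fallback are all hypothetical; the point is that no path is baked into the logic except a last-resort default.

```python
import configparser
import os

DEFAULT_DATA_DIR = "/usr/share/myapp"  # hypothetical last-resort default

def find_data_dir(config_path=None):
    """Look up the data directory: environment first, then config file."""
    # 1. An environment variable (hypothetical name) overrides everything.
    env_value = os.environ.get("MYAPP_DATA_DIR")
    if env_value:
        return env_value
    # 2. A per-user config file (hypothetical ~/.myapp.cfg) with a
    #    [paths] section; read() silently skips a missing file.
    if config_path is None:
        config_path = os.path.expanduser("~/.myapp.cfg")
    parser = configparser.ConfigParser()
    parser.read(config_path)
    # 3. Fall back to the default if the section or option is missing.
    return parser.get("paths", "data_dir", fallback=DEFAULT_DATA_DIR)
```

Passing config_path explicitly keeps the function easy to test and lets an installer or packager point it at a system-wide file instead.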

I don't think that you should be putting paths in your code. I think there's a better way.

[I just wrote our app installation tool, which creates all the config files for a fairly complex app. I used the Mako template library to generate all four files from templates.]
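The answer used Mako; as a dependency-free stand-in, here is a sketch of the same idea with the standard library's string.Template. The template text and the data_dir/log_dir keys are hypothetical; an installer would fill them in from options chosen at install time and write the result out as the application's config file.

```python
from string import Template

# Hypothetical config-file template; string.Template stands in for Mako.
CONFIG_TEMPLATE = Template("""\
[paths]
data_dir = $data_dir
log_dir = $log_dir
""")

def render_config(**paths):
    """Fill in the template; substitute() raises KeyError for missing keys."""
    return CONFIG_TEMPLATE.substitute(**paths)

print(render_config(data_dir="/usr/share/myapp", log_dir="/var/log/myapp"))
```

Using substitute() rather than safe_substitute() means a forgotten path fails loudly at install time instead of leaving a literal $placeholder in the generated file.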

S.Lott
It may not be very Pythonic, but how do I find data that is used by my application written in Python? I can't find any way of doing that in distutils (though setuptools seems to have one). Using a templating language is massive overkill; I only want to make sure that I find a few paths!
Magnus
@Magnus: you shouldn't have any paths anywhere in your application. They should be either command-line options, configuration file options or environment variables.
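A small sketch of the command-line half of that advice with argparse, layered over an environment variable. The program name myapp, the --data-dir option, MYAPP_DATA_DIR and the /usr/share/myapp default are hypothetical.

```python
import argparse
import os

def parse_args(argv=None):
    """Command line beats environment, which beats the built-in default."""
    parser = argparse.ArgumentParser(prog="myapp")  # hypothetical program
    parser.add_argument(
        "--data-dir",
        # MYAPP_DATA_DIR and the fallback path are hypothetical examples.
        default=os.environ.get("MYAPP_DATA_DIR", "/usr/share/myapp"),
        help="directory holding the application's data files",
    )
    return parser.parse_args(argv)

args = parse_args(["--data-dir", "/opt/myapp/data"])
print(args.data_dir)  # /opt/myapp/data
```

Because the default is read when parse_args() runs, changing the environment between calls is picked up without re-importing the module.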
S.Lott
A: 

The OP here; I've finally managed to log in using my OpenID.

@S.Lott

Point well taken, but for some Linux distros it seems to be standard to install application-specific data and application-specific modules in specific locations. I think that making these locations configurable at build/install time is a nice thing to do for people packaging my application. AFAICS “the pythonic way” in this case would force these packagers to apply patches to my code.
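One way to offer that install-time hook without sed or patches is to have the install step write a tiny Python module recording the chosen locations. This is an assumed approach, not one proposed in the thread; the module name _paths.py and the data_dir/module_dir keys are hypothetical.

```python
import importlib.util
import os
import tempfile

def write_paths_module(target_dir, **paths):
    """Write a hypothetical _paths.py that records install-time locations."""
    module_path = os.path.join(target_dir, "_paths.py")
    with open(module_path, "w") as f:
        for name, value in sorted(paths.items()):
            # %r yields a valid Python string literal for any path.
            f.write("%s = %r\n" % (name.upper(), value))
    return module_path

def load_module(module_path):
    # Demo only: the real application would simply import its _paths module.
    spec = importlib.util.spec_from_file_location("_paths", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# The packager picks these values at install time; these are examples.
generated = write_paths_module(tempfile.mkdtemp(),
                               data_dir="/usr/share/myapp",
                               module_dir="/usr/lib/myapp")
paths = load_module(generated)
print(paths.DATA_DIR)    # /usr/share/myapp
print(paths.MODULE_DIR)  # /usr/lib/myapp
```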

I'm also in the habit of writing applications where the executable part is a tiny wrapper around a main function in an application-specific module. To me it doesn't seem right to stick this application-specific module in /usr/lib/python2.5/site-packages.
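A sketch of such a tiny wrapper that keeps the application-specific module out of site-packages. APP_MODULE_DIR, the default module name myapp_main and its main() function are hypothetical.

```python
import importlib
import sys

# Hypothetical private directory for application-specific modules, kept
# out of site-packages; a packager could change just this one line.
APP_MODULE_DIR = "/usr/lib/myapp"

def run(module_dir=APP_MODULE_DIR, module_name="myapp_main"):
    """Make module_dir importable, then hand off to module_name.main()."""
    if module_dir not in sys.path:
        sys.path.insert(0, module_dir)
    module = importlib.import_module(module_name)
    return module.main()
```

The installed executable would then end with sys.exit(run()), so main()'s return value becomes the process exit status.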

Magnus
A:

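Currently, the best way to bundle data with code is to go the setuptools way and use pkg_resources:

```python
from pkg_resources import resource_filename, resource_stream
# "PACKAGE" is your package name; the path is relative to the package directory.
stream = resource_stream("PACKAGE", "path/to/data_f.ile")    # file-like object
path = resource_filename("PACKAGE", "path/to/data_f.ile")    # filesystem path
```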

This has the advantage of also working with Python eggs. It has the (IMHO) disadvantage that you need to put your data files into your code directory, which is accepted practice (one of the very, very few practices I disagree with).
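For readers on Python 3.9 or later, the standard library's importlib.resources covers the same use case as resource_stream/resource_filename. This is a swapped-in alternative, not what the answer used; the package name datapkg and its data file are hypothetical and built on the fly so the sketch runs anywhere.

```python
import importlib.resources
import os
import sys
import tempfile

# Build a throwaway package "datapkg" (hypothetical) with a data file.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "datapkg")
os.makedirs(os.path.join(pkg_dir, "data"))
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "data", "config.txt"), "w") as f:
    f.write("answer = 42\n")
sys.path.insert(0, root)

# files() returns a Traversable for the package; it is designed to work
# for packages imported from zip files as well as from directories.
resource = importlib.resources.files("datapkg").joinpath("data/config.txt")
print(resource.read_text().strip())  # answer = 42

# as_file() yields a real filesystem path, extracting to a temporary
# file if needed (the analogue of resource_filename).
with importlib.resources.as_file(resource) as path:
    print(os.path.exists(path))  # True
```

Like pkg_resources, this still expects the data files to live inside the package directory.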

As for Linux distros, I can (reasonably) assure you that your program will run without any problems (or patches) on any modern Debian-derived system if you use pkg_resources. I don't know about Fedora/openSUSE, but I would assume it works there as well.

It works on Windows, but it does not currently work with py2exe; there are simple workarounds for that, however.

Torsten Marek