Data cache for Python package

I have a Python module that generates large data files, which I want to cache on disk for future use. The cache is likely to grow to some hundreds of MB for a normal user, but it will save a lot of computation time.

The files aren't distributed with the module, but are generated the first time the code is run with a given set of parameters.
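
For illustration, a minimal sketch of that "generate on first use" behaviour, assuming the parameters are JSON-serialisable and the results can be pickled. The names `cache_path`, `load_or_compute` and `compute` are hypothetical, not part of the module in question.

```python
import hashlib
import json
import os
import pickle


def cache_path(cache_dir, params):
    """Return a file path that is unique to this set of parameters."""
    # Serialise with sorted keys so the same parameters always hash the same.
    key = json.dumps(params, sort_keys=True).encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()
    return os.path.join(cache_dir, digest + ".pkl")


def load_or_compute(cache_dir, params, compute):
    """Load the cached result for params, computing and saving it if missing."""
    path = cache_path(cache_dir, params)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    # Cache miss: run the expensive computation and store the result.
    result = compute(**params)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

Keying the file name on a hash of the parameters means the cache directory can live anywhere, which is what makes the question of where to put it a separate one.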

So far I've just been using a single-file module myself and putting the files in a hardcoded path relative to the module (data/). But I now need to distribute this module in a Python package with distutils, and I was wondering if there is a standard way to handle the cache location.

I was thinking of something like the compiled cache of scipy.weave, but I'm wondering if there is a more modern, supported way of doing it. On *nix platforms I would expect it to go in ~/.something, but I'm not sure what the Windows equivalent would be. This should also be configurable, so that users can point it somewhere else if that's more convenient, or share the cache dir between users. How should such a config file work? Where should it go?

Or should I just have it as an install option, either through a config file next to setup.py or set by manually editing setup.py, then hard code the directory in the module before installation?

Any pointers gratefully received...

Thanks - so I guess I'll default to a ~/.mycache or ~/_mycache directory, and look for a ~/.my_module or ~/_my_module config file; if it is present, get the directory from that. Then the default should work for most people, but it's easy to configure. I would prefer to avoid environment variables, since I don't think Windows people usually like having to set stuff like that.

thrope 2009-12-09 16:00:55
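
The scheme described in that follow-up could look roughly like the sketch below. It is only one way to do it: the INI layout (a `[cache]` section with a `dir` option) and the function name `get_cache_dir` are assumptions made for the example, and the paths `~/.mycache` and `~/.my_module` are the placeholders from the post. `os.path.expanduser` resolves `~` on Windows as well, so no environment variable has to be set by the user.

```python
import configparser
import os

# Placeholder locations taken from the post above.
DEFAULT_CACHE_DIR = os.path.join("~", ".mycache")
CONFIG_FILE = os.path.join("~", ".my_module")


def get_cache_dir(config_file=CONFIG_FILE, default=DEFAULT_CACHE_DIR):
    """Return the cache directory, preferring the user's config file."""
    cache_dir = default
    # Interpolation is disabled so a '%' in a path is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips files that do not exist, so a missing config
    # file simply leaves the default in place.
    parser.read(os.path.expanduser(config_file))
    if parser.has_option("cache", "dir"):  # assumed section/option names
        cache_dir = parser.get("cache", "dir")
    cache_dir = os.path.expanduser(cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    return cache_dir
```

A user who wants a shared cache would then create `~/.my_module` containing, for example, a `[cache]` section with `dir = /srv/shared/mycache` (a made-up path); everyone else gets the default without doing anything.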
