Current Python Workflow

I have pip, distribute, virtualenv, and virtualenvwrapper installed into my Python 2.7 site-packages (a framework Python install on Mac OS X). In my ~/.bash_profile I have the line

export PIP_DOWNLOAD_CACHE=$HOME/.pip_download_cache

This gives a workflow as follows:

$ mkvirtualenv pip-test
$ pip install nose        # downloaded and installed from PyPI
$ pip install mock        # downloaded and installed from PyPI
$ mkvirtualenv pip-test2
$ pip install nose        # installed from pip's download cache
$ pip install mock        # installed from pip's download cache
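
You can confirm the cache is doing its job by listing the cache directory after the first round of installs:

$ ls "$PIP_DOWNLOAD_CACHE"    # the nose and mock downloads appear here after pip-test's installs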

Questions

Since I'm not downloading packages that have been previously installed in another virtualenv, this workflow saves time and bandwidth. However, it doesn't save disk space, since each package will be installed into each virtualenv. Therefore, I'm wondering:

  • Question #1 Is there a modification to this workflow that would allow me to conserve disk space by having multiple virtualenvs reference one Python package that is not installed in my Python 2.7 site-packages?

I've tried using add2virtualenv, which is part of virtualenvwrapper. While this "adds the specified directories to the Python path for the currently-active virtualenv," it doesn't add any of the executables found in the original virtualenv's bin directory. Therefore, the following will fail:

$ mkvirtualenv pip-test3
$ add2virtualenv ~/.virtualenvs/pip-test/lib/python2.7/site-packages/nose/
$ nosetests   # Fails since missing ~/.virtualenvs/pip-test3/bin/nosetests
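
As far as I can tell, simply linking or copying the missing script over wouldn't be a real fix either, since the nosetests script that pip generated is hard-wired to the first virtualenv's interpreter (the shebang path below is illustrative):

$ head -1 ~/.virtualenvs/pip-test/bin/nosetests
#!/Users/.../.virtualenvs/pip-test/bin/python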
  • Question #2 Am I missing something about the way add2virtualenv works?
  • Question #1 Rephrased Is there a better method than add2virtualenv that allows multiple virtualenvs to reference one Python package that is not installed in my Python 2.7 site-packages?
  • Question #3 If there is a method to install a shared Python package into multiple virtualenvs, is there a performance penalty compared to installing the package separately into each virtualenv?
  • Question #4 Should I just give up on conserving disk space and stick with my current workflow?
+2  A: 

Unless you are doing development on an embedded system, I find that chasing disk space in this way is always counter-productive. It took me a long time to reach this realization, because I grew up when a very large hard drive was a few megabytes in size and RAM was measured in K. But today, unless you are under very special and unusual constraints, the benefit of keeping your projects orthogonal (you can delete any directory on your system outside a project and its Python packages are still there) far outweighs the disk-space savings, which, in my experience, you'll never even notice if you're busy developing anyway.

So I guess that's the lesson I'm offering from my own experience: you'll never notice the disk space you've lost, but you will notice it if trying to clean up a directory in one place on your disk breaks projects under development somewhere else.

Brandon Craig Rhodes
@Brandon: Good thoughts. Using the pip download cache is well worth it, especially when on a plane without access to PyPI. However, your comments make me think that while the bandwidth savings are worthwhile, trying to find a way to conserve disk space isn't worth it. Thanks again.
Matthew Rankin
@Matthew — yes, bandwidth savings are **very** useful, because that will save you not just network resources but — more importantly — time, which is what, in the end, really limits developers. I often have a directory of eggs checked into every project that I point `pip` at, which both prevents my having to wait for network downloads, and also means that I get exactly the same outcome every time I build the project.
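In practice that looks something like `pip install --no-index --find-links=eggs/ nose mock`, where `eggs/` is just an example name for the directory of downloaded packages the project carries.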
Brandon Craig Rhodes
A: 

You can write a Python script that takes two directories and computes the md5sum of every file. Whenever two files match, it deletes one copy and replaces it with a hard link to the other.

It should be quite easy to write. The only problem is making sure the deduplicated directories stay read-only, so that editing a file in one virtualenv doesn't silently change it in the other.
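
A rough sketch of such a script (untested; it assumes both directories live on the same filesystem, which hard links require, and it doesn't handle the read-only part):

import hashlib
import os
import sys

def md5sum(path, chunk_size=65536):
    # Hash the file in chunks so large files don't have to fit in memory.
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def dedupe(master_dir, copy_dir):
    # Index every file under master_dir by its content hash.
    by_hash = {}
    for root, dirs, files in os.walk(master_dir):
        for name in files:
            path = os.path.join(root, name)
            by_hash[md5sum(path)] = path
    # Replace identical files under copy_dir with hard links to the master copy.
    for root, dirs, files in os.walk(copy_dir):
        for name in files:
            path = os.path.join(root, name)
            original = by_hash.get(md5sum(path))
            if original and not os.path.samefile(original, path):
                os.remove(path)
                os.link(original, path)

if __name__ == '__main__':
    dedupe(sys.argv[1], sys.argv[2])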

You can improve on this idea by using a filesystem with a copy-on-write mechanism (such as ZFS).
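
ZFS also offers block-level deduplication, which would do the hard-link script's job transparently (the dataset name here is hypothetical):

$ zfs set dedup=on tank/virtualenvs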

Tomasz Wysocki