(I foresaw this problem 3 months ago and was told to be diligent to avoid it. Yesterday I was bitten by it, hard, and now that it has cost me real money, I am keen to fix it.)

If I move one of my Python source files into another directory, I need to remember to tell Mercurial that it moved (hg move).

When I deploy the new software to my server with Mercurial, it carefully deletes the old Python file and creates it in the new directory.

However, Mercurial is unaware of the .pyc file in the same directory and leaves it behind. Other modules in that directory then use the stale .pyc in preference to the new Python file.

What ensues is NOT hilarity.

How can I persuade Mercurial to automatically delete the old .pyc file when I move the Python file? Is there a better practice? Trying to remember to delete the .pyc file from all the Mercurial repositories isn't working.

+6  A: 

You need:

1) A real deployment infrastructure, even if it's just a shell script, which does everything. Cloning/checking out an updated copy from source control is not a deployment strategy.

2) Any deployment system should completely clean the directory structure. My usual preference is that each deployment happens to a new directory named with a date+timestamp, and a symlink (with a name like "current") is updated to point to the new directory. This gives you breadcrumbs on each server should something go wrong. (A sketch of this approach follows these points.)

3) To fix whatever is running the Python code. New .py source files should always take precedence over cached .pyc files. If that is not the behavior you are seeing, it is a bug, and you need to figure out why it is happening.
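
A minimal sketch of what point 2 might look like for a Mercurial-based deploy; the repository URL, the /srv paths and the "production" tag are illustrative assumptions:

#!/usr/bin/env python
"""Sketch: deploy a tagged Mercurial revision into a fresh timestamped directory."""
import os
import subprocess
import time

REPO = "ssh://hg@example.com/myproject"   # illustrative repository URL
RELEASES = "/srv/myproject/releases"      # each deploy gets its own directory here
CURRENT = "/srv/myproject/current"        # symlink the server actually runs from

if not os.path.isdir(RELEASES):
    os.makedirs(RELEASES)

# Clone the revision tagged "production" into a fresh, timestamped directory.
release_dir = os.path.join(RELEASES, time.strftime("%Y%m%d-%H%M%S"))
subprocess.check_call(["hg", "clone", "-u", "production", REPO, release_dir])

# Flip the "current" symlink atomically: build a new link, then rename it over the old one.
tmp_link = CURRENT + ".new"
if os.path.lexists(tmp_link):
    os.remove(tmp_link)
os.symlink(release_dir, tmp_link)
os.rename(tmp_link, CURRENT)

print("Deployed " + release_dir)

Old-release cleanup, config handling and restarting the service are left out; the point is that every deploy lands in a pristine tree, so nothing stale from the previous one can leak in.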

Nicholas Knight
Python is easy: copy out the .py files (and whatever else you need) and ignore the .pyc files (well, you get the idea).
bigredbob
Re point 3: the reason that's not happening is that the .py doesn't exist. Without the .py present there's no timestamp to compare against, and Python doesn't just ignore a .pyc that has no .py.
Ry4an
Kindly elaborate on 1). With the obvious exception of the .pyc files discussed here, the "hg pull -u" strategy is working well for me. I should mention that I have exactly one target machine (although I have several repositories for testing). My staging machine and production machine are the same, but using different accounts and config files so they don't stomp on each other. Performance, availability and scaling are not important on this project.
Oddthinking
Performance, availability, and scaling are, to one degree or another, default requirements for any meaningful project, raising questions about what it is you're doing. Nevertheless, raw experience teaches that you shouldn't assume you can predict what stray files may find their way into a deployed tree. In doing so, you engage in a classic engineering error of trying to correct for every possible individual problem, instead of trying to make it impossible for anything other than the "right thing" to occur in the first place. Down the former path lies madness.
Nicholas Knight
Thanks Nicholas. Let me address the first sentence first. Yes, I have performance requirements, but I am over-achieving them, while consuming 3% of CPU and 25% of RAM on a low-end virtual server, so I don't need to optimise. My server is only required to be available for a few hours each day, so there is plenty of time to deploy new versions without impacting the business. It has already scaled to full size, and is bespoke for exactly one customer, so scaling is not an issue. (An unusual project, I agree. Great to work on!)
Oddthinking
I want to make sure I have understood your key point: Pulling the latest files from Mercurial is not a (sufficient) deployment strategy, because it risks having ongoing problems of the exact type I have had: stray files leak into the directory tree affecting the production server in unpredicted ways. Is that right? Thanks again; food for thought.
Oddthinking
+4  A: 

How about using an update hook on the server side? Put this in the hgrc file inside the repository's .hg directory:

[hooks]
update = find . -name '*.pyc' | xargs rm

That will delete all .pyc files whenever you update on the server. If you're worried about the cost of rebuilding all the .pyc files you could always get just a little more clever in the hook and delete only the .pyc's for which there is no .py, but that's probably overkill.

Ry4an
find . -name '*.pyc' -delete
mkotechno
Thanks. This was the info I needed. I went ahead and overkilled. See my answer to this question.
Oddthinking
+7  A: 
  1. Do not store .pyc files in the repository.
  2. Automate .pyc deletion with: find . -name '*.pyc' -delete
  3. While developing, use Python's -B option so it does not write .pyc files at all (see the sketch after this list).
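
A quick illustration of item 3: the -B flag has an in-process equivalent, sys.dont_write_bytecode (the imported module name below is only an example):

import sys

# Same effect as running "python -B": no .pyc files are written for
# anything imported after this point.
sys.dont_write_bytecode = True

import mymodule  # illustrative; imported without leaving mymodule.pyc behind
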
mkotechno
Also, precompile all the .pyc files as part of the release. You deploy by tagging the repo, pushing to production, and so on; compiling the .pyc files fits in there. For other languages, compiling binaries is the most natural thing under the sun.
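
If you do precompile as part of the release, the standard library's compileall module will do the job; the directory here is illustrative:

import compileall

# Byte-compile every .py under the deployed tree so fresh .pyc files ship with it.
compileall.compile_dir("/srv/myproject/current", quiet=True)
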
Tadeusz A. Kadłubowski
1) I am not storing .pyc files in the repository. 2) I like the command. See also http://stackoverflow.com/questions/785519/remove-all-pyc-files-from-a-project for more. I guess I am looking for help "automating". 3) I looked this up. Don't store byte code for imported modules? Why not?
Oddthinking
A: 

I use the .hgignore file to skip versioning of all my .pyc and .py~ files (editor temp files). For example, this is my version:

# use glob syntax.
syntax: glob

.directory
*.pyc
*~
*.o
*.tgz
*.tbz2
*.gz
*.bz2

Adding a hook on update to remove them is also an interesting trick if you want to not only ignore the noise but remove it from your local workspace.

edomaur
Thanks, edomaur, but I already have an .hgignore file (actually two: one cross-project and another project-specific). I am not versioning .pyc files. That isn't the root problem.
Oddthinking
Ok, I was not sure what you were aiming for, but I think I understand now.
edomaur
+1  A: 

What I have actually done:

1) I am considering Nicholas Knight's suggestion about using a proper deployment strategy. I have been reading about Buildout and Collective.hostout to learn more. I need to decide whether such heavy-weight strategies are worthwhile for my project's relatively simple requirements.

2) I have adopted Ry4an's update hook concept, in the short-term, until I decide.

3) I ignored Ry4an's warning about overkill, and wrote a Python script to only delete stray .pyc files.

#!/usr/bin/env python
""" Searches subdirectories of the current directory looking for .pyc files which
    do not have matching .py files, and deletes them.

    This is useful as a hook for version control when Python files are moved.
    It is dangerous for projects that deliberately include Python 
    binaries without source.
"""
import os
import os.path
for root, dirs, files in os.walk("."):
    pyc_files = filter(lambda filename: filename.endswith(".pyc"), files)
    py_files = set(filter(lambda filename: filename.endswith(".py"), files))
    # Stripping the trailing "c" turns "foo.pyc" into "foo.py", so this keeps
    # only the .pyc files that no longer have a matching .py alongside them.
    excess_pyc_files = filter(lambda pyc_filename: pyc_filename[:-1] not in py_files, pyc_files)
    for excess_pyc_file in excess_pyc_files:
        full_path = os.path.join(root, excess_pyc_file)
        print "Removing old PYC file:", full_path
        os.remove(full_path)

My update hooks now call this rather than the "find" commands suggested by others.
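
For the record, the hook entry in each repository's .hg/hgrc now looks something like this (the script's file name and location are illustrative; Mercurial runs shell hooks from the repository root, which is what the script's os.walk(".") expects):

[hooks]
update = python /path/to/delete_orphaned_pyc.py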

Oddthinking
That'll work. You could have the update hook invoke a function in your .py file directly if you'd like. You just name the function and provide the path to the .py file. That'll save you spinning up a whole 'nother Python VM.
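
A rough sketch of that in-process form, assuming Mercurial accepts the python:<file>:<function> hook syntax (the function name and paths are illustrative):

[hooks]
update = python:/path/to/delete_orphaned_pyc.py:removeorphans

and the script would then define something like:

import os

def removeorphans(ui, repo, **kwargs):
    """In-process update hook: delete .pyc files whose .py has gone."""
    for root, dirs, files in os.walk(repo.root):
        py_files = set(f for f in files if f.endswith(".py"))
        for name in files:
            if name.endswith(".pyc") and name[:-1] not in py_files:
                ui.status("removing stale %s\n" % os.path.join(root, name))
                os.remove(os.path.join(root, name))
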
Ry4an
Thanks, Ry4an. Wish I could give you more rep for that suggestion.
Oddthinking