OK, this is driving me absolutely crazy. I've been all over Google and SO looking for someone who has asked this question, but have come up completely empty. I'll apologize in advance for the lengthy, roundabout way of asking the question. (If I were able to figure out how to encapsulate the problem, maybe I would have been successful in finding an answer.)

How are large projects managed in Mercurial, when the act of building / compiling generates hundreds of temporary files in order to create the end result?? Is .hgignore the only answer?

Example Scenario:

You have a project that wants to use some open source package for some feature, and needs to compile it from source. Okay, fine. So you go get the package, unpack the .tgz, and then slap it into its own Mercurial repository so you can start tracking changes. Then you make all your changes and run a build.

You test your end result, are happy with the results and are ready to commit back to your local clone of the repository. So you do an hg status to check your changes prior to committing because you are a good little programmer and want to remind yourself what you changed so you can write a useful changeset message. (okay, stop laughing. ;-P ) The hg status results cause you to immediately start using all those words that would make your mother ashamed -- because you now have screens and screens of "build cruft".

For the sake of argument say this package is mysql, or apache, or I don't care what as long as it is something that (a) you don't control and will be changing regularly, (b) leaves a whole lot of cruft behind in a whole lot of places, and (c) there is no guarantee the cruft won't change each time you get a new version from the external source.

So now what??? The particular project causing this angst is going to be worked on by multiple developers in multiple physical locations, and so needs to be as straightforward as possible. If there is too much involved they're not going to do it, and we'll have a bigger problem on our hands. (Sadly, some old dogs are not keen on learning new tricks...)

One proposed solution was that they would just have to commit everything locally before doing a make, so they have a "clean slate" they would then have to clone from to actually do the build in. That got shot down as (a) too many steps, and (b) not wanting to cruft up the history with a bunch of "time to build now" changesets.

Someone else has proposed that all the cruft just be committed into the Mercurial repository. I am strongly against that because then the next time around those files will turn up as "modified" and therefore be included in the changeset's file list.

We can't possibly be the only people who have run into this problem. So what is the "right" solution? Is our only recourse to try to create a massively intelligent .hgignore file? This makes me uneasy, because if I tell Mercurial to "ignore everything in this directory I haven't already told you about", then what happens if the next applied patch adds files into that ignored directory? (Mercurial will never see that new file, right?)

Hopefully this is not a completely stupid question with an obvious answer. I've compiled things from source many times before, but have never needed to apply version control on top of that. Plus we're new to Mercurial.

Please help!

Thank you.

+2  A: 

The best solution would be to fix the build process so that it behaves in a 'nice' manner, namely by allowing you to specify a separate directory for intermediate files. That directory could then be ignored completely via a very simple .hgignore entry, or it could live outside the version-controlled directory structure altogether.
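For autoconf-style packages this can sometimes be had without touching the upstream build files at all, by running configure from an empty directory outside the repository (an "out-of-tree" build). A minimal sketch, assuming the package's configure script supports this (not every package does); build_out_of_tree is a made-up helper name and the paths in the example are placeholders:

```shell
# Hypothetical helper: configure and build in a separate directory so that
# generated files never land in the Mercurial working copy.
# Assumes the package's configure script supports out-of-tree builds.
build_out_of_tree() {
    src_dir=$(cd "$1" && pwd) || return 1   # absolute path to the source tree
    build_dir=$2
    shift 2                                 # remaining args go to configure
    mkdir -p "$build_dir" || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$build_dir" && "$src_dir/configure" "$@" && make )
}

# Example (placeholder paths):
#   build_out_of_tree ~/src/apr ~/build/apr --prefix=/opt/apr
```

If the build directory sits outside the repository, hg status in the source tree should stay clean and no .hgignore entry is needed for it.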

Amber
See my comment below to Will. We don't maintain the build process of the open source software causing the problem. So while we technically *could* "fix" the problem, it's really not ideal to have to be making (and maintaining) such modifications to things like apache or php... :-/
JNeefer
A: 

For what it's worth, I've found that in this situation a smart .hgignore is the only solution that has worked for me so far. With the inclusion of regular expression support, it's very powerful, but tricky, too, since a pattern that is cruft in one directory may well be source in another.

At least you can check in the .hgignore and share it with your developers. That way the work is only done once.

[Edit] At least, however, it's possible -- as noted above by Martin Geisler -- to have full path specifications in your .hgignore file; you can, therefore, have test/Makefile in the .hgignore and still have Mercurial notice a new test2/Makefile

His process for creating the file should give you almost what you want, and you can tune it from there.

Chris R
That's exactly the problem (patterns to match cruft in one dir would match things that are source in another dir). But on top of that "simple" problem, is the sheer *quantity* of the cruft. Running "hg status | wc -l" (on the dir that has made no changes and done one build) shows over 7500 individual pieces of cruft spread as far as 4 levels deep. I need to find some way to keep this cruft from getting checked-in, while not losing new things / changes to come with package updates. Arg!
JNeefer
A: 

One option you have is to clean your working directory after verifying a build.

make clean
hg status

Of course you may not want to clean your project if it takes more than a few minutes to build.

Will Bickford
The problem isn't with our own software. It builds nicely and isolates any cruft. The problem is with packages originating outside of us. As soon as I get the maintainers of things like apache and php to make their 'make clean' actually 'clean', 75% of my problem will be solved. But since they are generating the cruft, I'm stuck. (And yes, the project takes more than 'a few' minutes to build.)
JNeefer
A: 

If the files you want to track are already known to hg, you can hgignore everything. Then you need to use hg import to apply a patch, rather than the plain patch command, since hg needs to be told when a patch adds new files that should be tracked.
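If a patch does get applied with plain patch anyway, the files it creates can be picked out of the patch itself and added by name. As far as I know, hg add accepts a file named explicitly on the command line even when it matches .hgignore. A rough sketch, assuming a unified diff with a/ and b/ path prefixes and no whitespace in file names; patch_new_files is a hypothetical helper, not a Mercurial command:

```shell
# Hypothetical helper: read a unified diff on stdin and print the paths of
# files it creates (old side is /dev/null), with the leading "b/" style
# path component stripped, as patch -p1 would do.
patch_new_files() {
    awk '
        /^--- /  { from_null = ($2 == "/dev/null"); next }
        /^\+\+\+ / {
            if (from_null) {
                path = $2
                sub("^[^/]*/", "", path)   # drop the first path component
                print path
            }
            from_null = 0
        }
    '
}

# Example (update.patch is a placeholder name):
#   patch -p1 < update.patch
#   patch_new_files < update.patch | xargs hg add
```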

tonfa
Hrm... That sounds promising! I will go read up on Mercurial's 'import' versus 'patch' and see if that is a workable solution for this situation. I will post a follow up when I know more. Thank you!
JNeefer
OK, I am now on the track of Mercurial Queues; is that what you had in mind? The first page of Ch. 12 of the O'Reilly book on Mercurial describes part of my issue: "You have an 'upstream' source tree that you can't change; you need to make some local changes on top of the upstream tree; and you'd like to be able to keep those changes separate, so that you can apply them to newer versions of the upstream source." So is the thought that by using this "different" way to manage the files/changes (hg import?), the cruft can stay in the working dir by having the WHOLE working dir .hgignore'd?
JNeefer
+7  A: 

I think the best solution is to tell Mercurial to ignore specific types of files, not entire directories. I just tried compiling Apache but it required APR, so I tested with that instead.

After checking in a clean apr-1.3.8.tar.bz2 I did ./configure; make and looked at the output. The first few patterns were easy:

syntax: glob

*~
*.o
*.lo
*.la
*.so
.libs/*
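With thousands of unknown files, the "easy" patterns can be spotted by tallying the unknown files by extension first. A small sketch; tally_extensions is a hypothetical helper that only reads a list of paths on stdin, so it can be fed from hg status:

```shell
# Hypothetical helper: count the paths on stdin by file extension,
# most frequent first. Names without an extension (or dotfiles such as
# .hgignore) are grouped under "(no extension)".
tally_extensions() {
    awk -F/ '
        {
            name = $NF
            if (match(name, /\.[^.]+$/) && RSTART > 1)
                ext = substr(name, RSTART)
            else
                ext = "(no extension)"
            count[ext]++
        }
        END { for (e in count) printf "%d %s\n", count[e], e }
    ' | sort -k1,1nr -k2
}

# Example:
#   hg status --unknown --no-status | tally_extensions | head
```

Extensions with large counts are candidates for a glob line; whatever is left over is usually a short list of individually generated files.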

The remaining new files look like they are specific files generated by the build process. It's easy to add them too:

% hg status --unknown --no-status >> .hgignore

That also added .hgignore since I hadn't yet scheduled it for addition. Removing that I ended up with this .hgignore file:

syntax: glob

*~
*.o
*.lo
*.la
*.so
.libs/*
.make.dirs
Makefile
apr-1-config
apr-config.out
apr.exp
apr.pc
build/apr_rules.mk
build/apr_rules.out
build/pkg/pkginfo
config.log
config.nice
config.status
export_vars.c
exports.c
include/apr.h
include/arch/unix/apr_private.h
libtool
test/Makefile
test/internal/Makefile

I consider this a quite robust way to go about this in Mercurial or any other revision control system for that matter.
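One caveat, as I read the hgignore documentation: glob patterns are not rooted, so a bare Makefile entry matches a Makefile in any directory, including one that a later upstream release adds. If that matters, the generated part of the list can be written as rooted regular expressions instead. A sketch under that assumption; rooted_patterns is a hypothetical helper name:

```shell
# Hypothetical helper: turn each path on stdin into a rooted regular
# expression (^...$) with regex metacharacters escaped, so the pattern
# matches that exact path and nothing else.
rooted_patterns() {
    sed -e 's/[][\.*^$+?(){}|]/\\&/g' -e 's/^/^/' -e 's/$/$/'
}

# Example: append exact-path patterns after the glob section.
#   echo 'syntax: regexp' >> .hgignore
#   hg status --unknown --no-status | rooted_patterns >> .hgignore
```

A syntax: line applies to the patterns that follow it, so the glob section at the top of the file keeps working and only the appended paths are treated as regular expressions.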

Martin Geisler
Martin, this is exactly the thing. I didn't think that full path specs worked (and the .hgignore doc is a bit vague on the subject), but they clearly do, at least as of Hg 1.3.1. JNeefer, this is your solution!
Chris R
Chris R: yeah, the `.hgignore` documentation is a bit thin. I think the crucial part is this sentence: "For example, say we have an untracked file, `file.c`, at `a/b/file.c` inside our repository. Mercurial will ignore `file.c` if any pattern in `.hgignore` matches `a/b/file.c`, `a/b` or `a`." (from http://www.selenic.com/mercurial/hgignore.5.html)
Martin Geisler
Thanks for doing the detailed work and showing the results here - that's an excellent answer that deserves a check mark, imho.
Tex
A: 

How about a shell (or whatever) script that walks your build directory recursively, finds every file created after your build process started running, and moves all these files (of course, you can specify exceptions) into a cruft_dir subdirectory? Then you can just put cruft_dir/* in .hgignore.

EDIT: I forgot to add, but this is fairly obvious, that this shell script runs automatically as soon as your build finishes. Maybe it's even called as the last command in your Makefile/ant/whatever file.
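Something along these lines, as a rough sketch only. It assumes a timestamp marker file is touched just before the build, that file names contain no newlines, and that it is run from the top of the working copy; sweep_cruft and .build-start are made-up names. Note that it also moves tracked files the build regenerates, and the build products themselves, so a real version would need the exceptions mentioned above:

```shell
# Hypothetical helper: move every regular file modified after MARKER into
# CRUFT_DIR, preserving relative paths. Skips .hg and CRUFT_DIR itself.
sweep_cruft() {
    marker=$1
    cruft_dir=${2:-cruft_dir}
    find . -path ./.hg -prune -o -path "./$cruft_dir" -prune -o \
        -type f -newer "$marker" -print |
    while IFS= read -r file; do
        rel=${file#./}
        mkdir -p "$cruft_dir/$(dirname "$rel")" &&
        mv "$file" "$cruft_dir/$rel"
    done
}

# Example:
#   touch .build-start      # marker taken just before the build
#   make
#   sweep_cruft .build-start cruft_dir
```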

Yawar