How can a build system support code generation but force declaration of all external dependencies?

I'm currently working on a build tool, and it's come up that it would be desirable to support code generation. Currently the tool simply compiles any out-of-date C and C++ source files it finds in the current folder when it's invoked. There's some support for custom build targets, which could create files, but the tool will ignore them because it only pays attention to the files that existed at startup.

This means it's impossible to use a code-generating tool such as lex/yacc. Adding support wouldn't be too difficult. Say the user creates a 'scripts' folder with executable scripts inside it. The tool could simply invoke all the executable scripts inside, relying on the #! magic at the top of each file to invoke the right interpreter, whether it be python/perl/sh/etc. These scripts could edit the source code or produce more source code however they like. After they had run, the tool would examine whatever source files existed at that point, instead of at startup. But this potentially conflicts with one of the system's other roles.
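
To make the idea concrete, here is a rough sketch in Python of the "run the scripts, then rescan" step. It is only an illustration: the function names, the set of source suffixes, and the choice to run scripts in sorted order are my own assumptions, not how the tool works today.

```python
import os
import subprocess
from pathlib import Path

# Assumption: which suffixes count as C/C++ sources.
SOURCE_SUFFIXES = {".c", ".cc", ".cpp", ".cxx"}


def run_generators(scripts_dir, source_dir):
    """Run every executable file in scripts_dir (hypothetical layout)."""
    scripts_dir = Path(scripts_dir)
    if not scripts_dir.is_dir():
        return
    for script in sorted(scripts_dir.iterdir()):
        if script.is_file() and os.access(script, os.X_OK):
            # The kernel reads the #! line and picks the interpreter.
            # check=True aborts the build if a generator fails.
            subprocess.run([str(script.resolve())], cwd=source_dir, check=True)


def find_sources(source_dir):
    """List the source files that exist right now, sorted by name."""
    return sorted(
        p for p in Path(source_dir).iterdir()
        if p.is_file() and p.suffix in SOURCE_SUFFIXES
    )


def sources_after_generation(source_dir):
    """Scan for sources only after the generators have run."""
    source_dir = Path(source_dir)
    run_generators(source_dir / "scripts", source_dir)
    return find_sources(source_dir)
```

The only real change from the current behavior is the ordering: the source scan happens after the scripts finish, so anything they generate is picked up.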

The build system also lets you specify what versions of libraries you depend on, and it then handles the proper include flags, linker options, etc. for you. Most importantly, it can immediately tell you if a dependency is missing. You can always know that the dependencies are accurate, because without them the compiler won't be able to find the headers for the libraries you use.

Arbitrary scripts potentially ruin this. By being able to invoke any version of anything that's in your current $PATH, they could make your software dependent on any utility or file that happens to be installed on the current box. Worse, you can't even know what version the script depends on. Does this script rely on a behavior specific to Perl 5.10? Does it rely on a file being located at some absolute path? Even if constrained to a few preset versions of interpreters, any one of those interpreters could do system("myarbitarycmd somesourcefile.c"). I see this problem in autotools projects all the time -- their configure scripts or makefiles depend on tools that aren't specified in their package dependencies.

Without the ability to verify dependencies, if you're starting a system from scratch and only have your old source code, you have to try to build, see that something fails, diagnose why it failed, figure out that a particular utility was missing, try again, find out you got the wrong version, etc. I'd rather avoid that hell.

Ideas, to be used alone or combined:

Have the tool know how to query versions for a specific set of interpreters and tools. So it would know that python for example accepts the --version flag and know how to parse the response. Then users would say they depend on python 2.4 and the tool would complain loudly if it found that wasn't the version. Same for lex/yacc.
Don't support any language interpreters; instead, add support for specific code generation tools as necessary. For example, specifically add lex/yacc or Google Protocol Buffers support. This has a higher maintenance overhead though, and is bad in a setting where developers may need to quickly use new tools.
Require users to specify the names of tools that they want. To enforce this, make a temporary folder, and populate it with symlinks to those tools, then when executing the scripts set $PATH to only be that temporary folder, so only those utilities that they specified are actually invokable. This could be trivially worked around by users writing scripts that invoke utilities by full paths.
When running the scripts, chroot to the source folder. This prevents the use of files or utilities with absolute paths. A determined person could still work around it (chroot jails are notoriously insecure) but I think that would require enough effort on the part of the developer that they would have to know they're doing something wrong and that laziness would be enough of a motivating factor to make them do things The Right Way (TM). However, I suspect this would break most interpreters. How will python/perl find and load their standard libraries? Start a python interpreter without chroot, have it import some standard things users might want like os.path/shutil, then run os.chroot afterwards, then have it run their script? And whatever the equivalent process would be for perl.
In addition to having an empty path except for what users declare, automatically provide a default set of utilities implemented by the build tool itself, like how CMake has compare_files, etc.
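
A rough sketch of how the version-check and restricted-$PATH ideas above might fit together, in Python. Everything named here (`tool_version`, `make_tool_dir`, `run_script`, the `declared` mapping) is hypothetical, and it assumes each declared tool prints a dotted version number when given `--version`, which not every tool does.

```python
import os
import re
import shutil
import subprocess
import tempfile
from pathlib import Path


def tool_version(tool_path, flag="--version"):
    """Return the first dotted version number the tool prints, or None."""
    result = subprocess.run([tool_path, flag], capture_output=True, text=True)
    # Some tools (Python 2, for one) print the version on stderr.
    match = re.search(r"\d+(?:\.\d+)+", result.stdout + result.stderr)
    return match.group(0) if match else None


def make_tool_dir(declared):
    """Build a folder of symlinks to the declared tools.

    declared maps a tool name to a required version prefix such as
    "2.4", or to None when any version is acceptable.
    """
    tool_dir = Path(tempfile.mkdtemp(prefix="build-tools-"))
    for name, wanted in declared.items():
        real = shutil.which(name)
        if real is None:
            raise RuntimeError(f"declared tool not found: {name}")
        if wanted is not None:
            found = tool_version(real)
            # "2.4" accepts "2.4" and "2.4.6" but not "2.40".
            ok = found is not None and (
                found == wanted or found.startswith(wanted + ".")
            )
            if not ok:
                raise RuntimeError(f"{name}: need {wanted}, found {found}")
        os.symlink(real, tool_dir / name)
    return tool_dir


def run_script(script, tool_dir, cwd):
    """Run a script with $PATH reduced to the symlink folder."""
    env = dict(os.environ, PATH=str(tool_dir))
    return subprocess.run(
        [str(Path(script).resolve())], cwd=cwd, env=env, check=True
    )
```

With this, a script that starts with `#!/usr/bin/env python` only runs if python was declared, since env looks it up on the restricted $PATH. It does nothing about absolute paths, which is the workaround already mentioned above, and the caller would still need to delete the temporary folder afterwards.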

Others? How have people dealt with this for their own projects/organizations?
