The company I work for has lots of "complicated" file-based batch processes, built from sequences of steps such as:
- take file A
- fetch file B
- join fields in file A to file B to make file C
- run some heuristics on file C to make file D
- upload file D to server X
- build a report based on files D and A and mail it to [email protected]
Each step may take many hours to run (the files may contain billions of lines of data). The whole thing is glued together with GNU Makefiles, with rules such as:
```make
fileD: fileC
	run-analysis $^ > $@
```
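For context, the whole pipeline ends up looking roughly like the sketch below. The tool names (`fetch-b`, `join-files`, `run-analysis`, `upload-to-x`, `build-report`) and the `REPORT_RECIPIENTS` variable are placeholders for illustration, not our real commands; recipe lines are indented with tabs.

```make
# Illustrative sketch only -- command names are placeholders.
REPORT_RECIPIENTS ?= someone@example.invalid   # placeholder address

fileB:
	fetch-b > $@                        # fetch file B

fileC: fileA fileB
	join-files fileA fileB > $@         # join fields of A to B

fileD: fileC
	run-analysis $^ > $@                # run the heuristics on C

upload.stamp: fileD
	upload-to-x fileD                   # upload D to server X
	touch $@                            # stamp file so make knows this ran

report.txt: fileD fileA
	build-report fileD fileA > $@
	mail -s "batch report" $(REPORT_RECIPIENTS) < $@

.PHONY: all
all: upload.stamp report.txt
```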
The Makefiles are useful for modelling the dependencies between steps, and for re-running everything downstream of a given step (if a step goes wrong, or the heuristics change, and so on).
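For example (using the placeholder rule names from the sketch above), re-running everything from the heuristics step onwards is just a matter of invalidating that step's output and asking for the final target:

```sh
rm -f fileD        # throw away the heuristics output
make report.txt    # make rebuilds fileD from fileC, then regenerates the report;
                   # fileA, fileB and fileC are left untouched
```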
Using Makefiles for this has always seemed wrong to me: they're meant for building software, not for running batch processes. Makefiles also don't provide any form of testing framework.
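The closest we can get is hand-rolling ad-hoc check targets, something like the sketch below, where the tab-delimited, 12-column format is an assumption made up purely for illustration:

```make
.PHONY: check-fileC
check-fileC: fileC
	test -s fileC        # the output exists and is non-empty
	head -n 1000 fileC | awk -F'\t' 'NF != 12 { bad = 1 } END { exit bad }'   # spot-check the column count
```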
My question is: how do you coordinate large sequences of operations like these?