I've got a big legacy Windows application composed of many executables that interact with a database. The executables have 4 main purposes:
(a) parse a file and load it into the database
(b) transform a file (e.g. encode it). This may also split a file into many parts.
(c) perform some sort of complex updates in the database
(d) produce a file
These executables are called by batch files which can be of three types based on what they do:
(1) Wait for a condition, get a file from some external path, possibly transform it with (b), do (c), send notifications about the result of the activity, write a record to the database
(2) Wait for a condition, do some (c), produce a file, transform it with (b) and then copy it to one or many destinations (local file, database, ftp), send notifications about the result of the activity, write a record to the database.
(3) Coordinate other complex sequences of the (a-b-c-d) executables, send notifications about the result of the activity, write a record to the database
The batch files are ordinary BAT files on a Windows machine (possibly calling each other). They are launched by a scheduler.
The problems are:
-In each batch file most of the information about the environment (common directories, etc.) is duplicated, and most of the files of type (1) or (2) are very similar to each other.
-The batch files aren't easily configurable for a test environment, nor are they easily tested automatically
-The code for the notifications and for waiting for a starting condition is partly duplicated, e.g. (if error a call b, if error b call c, if error d call e).
-The use of goto
-You can't keep track exactly of which batch files are obsolete and which are regularly called
-To understand which files are sent or received you need to open each batch file individually.
-They impose design constraints on the application: common code can't be abstracted away
-Implementing horizontal (cross-cutting) functionality (e.g. having a default logging policy, counting the times each job is called) requires writing (unmaintainable?) code in many files.
On the plus side:
-Batch files are easily modified, so if a batch fails for some reason it's not difficult to write another batch to put the situation back in a safe state.
-They are battle-tested: they've been in production for a long time.
I've come up with this solution:
-Write some Java libraries that provide common functions and configuration (functions for notifying people of an event, for transferring files, and for waiting for a condition).
-Write a Java "scripting engine framework" command-line application which can load and execute task classes. (Task classes can use the functions provided by the libraries described above.)
-The scripts can be provided by the user as classes, as needed, in a separate jar.
-The scripts can be self-documenting by means of annotations.
-The scripting engine can ask a script to output a description of itself and its parameters.
-Separate the production of a file from the copying of it.
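To make the idea concrete, here is a minimal sketch of what I have in mind for the engine. All the names (`TaskInfo`, `Task`, `CopyFileTask`, `describe`, `execute`) are hypothetical and invented for this example; the real engine would load classes by name from the separate jar instead of using a nested class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

final class TaskEngineSketch {

    /** Hypothetical annotation a script uses to document itself. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface TaskInfo {
        String description();
        String[] parameters() default {};
    }

    /** Hypothetical contract every script class implements. */
    interface Task {
        /** Returns 0 on success, non-zero on failure (like a process exit code). */
        int run(Map<String, String> args) throws Exception;
    }

    /** Example script; a real one would call the shared file-transfer library. */
    @TaskInfo(description = "Copies a produced file to a destination",
              parameters = {"source", "destination"})
    static class CopyFileTask implements Task {
        @Override
        public int run(Map<String, String> args) {
            // Placeholder: only prints what it would do.
            System.out.println("copy " + args.get("source") + " -> " + args.get("destination"));
            return 0;
        }
    }

    /** Builds a description of a task from its annotation, without running it. */
    static String describe(Class<? extends Task> taskClass) {
        TaskInfo info = taskClass.getAnnotation(TaskInfo.class);
        if (info == null) {
            return taskClass.getSimpleName() + ": (undocumented)";
        }
        return taskClass.getSimpleName() + ": " + info.description()
                + " [parameters: " + String.join(", ", info.parameters()) + "]";
    }

    /** Instantiates and runs a task; any exception becomes a non-zero exit code. */
    static int execute(Class<? extends Task> taskClass, Map<String, String> args) {
        try {
            Task task = taskClass.getDeclaredConstructor().newInstance();
            return task.run(args);
        } catch (Exception e) {
            System.err.println("Task " + taskClass.getSimpleName() + " failed: " + e);
            return 1;
        }
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        args.put("source", "out.dat");
        args.put("destination", "archive/out.dat");
        System.out.println(describe(CopyFileTask.class));
        System.exit(execute(CopyFileTask.class, args));
    }
}
```

Returning an exit code keeps the engine usable from the existing scheduler and BAT files, which is what I need during the initial integration.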
I've decided in favour of Java rather than a scripting language (PHP, Python) so that I get the benefit of compilation and don't have to worry that something I write in the common libraries will trigger a "fatal error" at run time and stop execution of the scripts.
I feel that this will simplify the most standard batch files (types 1 and 2); on the other hand, I fear that it will be difficult to represent batches of type 3. I will also put an automated deployment in place, and my worry is that, while a badly written batch file can create problems, deploying buggy code in the engine itself (even if that's less likely to happen because of compilation and testing) or in the configuration could create bigger problems affecting all the scripts.
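For type 3, one option I'm considering is to model a batch as an ordered list of named steps that stops at the first failure. This is only a rough sketch; `Step` and `runAll` are hypothetical names, and real steps would launch the (a-b-c-d) executables instead of returning constants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

final class SequenceSketch {

    /** One named step; the action returns true on success. */
    static final class Step {
        final String name;
        final Callable<Boolean> action;

        Step(String name, Callable<Boolean> action) {
            this.name = name;
            this.action = action;
        }
    }

    /**
     * Runs the steps in order and stops at the first failure.
     * Returns the names of the steps that completed successfully.
     */
    static List<String> runAll(List<Step> steps) {
        List<String> completed = new ArrayList<>();
        for (Step step : steps) {
            boolean ok;
            try {
                ok = step.action.call();
            } catch (Exception e) {
                ok = false; // an exception counts as a failed step
            }
            if (!ok) {
                // The shared notification code would be called here.
                System.err.println("Step failed: " + step.name);
                break;
            }
            completed.add(step.name);
        }
        return completed;
    }

    public static void main(String[] args) {
        List<String> done = runAll(Arrays.asList(
                new Step("load", () -> true),
                new Step("update", () -> false),
                new Step("produce", () -> true)));
        System.out.println("Completed steps: " + done);
    }
}
```

This covers simple sequences only; branching like "if error b call c" would need something richer, which is exactly the part I'm unsure about.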
Note that the system should be 100% reliable and that there are deadlines and "time windows" for the system tasks to be executed.
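Because of the time windows, the shared "wait for a condition" function would need an explicit timeout so a job never waits past its window. A rough sketch of what I mean, where `waitFor` and its parameters are hypothetical names of my own:

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class WaitSketch {

    /**
     * Polls the condition until it is true or the timeout elapses.
     * Returns true if the condition was met in time, false on timeout or interruption.
     */
    static boolean waitFor(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // time window missed
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true about 200 ms after start.
        boolean met = waitFor(() -> System.currentTimeMillis() - start >= 200,
                Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println("Condition met: " + met);
    }
}
```

In a real job the condition would be something like "the input file exists" or "a flag row is present in the database", and a `false` result would trigger the notification code.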
My questions are:
Do you think batch files are OK? Are they widely used in this context?
Do you think I'm going in the right direction? Is there something that I haven't considered?
Has anyone got a better idea?
Do you know of any framework which can help me?
Do you think it's worth developing this system to have more flexibility in the future, or would I be better off keeping the batches?
Note that any new solution to be developed will be integrated initially with the existing system, so I can't for example rewrite everything to work in an application server.