I've got a big legacy Windows application made up of many executables that interact with a database. The executables have four main purposes:

(a) parse a file and load it into a database

(b) transform a file (e.g. encode it); this may also split a file into many parts

(c) perform some sort of complex update in the database

(d) produce a file

These executables are called by batch files, which fall into three types based on what they do:

(1) Wait for a condition, get a file from some external path, possibly transform it with (b), do (c), send notifications about the result of the activity, and write a record to the database.

(2) Wait for a condition, do some (c), produce a file, transform it with (b) and then copy it to one or more destinations (local file, database, FTP), send notifications about the result of the activity, and write a record to the database.

(3) Coordinate other complex sequences of the (a-b-c-d) executables, send notifications about the result of the activity, and write a record to the database.
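Type (2) jobs copy one produced file to several destinations, and the proposal further down separates producing a file from copying it. A minimal Java sketch of that separation, assuming a hypothetical `Destination` interface (the names are illustrative, not from any existing library); only a local-directory target is shown, and FTP or database targets would be further implementations:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction: one place a produced file can be delivered to. */
interface Destination {
    String name();
    void deliver(Path file) throws IOException;
}

/** Copies the file into a local directory (FTP or database targets would be other implementations). */
final class LocalDirectoryDestination implements Destination {
    private final Path directory;

    LocalDirectoryDestination(Path directory) {
        this.directory = directory;
    }

    @Override
    public String name() {
        return "local:" + directory;
    }

    @Override
    public void deliver(Path file) throws IOException {
        Files.createDirectories(directory);
        Files.copy(file, directory.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
    }
}

final class Delivery {
    /** Tries every destination and returns a description of each one that failed. */
    static List<String> deliverToAll(Path file, List<Destination> destinations) {
        List<String> failed = new ArrayList<>();
        for (Destination d : destinations) {
            try {
                d.deliver(file);
            } catch (IOException e) {
                failed.add(d.name() + ": " + e);
            }
        }
        return failed;
    }
}
```

Returning the list of failures, rather than stopping at the first one, lets the job send a single notification and write a single database record describing which copies failed.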

The batch files are ordinary .BAT files on a Windows machine (possibly calling each other), launched by a scheduler.

The problems are:

- In each batch file most of the information about the environment (common directories, etc.) is duplicated, and most of the files of type (1) or (2) are very similar.

- The batch files aren't easily configurable for a test environment, nor are they easily tested automatically.

- The code for the notifications and for waiting for a starting condition is partly duplicated, e.g. chains like "if error in a, call b; if error in b, call c; if error in d, call e".

- The use of goto.

- You can't track exactly which batch files are obsolete and which are regularly called.

- To understand which files are sent or received, you need to open each batch file individually.

- They impose design constraints on the application: common code can't be abstracted away.

- Implementing cross-cutting functionality (e.g. a default logging policy, or counting how many times each job is called) requires writing (unmaintainable?) code in many files.
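The duplicated error chains and the cross-cutting concerns above (logging, counting invocations, notifying on failure) are the kind of thing a single wrapper can centralize. A minimal sketch, assuming hypothetical `Task` and `Notifier` interfaces and a convention that a task returns a process-style exit code (0 = success):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical unit of work; returns a process-style exit code (0 = success). */
@FunctionalInterface
interface Task {
    int run() throws Exception;
}

/** Hypothetical notification hook (mail, SMS, phone call, ...). */
@FunctionalInterface
interface Notifier {
    void send(String jobName, String message);
}

final class JobRunner {
    private static final Logger LOG = Logger.getLogger(JobRunner.class.getName());
    private final Map<String, AtomicInteger> invocations = new ConcurrentHashMap<>();
    private final Notifier notifier;

    JobRunner(Notifier notifier) {
        this.notifier = notifier;
    }

    /** Runs a task with the same logging, counting and error policy for every job. */
    int run(String jobName, Task task) {
        invocations.computeIfAbsent(jobName, k -> new AtomicInteger()).incrementAndGet();
        LOG.info(() -> "start " + jobName);
        int code;
        try {
            code = task.run();
        } catch (Exception e) {
            LOG.log(Level.SEVERE, "job " + jobName + " threw an exception", e);
            code = 1; // assumed convention: any exception maps to exit code 1
        }
        if (code != 0) {
            notifier.send(jobName, "failed with exit code " + code);
        }
        LOG.info("end " + jobName + " exit=" + code);
        return code;
    }

    int invocationCount(String jobName) {
        AtomicInteger n = invocations.get(jobName);
        return n == null ? 0 : n.get();
    }
}
```

Every job then gets the same policy by being launched through `JobRunner.run`, instead of each batch file carrying its own `if errorlevel`/`goto` chain.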

On the plus side:

  • batch files are easily modified, so if a batch fails for some reason it's not difficult to write a batch to put things back in a safe state.

  • they are battle-tested: they've been in production for a long time.

I've come up with this solution:

  • write some Java libraries providing common functions and configuration (functions for notifying people of an event, for transferring files, and for waiting for a condition).

  • write a Java "scripting engine framework" command-line application that can load and execute task classes (task classes can use the functions provided by the libraries described above).

  • users can provide scripts as needed, as classes in a separate jar.

  • the scripts can be self-documenting by means of annotations.

  • the scripting engine can ask a script to output a description of itself and its parameters.

  • separate the production of a file from its copying.
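One way the self-documenting scripts in the list above could work, as a minimal sketch: `@TaskInfo` and `@Param` are hypothetical annotations, `LoadOrdersTask` is an invented example script, and `TaskDescriber.describe` is what the engine might print when asked to describe a script. It relies only on standard runtime-retained annotations and reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

/** Hypothetical class-level description of a script. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface TaskInfo {
    String value();
}

/** Hypothetical parameter documentation on a script field. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Param {
    String value();
    boolean required() default true;
}

/** Invented example script, as a user might ship it in a separate jar. */
@TaskInfo("Fetches the daily orders file and loads it into the database")
class LoadOrdersTask {
    @Param("Directory polled for the incoming file")
    String inboxDir;

    @Param(value = "Number of retries before notifying", required = false)
    int retries = 3;
}

final class TaskDescriber {
    /** Builds a human-readable description from the annotations alone, without running the script. */
    static String describe(Class<?> taskClass) {
        StringBuilder sb = new StringBuilder(taskClass.getSimpleName());
        TaskInfo info = taskClass.getAnnotation(TaskInfo.class);
        if (info != null) {
            sb.append(": ").append(info.value());
        }
        for (Field f : taskClass.getDeclaredFields()) {
            Param p = f.getAnnotation(Param.class);
            if (p != null) {
                sb.append("\n  ").append(f.getName())
                  .append(p.required() ? " (required): " : " (optional): ")
                  .append(p.value());
            }
        }
        return sb.toString();
    }
}
```

Because the description comes from the compiled class, the engine could load a script from the user's jar (for example with a `URLClassLoader` and `Class.forName`) and print its documentation without running it. `getDeclaredFields` does not guarantee field order, so a real implementation might sort parameters by name.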


I've decided on Java rather than a scripting language (PHP, Python) so that I get the benefit of compilation and don't have to worry about writing something in the common libraries that could trigger "a fatal error" and stop the execution of the scripts.

I feel that this will simplify the execution of the most standard batch files (types 1 and 2); on the other hand, I fear it will be difficult to represent batches of type 3. Also, I will put an automated deployment in place, and my worry is that while a bad batch file can create problems, deploying buggy code in the engine itself (even if that's perhaps less likely because of compilation and testing) or in the configuration can cause bigger problems affecting all the scripts.

Note that the system should be 100% reliable and that there are deadlines and "time windows" within which the system's tasks must be executed.
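Given the time windows, a "wait for a condition" library function should probably give up at a deadline rather than block indefinitely. A minimal polling sketch; `Conditions.waitFor` is a hypothetical name, and the condition is any `BooleanSupplier` (a file-existence check, a database query, and so on):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.BooleanSupplier;

final class Conditions {
    /**
     * Polls the condition until it is true or the deadline passes.
     * Returns false on timeout so the caller can notify instead of running past its time window.
     */
    static boolean waitFor(BooleanSupplier condition, Instant deadline, Duration pollInterval)
            throws InterruptedException {
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Instant now = Instant.now();
            if (!now.isBefore(deadline)) {
                return false;
            }
            // Sleep for the poll interval, but never past the deadline.
            long remainingMillis = Duration.between(now, deadline).toMillis();
            Thread.sleep(Math.max(1, Math.min(pollInterval.toMillis(), remainingMillis)));
        }
    }
}
```

For example, `Conditions.waitFor(() -> Files.exists(inbox.resolve("orders.txt")), windowEnd, Duration.ofMinutes(1))` would poll once a minute until the end of the window; `inbox` and `windowEnd` are placeholders. On `false` the job can notify someone instead of silently missing its slot.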

My questions are:

Do you think batch files are OK? Are they widely used in this context?

Do you think I'm going in the right direction? Is there something that I haven't considered?

Has anyone got a better idea?

Do you know of any framework which can help me?

Do you think it's worth developing this system to gain more flexibility in the future, or would I be better off keeping the batches?

Note that any new solution will initially be integrated with the existing system, so I can't, for example, rewrite everything to run in an application server.

A: 

If I were to replace batch files, I would do it using some .NET language. .NET includes libraries for threading, starting applications and easily processing their output. And of course you can use C#, for example, as a scripting language, or even PowerShell or Python.

Dykam
A: 

I recommend using PowerShell.

In my opinion it would be the least obtrusive way to move forward. PowerShell supports all the shell commands you have been using, plus you can start creating PowerShell functions to get some code reuse and reduce duplicated logic. PowerShell scripts are also easily modified, just like your batch files, and with the latest version on Windows 7 or Windows Server 2008 you get the ISE (Integrated Scripting Environment), where you can step through your code and debug it interactively.

If you need more advanced functionality, you can use the .NET Framework to author sophisticated cmdlets and use them as functions within your PowerShell scripts.

Another benefit is that you don't have to do the finicky string manipulation you would with other scripting environments, because PowerShell has an object pipeline that lets you compose complex commands, using the output of one command as the input of another.

I use PowerShell on a daily basis to install and uninstall .NET assemblies, deploy and undeploy Microsoft BizTalk entities, run database scripts and create, configure and delete web sites.

Happy scripting.

An Excellent Powershell blog
http://pshscripts.blogspot.com/

Wikipedia page
http://en.wikipedia.org/wiki/Windows_PowerShell

Matthew
I've looked at PowerShell; it's very interesting, but also very complicated. The pros: it surely integrates well with Windows, and it's very flexible. The only concerns I see are: I currently don't have Visual Studio C# and can't buy it, but I would need to develop many cmdlets; the docs on MSDN are not very clear, and version 2 is still in beta; and on the web there aren't many examples of systems beyond individual scripts. What about the availability of free libraries for .NET?
mic.sca
By the way, thank you for reading such a long question! :-)
mic.sca
A: 

I've worked on a quite similar application: 100+ .cmd files calling OSQL. We have had many of the same problems you describe, as well as poor error handling. In my opinion it's better to just get a scheduler that does what you want than to write a framework sitting between your scheduler and your batch/script files. It seems to me that this type of system calls for an old-fashioned job scheduler: it calls your executables in order, handles errors, and passes your configured arguments. There are many options, ranging from Zena to JobScheduler. I suggest picking one with a good GUI.
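For comparison, this is roughly the control flow a job scheduler would take over from the batch files: run steps in order, stop at the first non-zero exit code, and trigger a recovery action. A minimal Java sketch, not any scheduler's API; `Chain`, `Step` and `Steps.process` are hypothetical names, and in a real scheduler this would be configured rather than coded:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical job step; returns a process-style exit code (0 = success). */
@FunctionalInterface
interface Step {
    int execute() throws IOException, InterruptedException;
}

final class Steps {
    /** Runs an external executable with its arguments and returns its exit code. */
    static Step process(String... command) {
        return () -> new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}

final class Chain {
    private final List<String> names = new ArrayList<>();
    private final List<Step> steps = new ArrayList<>();

    Chain then(String name, Step step) {
        names.add(name);
        steps.add(step);
        return this;
    }

    /** Runs steps in order; on the first failure runs onError and returns the failing exit code. */
    int run(Step onError) throws IOException, InterruptedException {
        for (int i = 0; i < steps.size(); i++) {
            int code = steps.get(i).execute();
            if (code != 0) {
                System.err.println("step " + names.get(i) + " failed with exit code " + code);
                onError.execute();
                return code;
            }
        }
        return 0;
    }
}
```

With this, a chain such as `new Chain().then("load", Steps.process("loader.exe", "orders.txt"))` (hypothetical executable and file names) replaces one hand-written `if errorlevel` block per step.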

C. Ross
Yes, something I hadn't considered was moving this kind of functionality into a scheduler. I would need to find a scheduler that is both free and extensible (e.g. JobScheduler is cool; I'm trying it out). I fear I would need to customize too many things: for example, I should be able to provide the current Julian day as a job parameter, or to wait for something in a database. I don't have experience with schedulers; do you think it would need some monitoring by the sysadmin once it is in production? Currently we have a very basic scheduler that can only launch jobs at given times.
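"Julian day" means different things in different shops: batch systems often mean the day of the year (a `YYDDD` ordinal date), while astronomers mean the Julian Day Number. Both are simple to compute with `java.time`, so a custom job parameter of this kind needn't be much work; which one your executables expect is an assumption you'd need to check. `JulianDay` is a hypothetical helper:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;
import java.time.temporal.JulianFields;
import java.util.Locale;

final class JulianDay {
    /** Two-digit year plus three-digit day of year, the "YYDDD" form many batch systems call a Julian date. */
    static String ordinalDate(LocalDate date) {
        return String.format(Locale.ROOT, "%02d%03d",
                date.getYear() % 100, date.get(ChronoField.DAY_OF_YEAR));
    }

    /** Astronomical Julian Day Number (whole days since 4713 BC, Julian calendar). */
    static long julianDayNumber(LocalDate date) {
        return date.getLong(JulianFields.JULIAN_DAY);
    }
}
```

For example, `JulianDay.ordinalDate(LocalDate.of(2009, 2, 1))` yields `"09032"`, and `JulianDay.julianDayNumber(LocalDate.of(1970, 1, 1))` yields `2440588`.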
mic.sca
Many schedulers are pretty powerful (allowing you to define your own calendars for fiscal dates, etc.). But yes, things require operator/admin oversight. I find everything else does too; here we have designated operators for such things, but I realize your situation may be different. Who watches your production system now?
C. Ross
The short answer is... nobody. We have this simple scheduler, but as I said it's very simple (it just launches jobs at given times) and doesn't need much supervision; the control flow and error reporting (including calling someone if needed) are delegated to the individual jobs. I guess a scheduler would do, but I fear possible resistance to change. Anyway, I'm going to try it and show it to my colleagues, thanks. Your post made me realize that error handling/reporting and control flow should be part of the scheduling, not of the jobs themselves.
mic.sca
Glad I could help. It does take some convincing, but on our side we had years of manual intervention to use as evidence.
C. Ross
A: 

You could also try out Quartz (http://www.opensymphony.com/quartz/). It is a free, open-source scheduling tool written in Java that supports many different types of tasks and can use a variety of databases. So far I've used it with DB2. We have been using it in production for over 2 years and it works like a charm. We did have to set up the Java Service Wrapper so that it runs as a service, but that was quite easy to accomplish.

jwmajors81