Hi,
Currently we have a batch-driven process at work which runs every 15 minutes, and every time it runs it repeats this cycle several times:
- Calls a stored procedure and gets some data back from the DB
- Processes the data
- Saves the result back to the DB
It can't load all the data in one go because the data is segregated by a number of fields, and each group of data requires different behaviour during processing (configurable from a front end). However, recent changes in the business have resulted in a sudden surge in the volume of data (and therefore the processing time required) for some of the groups. Because the groups are processed one after another, whenever one of them overruns it delays all the others.
Our plan is to parallelise this process across multiple machines so that:
- there is a central controller (master) and several workstations (slaves)
- master is responsible for scheduling the runs (configurable from a front end)
- master (or a separate component) is responsible for loading/saving data to and from the DB (in order to avoid deadlocks/contention between the multiple slaves)
- slaves receive work items, process them and return the results to master
- there is a primary slave (main production server in our environment) which will usually receive all the work items
- secondary slaves will receive work only if the primary slave is working on a group which requires longer processing time (master can identify this based on the size of the data returned or it can be left to configuration)
- if a slave throws an exception during processing, an alert email is sent to the support team, and the same work item is picked up again during the next scheduled cycle
- not sure what to do with timeouts yet
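
To make the plan above more concrete, here is a rough sketch of the routing and retry rules in Python. It is only an illustration under some assumptions of mine: the names (WorkItem, Master, large_threshold, plan, run) are hypothetical, "long-running" is judged purely by row count, and once the primary picks up a large group the rest of that cycle's items overflow to the secondaries round-robin. Processing is done in-process here; in the real system each assignment would be sent to a separate machine. Timeouts are not handled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class WorkItem:
    group: str        # the group this data belongs to
    rows: List[int]   # stand-in for the data loaded from the DB


@dataclass
class Master:
    # Assumption: a group counts as "long-running" if it has more rows than this.
    large_threshold: int
    retry_next_cycle: List[WorkItem] = field(default_factory=list)
    alerts: List[str] = field(default_factory=list)

    def is_large(self, item: WorkItem) -> bool:
        return len(item.rows) > self.large_threshold

    def plan(self, items: List[WorkItem],
             secondaries: List[str]) -> List[Tuple[str, WorkItem]]:
        """Give everything to the primary until it takes a large group,
        then spread the remaining items over the secondaries."""
        assignments = []
        primary_tied_up = False
        overflow = 0
        for item in items:
            if not primary_tied_up or not secondaries:
                assignments.append(("primary", item))
                primary_tied_up = primary_tied_up or self.is_large(item)
            else:
                worker = secondaries[overflow % len(secondaries)]
                assignments.append((worker, item))
                overflow += 1
        return assignments

    def run(self, assignments: List[Tuple[str, WorkItem]],
            process: Callable[[WorkItem], int]) -> Dict[str, int]:
        """Process each assignment; on failure record an alert and
        keep the item so the next cycle can pick it up again."""
        results = {}
        for worker, item in assignments:
            try:
                results[item.group] = process(item)
            except Exception as exc:
                # Stand-in for the alert email to the support team.
                self.alerts.append(f"{worker} failed on {item.group}: {exc}")
                self.retry_next_cycle.append(item)
        return results
```

With a threshold of 3 rows and items a (1 row), b (4 rows), c and d, the plan would put a and b on the primary and send c and d to the secondaries. Anything that raises ends up in retry_next_cycle instead of the results.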
I have done some research on the Master-Slave pattern for distributed environments, but so far I haven't found much reference material. Does anyone here know of a good implementation of this pattern? Any pointers on potential pitfalls of such an architecture would be much appreciated too!
Thanks,