I have a multistep process where each step does some network IO (a web service call) and then persists some data. I want to design it in a fault-tolerant way so that if the process fails, whether because of a system crash or because one of the steps fails, I can recover and restart from the last error-free step.
Here is how I am thinking of addressing this (this is pretty high level):
- Store the state of each step (NOT_STARTED, IN_PROGRESS, FAILED) in a database table
- If a step fails, mark it and its dependent steps as FAILED and move on to the next non-dependent step
- Recover by reading this table (e.g., in the bootstrap portion of the application)
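
A minimal sketch of the approach above, assuming Python with SQLite as the database. The `step_state` table, the `run_pipeline` and `set_status` functions, and the `COMPLETED` status are all hypothetical names introduced for illustration (the list above does not name a success state, so one is assumed here). It also assumes steps are passed in dependency order and are safe to re-run, since a step found `IN_PROGRESS` after a crash is executed again.

```python
import sqlite3

# All names here are assumptions for this sketch, including COMPLETED,
# which is added so a restart can tell finished steps from pending ones.
NOT_STARTED, IN_PROGRESS, FAILED, COMPLETED = (
    "NOT_STARTED", "IN_PROGRESS", "FAILED", "COMPLETED")


def set_status(conn, name, status):
    # Commit every transition so the recorded state survives a crash.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO step_state (name, status) VALUES (?, ?)",
            (name, status))


def run_pipeline(conn, steps):
    """Run steps given as (name, depends_on, action) in dependency order."""
    conn.execute("CREATE TABLE IF NOT EXISTS step_state "
                 "(name TEXT PRIMARY KEY, status TEXT NOT NULL)")
    # Recovery: start from whatever a previous run managed to record.
    status = dict(conn.execute("SELECT name, status FROM step_state"))
    for name, depends_on, action in steps:
        if status.get(name, NOT_STARTED) == COMPLETED:
            continue  # finished in an earlier run, do not repeat it
        if any(status.get(dep) != COMPLETED for dep in depends_on):
            # A dependency did not complete: mark this step FAILED too
            # and move on to the next step.
            status[name] = FAILED
            set_status(conn, name, FAILED)
            continue
        # NOT_STARTED, FAILED, or IN_PROGRESS (crashed mid-step): run it.
        status[name] = IN_PROGRESS
        set_status(conn, name, IN_PROGRESS)
        try:
            action()  # the network call plus persistence for this step
        except Exception:
            status[name] = FAILED
        else:
            status[name] = COMPLETED
        set_status(conn, name, status[name])
    return status
```

Calling `run_pipeline` again with the same connection acts as the bootstrap recovery: completed steps are skipped, and failed steps (with their dependents) are retried. Whether a failed step should be retried automatically or left for manual intervention is a design choice this sketch does not settle.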
Are there design patterns, frameworks, or algorithms that address this problem?