Hi, I'm a little new to Windows Workflow, so go easy :)
I wish to design a workflow host environment that has high availability: a minimum of 2 WF runtime hosts on separate hardware, both pointing to the same persistence/tracking SQL database.
I am looking for a pattern whereby I can asynchronously create new workflow instances based on some external event (e.g. some piece of data is updated in the DB by a different application). For each event I need to create exactly one workflow instance, and it doesn't matter which host that instance is created on. There is also some flexibility regarding the time between the event and when the workflow instance is actually created.
One solution I am considering is having a WCF interface on the WF hosts and placing them behind some sort of load balancer. It would then be up to whatever part of the system that is firing the "event" to make the WCF call.
I'm not really happy with this because if both/all WF hosts are down, or otherwise unavailable, the event could be "lost". Also, I won't be able to manage load the way I would like to. I envisage a situation where there may be lots of events in a small period of time, but it's perfectly acceptable to handle those events some time later.
So I reckon I need to persist the events somehow and decouple the event creation from the event handling.
Is putting these events into MSMQ, or a simple event table in SQL Server, and having the WF hosts just poll the queue periodically a viable solution? Polling seems to be such a dirty word though...
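To make the event-table idea concrete, here is a rough sketch of the polling pattern I have in mind. It is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database standing in for SQL Server, and the table name `pending_events`, its columns, and the functions `record_event`, `claim_batch` and `poll_once` are all made up for this example. The point is the conditional UPDATE, which lets several hosts poll the same table while each event is claimed by only one of them.

```python
import sqlite3


def create_schema(conn):
    """Create the hypothetical event table (names are assumptions)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pending_events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               payload TEXT NOT NULL,
               claimed_by TEXT,   -- NULL until a host claims the row
               workflow_id TEXT   -- set once the instance has been created
           )"""
    )
    conn.commit()


def record_event(conn, payload):
    """Called by whatever fires the event; it only writes a row."""
    cur = conn.execute(
        "INSERT INTO pending_events (payload) VALUES (?)", (payload,)
    )
    conn.commit()
    return cur.lastrowid


def claim_batch(conn, host_name, batch_size):
    """Claim up to batch_size unclaimed events for host_name."""
    claimed = []
    with conn:  # commit on success, roll back on error
        rows = conn.execute(
            "SELECT id FROM pending_events "
            "WHERE claimed_by IS NULL ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        for (event_id,) in rows:
            # The extra "claimed_by IS NULL" guard makes the claim atomic:
            # if another host got there first, no row is updated.
            cur = conn.execute(
                "UPDATE pending_events SET claimed_by = ? "
                "WHERE id = ? AND claimed_by IS NULL",
                (host_name, event_id),
            )
            if cur.rowcount == 1:
                claimed.append(event_id)
    return claimed


def poll_once(conn, host_name, start_workflow, batch_size=10):
    """One polling pass: claim events, then start a workflow for each."""
    started = []
    for event_id in claim_batch(conn, host_name, batch_size):
        (payload,) = conn.execute(
            "SELECT payload FROM pending_events WHERE id = ?", (event_id,)
        ).fetchone()
        # start_workflow is a placeholder for the real WF host call.
        workflow_id = start_workflow(payload)
        with conn:
            conn.execute(
                "UPDATE pending_events SET workflow_id = ? WHERE id = ?",
                (workflow_id, event_id),
            )
        started.append((event_id, workflow_id))
    return started
```

Each host would call `poll_once` on a timer with its own name, and `batch_size` gives a crude way to throttle load during a burst. The sketch does not handle a host that dies after claiming a row but before starting the workflow; I assume a real version would need some kind of claim timeout so another host can pick those rows up.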
Would NServiceBus and durable messaging be useful here?
Any insights would be much appreciated.
Addendum
The database will be clustered with shared Fibre Channel storage. The network will also be redundant. In order for WF runtime instances to have fail-over, they must point at a common persistence service, which in this case is a SQL backend. It's high availability, not Total Availability :)
MSDN article on WF Reliability and High Availability
Also, each instance of the WF runtime must be running exactly the same bits, so upgrading will require taking them all down at the same time. I like the idea of being able to do that, if required, without taking the whole system down.