Hi everyone,

Does anyone know of any established best practices for running Windows services (in my case, developed in .NET) such that they will (automatically) fail over correctly to another server, for high availability purposes?

The main ways I can see this being done are either starting up the secondary server when required (in which case there needs to be something monitoring the other server), or having both services running together (in which case they need to synchronize their work so they don't try to do the same things).

Is there a pattern or model for this sort of problem? I know the exact situation will make a big difference, but it does seem like a fairly common issue.

Thanks

John

A: 

Having both running all the time is probably the simplest solution, but you need to ensure that neither server ever goes above 50% load. Otherwise, when one fails, the other has to absorb its work, will become overloaded, and may fail too.

To synchronize, use a transactional database. Trying to write your own synchronization will usually result in bugs.

Mark Byers
A: 

If you can have both services working at the same time, that is better. You need to make sure they are stateless, or know how to handle state, and let the database sync between them. To avoid a single point of failure, you push the problem down to the DB, where you can run a two-node active-active cluster and let the DB manufacturer handle the sync issues.

Dani
A: 

I believe the best way to deal with failover is at the network level wherever possible. Virtual IPs fronting load-balanced or primary/failover environments are a good way to avoid having to write code for failover scenarios.

In cases where you must handle failover in code:

  1. Test connection/service call
  2. If test fails, send alerts
  3. Fail over to next "registered" service endpoint
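
The three steps above can be sketched roughly as follows. This is only an illustration in Python: `pick_endpoint`, `is_healthy` and `alert` are hypothetical names, and how you actually test a connection or send an alert depends on your service.

```python
def pick_endpoint(endpoints, is_healthy, alert):
    """Return the first registered endpoint that passes its health test.

    endpoints:  ordered list of registered endpoint addresses (primary first)
    is_healthy: callable(endpoint) -> bool, the connection/service test
    alert:      callable(message), used to report every failed test
    """
    for endpoint in endpoints:
        try:
            ok = is_healthy(endpoint)
        except Exception as exc:  # a test that blows up counts as a failure
            ok = False
            alert(f"health test raised for {endpoint}: {exc}")
        else:
            if not ok:
                alert(f"health test failed for {endpoint}")
        if ok:
            return endpoint
    return None  # nothing is reachable; the caller decides what to do next


if __name__ == "__main__":
    down = {"primary:9000"}
    chosen = pick_endpoint(
        ["primary:9000", "secondary:9000"],
        is_healthy=lambda ep: ep not in down,
        alert=print,
    )
    print("using", chosen)
```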
Dave Swersky
What has the network got to do with Windows services? There can be a background service running and doing its job without necessarily having any clients connected to it. The correct way is to use failover clustering.
Pratik
A: 

There are two basic approaches.

  1. The clients are aware of the different endpoint addresses and switch as needed, or as directed by another service or configuration mechanism. (As an example, the stocktrader demo application does this.)

  2. The clients are not aware, and you use a standard network load balancing (NLB) approach, which can also provide failover. F5 is one product; there are many others. It is basically like a NAT for services: all requests go through your NLB, which sends them on to a server and forwards the response back to the caller. These products monitor the services and only use the ones that are up. You can also often customize them with rules that assign new requests to servers based on server workloads. Windows Server has this functionality built in to some extent.
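
To make the second approach a little more concrete, here is a toy sketch in Python of what such a balancer does when it picks a server: rotate through the backends and skip any that are not up. `RoundRobinBalancer` and the `is_up` callback are invented for illustration and are not how F5 or Windows NLB are implemented.

```python
class RoundRobinBalancer:
    """Hands out backends in rotation, skipping ones reported as down."""

    def __init__(self, backends, is_up):
        self._backends = list(backends)
        self._is_up = is_up  # callable(backend) -> bool, the monitoring check
        self._next = 0       # index where the next search starts

    def choose(self):
        """Return the next healthy backend, or None if all are down."""
        count = len(self._backends)
        for offset in range(count):
            index = (self._next + offset) % count
            candidate = self._backends[index]
            if self._is_up(candidate):
                # Start after this backend next time so load is spread out.
                self._next = (index + 1) % count
                return candidate
        return None


if __name__ == "__main__":
    down = {"server-b"}
    lb = RoundRobinBalancer(["server-a", "server-b", "server-c"],
                            is_up=lambda b: b not in down)
    print([lb.choose() for _ in range(4)])
```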

Either way you do it, it is much much easier if your service calls are "stateless".

DanO
+2  A: 

Here's what has worked for me.

From an infrastructure standpoint you will need two Windows servers that are clustered. (Two standard Windows Server boxes will do; the clustering piece can be installed and configured, and most sysadmins should know how to do this.) Next, install your service on both nodes of the cluster, with both instances turned OFF and set to MANUAL startup. Then add a clustered resource for your service in the Windows Cluster Administrator, which will manage turning your service on and off on whichever node is active. Let the Windows cluster manage when your service is running and on which node. This is the easy part of clustering your service.

From the service standpoint, you will want to design your service so that it is as stateless as possible. This is kind of lame advice, but it really depends on what your service is doing. In the design, just assume that at some point during the code's lifetime it will stop at the worst possible time. How will the service on node2 know where to pick up where node1 left off? That's the hard part that you need to design for. Depending on what your service is doing, you can record the last completed task in a DB table or a shared data file. You could also have it start from the beginning and double-check whether each task has already been completed before acting upon it.
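
A minimal sketch of that idea, in Python with the standard `sqlite3` module: each completed task is recorded in a table, and whichever node takes over starts from the beginning and skips anything already recorded. The `completed` table and the function names are assumptions for illustration; in a real cluster the store would live on shared storage or a database server both nodes can reach. Note that a crash between running a task and recording it means the task runs again, so the handler should be safe to repeat.

```python
import sqlite3


def open_store(path=":memory:"):
    # Hypothetical checkpoint store: one row per task that has finished.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS completed (task_id TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def run_pending(conn, task_ids, handler):
    """Run handler for every task not yet recorded; return the ones run now."""
    done_now = []
    for task_id in task_ids:
        row = conn.execute(
            "SELECT 1 FROM completed WHERE task_id = ?", (task_id,)
        ).fetchone()
        if row is not None:
            continue  # finished before the failover, do not repeat it
        handler(task_id)
        with conn:  # record completion only after the handler succeeded
            conn.execute("INSERT INTO completed (task_id) VALUES (?)", (task_id,))
        done_now.append(task_id)
    return done_now


if __name__ == "__main__":
    store = open_store()
    tasks = ["t1", "t2", "t3"]
    run_pending(store, tasks[:2], print)   # "node1" finishes two tasks, then stops
    print(run_pending(store, tasks, print))  # "node2" resumes with the rest
```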

Again, it is really going to depend on what the service needs to accomplish. Hope this helps.

Walter
Yes, a failover cluster is the correct approach. You can script the configuration of cluster groups, resources and dependencies during deployment. This requires Windows Server Enterprise edition though, not Standard edition.
Pratik