Availability Issue

Architecture: A bunch of clients send messages to a server that sits behind a VIP (virtual IP). Obviously this server is a single point of failure.

Each client monitors a resource, and the server is responsible for taking action based on the status that the majority of the clients report to it; hence the need for only one server (a single leader).
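To make concrete what a freshly started backup would be missing, here is a minimal Python sketch of the per-client state the leader accumulates. It assumes (hypothetically) that each client reports a single status string, that a newer report from a client replaces its older one, and that the leader acts only when a strict majority of all known clients agree. `StatusTally` and its methods are illustrative names, not part of the original design.

```python
from __future__ import annotations

from collections import Counter


class StatusTally:
    """Hypothetical sketch of the leader's state: the latest status per client."""

    def __init__(self, total_clients: int) -> None:
        self.total_clients = total_clients
        self.latest: dict[str, str] = {}  # client id -> most recently reported status

    def report(self, client_id: str, status: str) -> None:
        # A newer report from the same client replaces its older one.
        self.latest[client_id] = status

    def majority_status(self) -> str | None:
        """Return the status reported by a strict majority of all clients, else None."""
        if not self.latest:
            return None
        status, count = Counter(self.latest.values()).most_common(1)[0]
        return status if count > self.total_clients // 2 else None


tally = StatusTally(total_clients=5)
for cid, s in [("a", "DOWN"), ("b", "DOWN"), ("c", "UP"), ("d", "DOWN")]:
    tally.report(cid, s)
print(tally.majority_status())  # DOWN (3 of 5 clients agree)
```

A backup that starts with an empty `latest` map cannot reach any majority until enough clients have reported again, which is exactly the delay described next.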

I am thinking of adding another server on the VIP as a backup, which is turned on only when the first server fails. However, when the backup comes up it would have no state to work with, so it would lose time waiting for the clients to report and for the required thresholds to be reached.

Problem: What is the best and easiest way to have two servers share client state information with only one receiving client traffic?

Solution 1: I thought of having the server forward client state information to the backup server, so that in the event of a failure, when the backup server comes up, it can take over from there.

Is there any other way to do this? I also thought of having a common/shared place to store state information that both servers can read client state from. But this doesn't work well, as the shared store is a single point of failure too.

One option is to use a write-ahead log. Essentially, any modification you make to your state gets sent over to the backup server, which replays the change on its own copy of the state. As long as it can keep up with the streaming log, the backup is always up-to-date.
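A minimal in-process sketch of this log shipping, in Python: every change is appended to a numbered log on the primary, applied locally, and then replayed in the same order by each backup. The class names are hypothetical, and the direct `backup.apply(entry)` call stands in for what would be a network send to a separate process in a real deployment.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogEntry:
    seq: int  # position in the log, starting at 1
    client_id: str
    status: str


@dataclass
class Replica:
    """A copy of the client-state map plus the highest log position applied to it."""

    state: dict[str, str] = field(default_factory=dict)
    applied_seq: int = 0

    def apply(self, entry: LogEntry) -> None:
        # Entries must be replayed strictly in order; a gap means something was lost.
        if entry.seq != self.applied_seq + 1:
            raise ValueError(f"expected seq {self.applied_seq + 1}, got {entry.seq}")
        self.state[entry.client_id] = entry.status
        self.applied_seq = entry.seq


class Primary:
    def __init__(self, backups: list[Replica]) -> None:
        self.replica = Replica()
        self.log: list[LogEntry] = []  # kept so missed entries can be resent later
        self.backups = backups

    def record(self, client_id: str, status: str) -> None:
        # 1. Append the change to the log, 2. apply it locally, 3. ship it to backups.
        entry = LogEntry(len(self.log) + 1, client_id, status)
        self.log.append(entry)
        self.replica.apply(entry)
        for backup in self.backups:
            backup.apply(entry)  # stand-in for a network send


backup = Replica()
primary = Primary([backup])
primary.record("a", "DOWN")
primary.record("b", "UP")
primary.record("a", "UP")
print(backup.state == primary.replica.state)  # True
```

Because the backup rejects any entry whose sequence number is not exactly one past the last one it applied, a lost message shows up as an error rather than as silently diverging state.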

This is the approach most databases use for replication; if you use one as your backend, you may be able to get this with little work.

Be careful to have a plan for recovering from communication failures: either save the log to disk and resend the missing portion, or, on reconnect, send a snapshot of the state plus all log entries since the snapshot.
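The reconnect logic can be sketched in Python as follows, assuming (hypothetically) that the primary periodically folds its log into a snapshot, keeps only the entries after that snapshot, and learns from the reconnecting backup the last sequence number it applied. If the backup is still within the retained log, only the missing entries are resent; otherwise it gets the snapshot plus every entry since. `catch_up` and `apply_catch_up` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    seq: int
    client_id: str
    status: str


@dataclass
class Snapshot:
    seq: int  # last log position folded into this snapshot
    state: dict[str, str]


def catch_up(backup_seq: int, snapshot: Snapshot, log_tail: list[LogEntry]):
    """Primary side: decide what to send a reconnecting backup.

    log_tail holds only the entries with seq > snapshot.seq (older ones were truncated).
    """
    if backup_seq >= snapshot.seq:
        # The backup already has everything up to the snapshot: resend only what it missed.
        return None, [e for e in log_tail if e.seq > backup_seq]
    # The entries it needs were truncated: send the snapshot plus everything after it.
    return snapshot, list(log_tail)


def apply_catch_up(state: dict[str, str], applied_seq: int,
                   snapshot: Snapshot | None, entries: list[LogEntry]) -> int:
    """Backup side: install the snapshot (if any), replay entries, return the new seq."""
    if snapshot is not None:
        state.clear()
        state.update(snapshot.state)
        applied_seq = snapshot.seq
    for e in entries:
        if e.seq != applied_seq + 1:
            raise ValueError(f"gap in log: expected {applied_seq + 1}, got {e.seq}")
        state[e.client_id] = e.status
        applied_seq = e.seq
    return applied_seq
```

Resending from the backup's last applied position keeps reconnects cheap, while the snapshot path bounds how much log the primary has to retain.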

bdonlan 2009-09-10 20:46:40

Hmm... I am looking for a design that keeps both servers in sync. I do not care about a write-ahead log, as we can retry in case of communication failure. Is there a design out there that helps keep the two hosts in sync? Delay in syncing is not an issue.

2009-09-10 21:20:02

The write-ahead log _does_ keep them in sync - you just write your log to the other server. Since the log has all state changes, by replaying it you have the state as it was on the master.

bdonlan 2009-09-10 22:10:56

There are various distributed caching products that do the kind of thing you're talking about here. Some are supplied with application servers, such as WebSphere's DynaCache and ObjectGrid. In fact, ObjectGrid can be used in JSE; there is no need for an application server.

Those distributed cache products use various push and pull models with pub-sub messaging to achieve consistency across the instances. Working for IBM, I'm a fan of ObjectGrid, but more importantly, I'm a fan of not reinventing wheels. My take is that this stuff can get quite complex, so finding something off-the-shelf might save a load of work; there are links to various open-source solutions here.

djna 2009-09-13 14:28:52

This is very much dependent on how available your solution needs to be (how many 9's). There is a spectrum of solutions.
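As a rough guide to what each extra nine means: an availability of n nines (for example, 99.9% is three nines) leaves 10^-n of the time as allowed downtime. A small Python sketch of the arithmetic, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year, in minutes, for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 3 nines -> 99.9% up -> 0.1% down
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    print(f"{n} nines: {downtime_minutes_per_year(n):.1f} minutes/year")
# 2 nines: 5256.0 minutes/year
# 3 nines: 525.6 minutes/year
# 4 nines: 52.6 minutes/year
# 5 nines: 5.3 minutes/year
```

Three nines therefore allows roughly 8.8 hours of downtime a year, so how long a cold backup takes to rebuild its state matters more the higher up this scale you need to be.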

A lightweight one could be built around Memcache, an extremely fast distributed caching facility. For example, it is used extensively on Google App Engine.

jldupont 2009-09-19 18:49:23
