views:

389

answers:

4

I can think of a few hacks using ping, the box name, and the HA shared name, but I think they all lead to data leakage.

Should a box even know it's part of an HA cluster, or what that cluster's name is? Is this more a function of DNS? Is there some API exposed for boxes to join an HA cluster and request the id of the currently active node?

I want to differentiate between the inactive node and the active node in the alerting mechanisms of a running program. If the active node is alerting I want to hit a pager; on the inactive node I want to send an email. Pushing the determination into the alerting layer just moves the same problem elsewhere.

EASY SOLUTION: Polling the server from an external agent that connects through the network makes any shell game of who is the active node a moot point. To clarify: the only thing that will page is the remote agent monitoring the real server. Each box can send emails all day long for all I care.
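That external-agent approach can be sketched like this; everything here is illustrative (the shared address, the ping counts, and what you do on failure are all placeholders), with 127.0.0.1 as the default only so the example is runnable:

```shell
#!/bin/sh
# External-monitor sketch: probe the cluster's shared address from a box
# outside the cluster, and only page when nobody answers on it.
# CLUSTER_ADDR is a placeholder for the HA pair's shared address.
CLUSTER_ADDR="${CLUSTER_ADDR:-127.0.0.1}"

probe_cluster() {
    # two pings, two-second timeout: succeeds iff something holds the address
    ping -c 2 -W 2 "$1" >/dev/null 2>&1
}

if probe_cluster "$CLUSTER_ADDR"; then
    STATUS=up      # active node answering: emails at most
else
    STATUS=down    # shared address dead: this is the case that pages
fi
echo "cluster: $STATUS"
```

Run from cron on the monitoring box, this never has to know which physical node is active, only that *something* is answering on the shared address.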

A: 

One way is to get the box to export its idea of whether it is active into your monitoring. From there you can predicate paging/emailing on this status (with a race condition around failover), and alert when none, or too many, of the systems believe they are active.
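A minimal sketch of that predicate, assuming (my assumption, not part of the answer) that your failover scripts write the box's role into a status file such as the hypothetical `/var/run/ha-role`:

```shell
#!/bin/sh
# Map the box's self-reported HA role to an alert channel.
choose_channel() {
    case "$1" in
        active)   echo pager ;;  # active node: wake somebody up
        inactive) echo email ;;  # standby: quiet notification
        *)        echo pager ;;  # unknown/missing state: fail noisy
    esac
}

# /var/run/ha-role is hypothetical -- assume the failover scripts rewrite
# it with "active" or "inactive" on every state change.
ROLE=$(cat "${ROLE_FILE:-/var/run/ha-role}" 2>/dev/null || echo unknown)
echo "alert via: $(choose_channel "$ROLE")"
```

Treating an unknown role as pageable is deliberate: a box that has lost track of its own state is exactly the one you want to hear about.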

Another option is to monitor the active system via a DNS alias (or some other method to address the active system) and page on that. Then also monitor all the systems, both active and inactive, and email on that. This will cause duplicate alerts for the active system, but that's probably okay.

It's hard to be more specific without knowing more about your setup.

The box itself should have no knowledge of whether it is active. I don't want to visit all the nodes when I fail over. The scenario is simple and pervasive where I work: a box performs some critical business function, and just in case something goes wrong it has a partner. Upgrades occur on the inactive node.
ojblass
Try the second method I suggest, presumably a failover includes a DNS update of some form. I take it this is a stateless service such as apache? Any reason not to use active-active?
Active-active is more of a load-balancing approach. Updating machines while they are active does not let you test and/or upgrade without possibly impacting production systems. Some of the paired boxes are part of an active-active scenario, but that is another story.
ojblass
We have people that do the failover, and I guess speaking to them is the right thing to do. The decision of which box a request flows to is a function of the network. I still feel the box has to be ignorant of its participation in the cluster.
ojblass
A: 

As a rule, the machines in an HA cluster shouldn't really know which one is active. There's one exception, mind, and that's cronjobs. At work, we have an HA cluster on top of which some rather important services run. Some of those services have cronjobs, and we only want them running on the active box. To do that, we use this shell script:

#!/bin/sh
# Run the wrapped command only when this box holds the cluster's shared IP,
# i.e. only on the active node.
HA_CLUSTER_IP=0.0.0.0
if ip addr | grep -qw "$HA_CLUSTER_IP"; then
    eval "$@"
fi

(Note that this is running on Debian.) What this does is check to see if the current box is the active one within the cluster (replace 0.0.0.0 with the external IP of your HA cluster), and if so, executes the command passed in as arguments to the script. This ensures that one and only one box is ever actually executing the cronjobs.

Other than that, there's really no reasons I can think of why you'd need to know which box is the active one.

UPDATE: Our HA cluster uses Heartbeat to assign the cluster's external IP address as a secondary address to the active machine in the cluster. Programmatically, you can check to see if your machine is the current active box by calling gethostbyname(), and iterating over the data returned until you either get to the end or you find the cluster's IP in the list.
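That check - is the cluster's shared IP currently configured as a (secondary) address on this machine - can also be done from shell rather than via gethostbyname(). A sketch, with 127.0.0.1 standing in for the cluster IP purely so the example runs anywhere, and a fallback to ifconfig for boxes without iproute2:

```shell
#!/bin/sh
# is_active: succeed iff the given address is configured on some local
# interface, i.e. this box currently holds the cluster's shared IP.
is_active() {
    { ip -o addr show 2>/dev/null || ifconfig -a 2>/dev/null; } \
        | grep -qw "$1"
}

# CLUSTER_IP is a placeholder for your cluster's external address.
if is_active "${CLUSTER_IP:-127.0.0.1}"; then
    echo "this box holds the cluster address"
else
    echo "standby (or the address is on the other node)"
fi
```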

Keith Gaughan
Alright, consider my cron job to be something that determines whether a page or an email should be sent out. If whatever manages the associations decided to put this box into another cluster, I would be visiting this script often... would I not?
ojblass
Ah. Fair point. What you're looking to do is check whether the external IP of your HA cluster is a secondary address on the machine you're checking on.
Keith Gaughan
Is a heartbeat some sort of roundtrip mechanism? I think that gethostbyname is still data leakage to the nodes. I will look at Heartbeat-related stuff, because from what I think you are saying, Heartbeat could be a roundtrip operation.
ojblass
I'd really have to talk to the sysadmin who set those elements of the HA cluster up. I'll probably see him in the pub tomorrow, so if I get a chance, I'll ask him. I've added a link to the Heartbeat homepage, if it helps.
Keith Gaughan
I was actually thinking of how cool a website could be for you to buy a drink for someone far far away!
ojblass
Hmmm... beer over IP... :-)
Keith Gaughan
+2  A: 

It really depends on the HA system you're using.

For example, if your system uses a shared IP and the traffic is managed by some hardware box, then it can be hard to determine whether a given box is the master or the slave. That really depends on the specific solution... As long as you can add a custom script to the supervisor, you should be OK - for example, the controller can ping a daemon on the master server every second, and the alerting script simply checks whether the last ping was less than 2 seconds ago.
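That ping-freshness check can be sketched as follows, assuming (my assumption, not the answer's) that a daemon touches a timestamp file each time the controller's ping arrives; the path and threshold are placeholders, and `stat -c %Y` is GNU/Linux-specific:

```shell
#!/bin/sh
# Alert when the controller's last ping has gone stale.  HEARTBEAT_FILE is
# hypothetical -- assume a daemon touches it on every ping received.
HEARTBEAT_FILE="${HEARTBEAT_FILE:-/var/run/ha-heartbeat}"
MAX_AGE="${MAX_AGE:-2}"   # seconds, per the "< 2 sec" rule above

heartbeat_fresh() {
    now=$(date +%s)
    last=$(stat -c %Y "$1" 2>/dev/null || echo 0)  # mtime, 0 if missing
    [ $((now - last)) -le "$MAX_AGE" ]
}

if heartbeat_fresh "$HEARTBEAT_FILE"; then
    echo "master alive"
else
    echo "master silent - alert"
fi
```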

If your system doesn't have a supervisor/controller node, but each node tries to determine the state itself, you can have more problems. If a split brain occurs, you can end up with two slaves or two masters, so your alerting software will be wrong in both cases. Mechanisms that ensure only one live node (STONITH and the like) can help.

On the other hand, in the second scenario, if the HA software works on both hosts properly, you should be able to obtain the master/slave information straight from it. It has to know its own state at any time, because it's one of its main functions. In most HA solutions you should be able to either get the current state, or add some code to run when the state changes. Heartbeat offers both.

I wouldn't worry about the edge cases like a split brain though. Almost any situation when you lose connection between the clustered nodes will be more important than the stuff that happens on the separate nodes :)

If the thing you care about is really logging/alerting only, then ideally you could have a separate logger box which gets all the information about the current network/cluster status. An external box will probably have a better idea of how to deal with the situation. If your cluster gets DoS'ed, disconnected from the network, or loses power, you won't get any alert from it. A redundant pair of independent monitors can save you from that.

I'm not sure why you mentioned DNS - due to its refresh time it shouldn't be a source of any "real-time" cluster information.

viraptor
Understanding that corner cases are of little interest to me here is key. Also I am able to host the monitoring of the solution on the hardware managing the cluster itself. The information you provided me led me to a decent solution in under 4 hours. Thank you... enjoy your bounty!
ojblass
i am still in awe of the simplicity of it.
ojblass
A: 

Without hard-coding.... ? I assume you mean some native heartbeat query; not sure. However, you could use ifconfig: HA creates a virtual interface on whatever interface it is configured to run on. For instance, if HA was configured on eth0 then it would create a virtual interface of eth0:0, but only on the active node.

Therefore you could do a simple query of the ifconfig output to determine whether the server was the active node or not; for example, if eth0 was the configured interface:

ACTIVE_NODE=`ifconfig | grep -c 'eth0:0'`

That will set the $ACTIVE_NODE variable to 1 (for active) and 0 (if standby). Hope that may help.
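One caveat, not in the answer above: on newer Linux systems ifconfig may be absent, and addresses added via iproute2 don't necessarily show an `eth0:0` alias in ifconfig's format. Assuming Heartbeat still attaches that label to the shared address, an equivalent check with `ip` would be:

```shell
# Same check with iproute2 instead of ifconfig: count addresses carrying
# the eth0:0 label (the label Heartbeat applies to the shared address).
ACTIVE_NODE=$(ip -o addr show label 'eth0:0' 2>/dev/null | wc -l)
echo "$ACTIVE_NODE"
```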

http://www.of-networks.co.uk

earthgecko