If I simply point to both in my DNS record, wouldn't 50% of traffic be in trouble as well?
You can point to both in your DNS. When one of them goes down, the user's browser will make a request, notice something is wrong, and try the other A record. The second A record should still be valid and working.
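As an illustrative sketch (the name, TTL, and addresses below are placeholders, not from the question; 203.0.113.x is an RFC 5737 example range), publishing two A records for the same name in a BIND-style zone file looks like this:

```
; Round-robin: both proxies answer for the same name.
www     300     IN  A   203.0.113.10
www     300     IN  A   203.0.113.20
```

A short TTL (here 300 seconds) limits how long clients keep using a cached answer pointing at a dead server.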
There are any number of ways to look at this. When you start looking into these types of setups you'll likely see terminology like "Active-Active" or "Active-Passive".
What you're describing is an "Active-Active" setup, where both the primary and the "failover" hardware are serving customers' needs. "Active-Passive" generally means there's some manual step to fail over from the "active" server to the standby "passive" server.
Either configuration is perfectly legitimate, and the correct answer to your question really comes down to an analysis of your individual situation. Things to consider:

1. Are both pieces of hardware equivalent? (i.e. will your customers have the same experience regardless of which path they take through your infrastructure?)
2. Is it worth the cost of running both instances all the time?
3. What is the impact of downtime during failover? (i.e. how long will you be down, and how much would such an outage cost you?)
4. Are there elements of your maintenance process (upgrades, backups) where active-active is helpful? (i.e. upgrading software on the servers behind the load balancer, upgrading the load balancer itself, etc.) These are scenarios where you would pull one member out of the pair, do the upgrade work, re-route traffic to the upgraded member, and then upgrade the second member.
In general, Active-Active is more costly but carries the lowest risk of impact to your customers. So the "right" answer has more to do with business considerations than technical ones.
Now if all things aren't equal you might want to look at things like:
- Pre-existing (not your application) load on each of the balancers.
- Hardware/capacity on each.
- Age (is one close to end of life, or likely to fail soon?)
- Location (if you're talking multi-datacenter, geo-location might impact performance).
I know I didn't give you a "DO this" answer... but hopefully I've provided some insight into the types of considerations you'd make when arriving at your answer.
There are easy ways of providing resilient services using one IP address that needn't cost the earth.
For example, you can configure the public IP address on a loopback interface on each of the proxy servers and then announce it via the OSPF routing protocol (or similar) into your internal routing tables.
If a server dies completely, the route is withdrawn from your interior routing tables and traffic automatically stops flowing to the dead server, typically within the OSPF dead interval (40 seconds by default), or almost immediately if the link itself goes down.
In most networks this solution costs nothing. The OSPF routing can be done using Quagga if your proxies are running Linux or some other UNIX variant.
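As a rough sketch of what that looks like in practice (the interface subnet is a placeholder, 203.0.113.10 is an RFC 5737 example address, and exact Quagga syntax may vary by version), you add the shared IP to the loopback on each proxy and let `ospfd` redistribute it:

```
# On each proxy server: put the shared service IP on the loopback.
ip addr add 203.0.113.10/32 dev lo

# Minimal /etc/quagga/ospfd.conf sketch:
#   router ospf
#    network 10.0.0.0/24 area 0    <- subnet of the real interface (placeholder)
#    redistribute connected        <- announces the /32 on lo into OSPF
```

With the same /32 announced from every live proxy, your routers send traffic to the nearest one, and a dead proxy's route simply disappears.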
Your internal network will need to be able to speak OSPF too, but that comes out of the box with most Cisco or Juniper class hardware. You are planning to run some reasonably OK network gear to support these thousands of sites, aren't you ;-) ?
FWIW, I've used a similar technique in the past to handle fail-over of large scale shared web-hosting from one data center to another.