Hi,

I'm developing a very sensitive application for a client that needs a 99.9999999999% uptime guarantee.

It's a Rails application with a MySQL database. I am thinking of hosting it on EngineYard due to its low maintenance requirements and ease of use.

Heroku does not seem to be the right solution because of its uptime problems.

EC2 could also be a good solution, but it may require too much work to install and maintain.

My question is: how do I build a redundant system using EngineYard, Heroku, EC2 or any other Rails hosting that you would propose? Do I need two instances in different parts of the world, replicating each other? Please advise on the best approach.

Regards.

+1  A: 

EY uses a company called Terremark for their hosting, which is some pretty serious hosting infrastructure. Out of the 3 you listed, I would go with them.

For uptime, you want to look at master/slave replication of your data and automatic failover, and you want to build redundancy wherever you can. High availability is a fairly involved topic and has more to do with IT than dev, so I would recommend asking where to start over at serverfault.com.
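To see why redundancy is worth the effort, here is a back-of-the-envelope sketch in Ruby. The method name is made up for illustration, and it assumes that replicas fail independently and that failover is instant and flawless, which real systems only approximate.

```ruby
# Combined availability of n redundant replicas, any one of which can serve
# traffic. ASSUMPTION: failures are independent and failover is perfect.
# Takes and returns availability as a fraction between 0 and 1.
def redundant_availability(replica_availability, replicas)
  1.0 - (1.0 - replica_availability)**replicas
end

# Print the combined availability for one, two and three replicas
# that are each available 99.9% of the time.
(1..3).each do |n|
  puts format("%d replica(s): %.7f%%", n, redundant_availability(0.999, n) * 100)
end
```

Under those assumptions, two replicas at 99.9% each come out at roughly 99.9999% combined. In practice shared dependencies (the same data center, the same network, the same bug in your app) make the real figure lower.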

Matt Briggs
The EY cloud hosting uses EC2
Beerlington
They have two offerings, one on Terremark, the other on EC2.
Matt Briggs
+4  A: 

Everyone wants 100% uptime, but achieving it is pretty much impossible. Downtime can be caused by any link in the chain, and there are usually dozens of them, so to reach such a high standard you will need to buy gold-plated everything. Essentially, you'll have to spend a fortune. The jump in cost from 99% uptime, which means your site is unavailable for about 3.65 days a year, to 99.9% uptime, where it's under nine hours, is considerable, and the jump from there to 99.99%, where the tolerance is about 53 minutes, is even bigger.
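The downtime budget for any uptime percentage can be worked out directly. A minimal Ruby sketch, assuming a 365-day year and, for the chain calculation, independent failures in each link (the method names are made up for illustration):

```ruby
SECONDS_PER_YEAR = 365 * 24 * 60 * 60 # ignores leap years

# Seconds per year a service may be down at the given uptime percentage.
def downtime_seconds_per_year(uptime_percent)
  SECONDS_PER_YEAR * (100.0 - uptime_percent) / 100.0
end

# Uptime percentage of a chain in which every link must be up at once.
# ASSUMPTION: all links have the same uptime and fail independently.
def chain_availability(link_uptime_percent, links)
  ((link_uptime_percent / 100.0)**links) * 100.0
end

# Yearly downtime budget for a few common targets.
[99.0, 99.9, 99.99, 99.999].each do |pct|
  hours = downtime_seconds_per_year(pct) / 3600.0
  puts format("%-8s %8.2f hours/year", "#{pct}%", hours)
end

# The figure from the question, in microseconds per year.
puts downtime_seconds_per_year(99.9999999999) * 1_000_000

# Thirty links at 99.99% each end up below 99.99% overall.
puts chain_availability(99.99, 30)
```

With these assumptions, 99% works out to about 87.6 hours a year, 99.9% to about 8.76 hours, 99.99% to about 52.6 minutes, and twelve nines leaves a budget of only a few dozen microseconds a year.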

Going beyond 99.99% is simply impractical. Nobody will sign a guarantee like this unless they're being dishonest, the agreement is so loaded down with caveats as to be unenforceable, or they don't mind dishing out heavy credits all the time. Amazon EC2's SLA is 99.99%, for instance.

The metrics I've seen collected on a provider like Linode show uptimes of about 99.97% to 99.99%. Occasionally you will see datacenters with 100% uptime, but this is at the network level only and doesn't take into account intermittent internal glitches that may knock your server offline.

Choosing a managed hosting provider like Engine Yard might be the solution for you, because it can minimize your exposure to random events, but it won't get you such a high uptime in and of itself. They're very good at maintaining the system layer, but their ability to fix or work around bugs in your application is very limited, and they are subject to the same intermittent networking issues with EC2 as anyone else.

There are two kinds of reliability you should be concerning yourself with. One is availability, which is purely a measure of how likely a client is to be able to use the application. The other is data integrity, which is a measure of how likely data is to be retained given any number of disaster scenarios.

Most people will accept that an application might be down every so often for brief periods of time, but people refuse to accept that data may go missing every now and then.

It isn't hard to get a "99.9999999999%" data retention rate, but you will need to plan out your backup, replication, and recovery strategy in detail and will have to exercise your systems regularly to ensure they are working as designed.
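One small piece of "exercising your systems" can be automated: checking that a backup file still matches the checksum recorded when it was taken. The Ruby sketch below is only an illustration. The helper name is hypothetical, and a temporary file stands in for a real database dump.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true only if the file exists and its SHA-256 digest
# matches the checksum recorded at backup time.
def backup_intact?(path, expected_sha256)
  File.exist?(path) && Digest::SHA256.file(path).hexdigest == expected_sha256
end

# Demonstration with a throwaway file standing in for a database dump.
Tempfile.create("dump") do |f|
  f.write("INSERT INTO accounts VALUES (1, 'example');\n")
  f.flush
  recorded = Digest::SHA256.file(f.path).hexdigest # stored at backup time

  puts backup_intact?(f.path, recorded) # file untouched since the backup

  f.write("stray bytes") # simulate corruption
  f.flush
  puts backup_intact?(f.path, recorded) # file no longer matches
end
```

A matching checksum only shows the file hasn't changed. It says nothing about whether the dump actually restores, so a periodic full restore into a scratch database is still needed.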

While you have almost no control over the often patchy routing on the internet in general, the defect rate in your server's hardware, the power in your data center and so on, you do have a huge amount of control over your backup strategy.

tadman
Great comment, thank you. I'll try using EngineYard.
donald