I run about 5 different hosted servers through a variety of hosting providers. In the past two months, one of the servers I work on has been down twice. Both outages were unexpected and fairly long (36 hours and 4 hours). The server in question is a VPS, not a shared server. Given my experience with my other servers and providers (both VPS and shared), this seems like an unacceptable amount of downtime.

  • What do you think?
  • What do you consider a reasonable amount of downtime for your servers (planned and unplanned)?
+2  A: 

You get what you pay for.

What's your SLA with your provider? Do you even have one? If there's any one factor that explains the difference in price, it's this. If you need guaranteed uptime (three 9s, i.e. 99.9%, for example), then you'll have to pay for it. Five 9s (99.999%) will cost you considerably more.
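To put numbers on the "9s", here is a minimal sketch that converts an availability target into allowed downtime and compares it with the outages from the question. The function names are made up for illustration, and the two-month window is assumed to be 61 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(nines: int, period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime within period_hours for N nines (3 means 99.9%)."""
    unavailability = 10 ** -nines
    return period_hours * unavailability


def observed_availability(downtime_hours: float, period_hours: float) -> float:
    """Fraction of the period during which the server was up."""
    return 1 - downtime_hours / period_hours


for n in (3, 4, 5):
    minutes = allowed_downtime_hours(n) * 60
    print(f"{n} nines: {minutes:.1f} minutes of downtime per year")

# Outages from the question: 36 h + 4 h over roughly two months
# (assumed here to be 61 days).
print(f"observed availability: {observed_availability(36 + 4, 61 * 24):.2%}")
```

Three 9s allows about 8.76 hours of downtime per year, so the 36-hour outage alone would exceed a full year's budget at that level.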

To answer your question: did you get an explanation as to the causes of this downtime? 36 hours is excessively long. 4 hours is not necessarily (if it's rare). Was it a hardware fault? If so, you can't do much about those. I once had a provider who would occasionally stuff up their config, and mail would stop working until I told them to fix it. To me, that was unacceptable.

cletus
Yep, they have an SLA, but it doesn't give a specific 3, 4 or 5 9s. Instead they have a refund policy to refund "5% of the monthly fee for each 30 minutes of downtime". The 36-hour problem was caused by a faulty RAID controller. The shorter downtime was a network error.
Brian Fisher
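
The refund clause quoted in the comment above can be worked through as a small calculation. This is only a sketch: the comment does not say whether partial 30-minute blocks count or whether the refund is capped, so the code assumes that only complete blocks count and that the refund cannot exceed 100% of the monthly fee. The function name is hypothetical.

```python
def refund_fraction(downtime_minutes: float,
                    rate_per_block: float = 0.05,
                    block_minutes: int = 30,
                    cap: float = 1.0) -> float:
    """Fraction of the monthly fee refunded for a given amount of downtime.

    Assumptions (not stated in the quoted policy): only complete
    30-minute blocks count, and the refund is capped at `cap`.
    """
    blocks = int(downtime_minutes // block_minutes)
    return min(blocks * rate_per_block, cap)


for hours in (4, 36):
    fraction = refund_fraction(hours * 60)
    print(f"{hours} h outage: {fraction:.0%} of the monthly fee")
```

Under these assumptions the refund reaches the full monthly fee after 10 hours of downtime, so anything beyond that, including most of the 36-hour outage, earns no further compensation.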
Well, at least you get something, but depending on your business, downtime can be far more damaging than any refunded fees can recover. Service providers should generally have spare hardware on hand, but again, I guess you get what you pay for.
cletus
A: 

Server hardware will fail. It is only a matter of time. Rather than trying to determine what is reasonable, I would ask you another question: what are all of the possible ways that your configuration could fail, and are you prepared to change your setup to account for these possibilities?

For example, let's say that your website is hosted on a single VPS. A few examples of failures might be:

  1. The VPS could become corrupt
  2. The hypervisor could fail
  3. Network equipment in the cabinet could die
  4. Power/heat problems could exist in the data center
  5. Backbone internet connectivity could drop

You could lower your risk of #1 and #2 taking down your site by deploying a load balancer and a second VPS. Is this decreased risk worth the additional expense?
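
One rough way to weigh that trade-off is to estimate availability with and without the second VPS. The sketch below assumes the two VPS instances fail independently and uses made-up availability figures (99% per VPS, 99.9% for the load balancer); none of these numbers come from a real provider.

```python
def parallel_availability(single: float, replicas: int) -> float:
    """Availability of N redundant copies, assuming independent failures."""
    return 1 - (1 - single) ** replicas


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


VPS = 0.99             # hypothetical availability of one VPS
LOAD_BALANCER = 0.999  # hypothetical availability of the load balancer

pair = parallel_availability(VPS, 2)
whole_site = serial_availability(LOAD_BALANCER, pair)

print(f"single VPS:                {VPS:.4%}")
print(f"two VPS behind a balancer: {whole_site:.4%}")
```

The load balancer sits in series with the pair, so it becomes a single point of failure of its own. Failures #3 through #5 would also hit both VPS instances at once if they share a cabinet or data center, which breaks the independence assumption.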

This discussion turns into a matter of disaster recovery at some point.

calebgroom