I've got some data that I want to save on Amazon S3. Some of this data is encrypted and some is compressed. Should I be worried about single bit flips? I know about the MD5 hash header that can be added. In my experience that protects against flips in the most unreliable part of the process (the network transfer), but I'm still wondering whether I need to guard against flips on disk.

+8  A: 

I'm almost certain the answer is "no", but if you want to be extra paranoid you can precalculate the MD5 hash before uploading and compare it to the hash you get back after the upload. Then, when downloading, calculate the MD5 of the downloaded data and compare it to your stored hash.
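
For example, a rough sketch of that check in Python using hashlib; the file names here are just placeholders:

    import hashlib

    def md5_hex(path, chunk_size=1 << 20):
        # Hash the file in chunks so large files don't have to fit in memory.
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    # Before upload: record the hash alongside your own metadata.
    expected = md5_hex("backup.tar.gz")

    # ... upload to S3, later download it again to "backup.restored" ...

    # After download: recompute and compare against the stored value.
    actual = md5_hex("backup.restored")
    if actual != expected:
        raise RuntimeError("corruption detected: %s != %s" % (actual, expected))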

I'm not sure exactly what risk you're concerned about. At some point you have to defer the risk to somebody else. Does "corrupted data" fall under Amazon's Service Level Agreement? Presumably they know what the file hash is supposed to be, and if the hash of the data they're giving you doesn't match, then it's clearly their problem.

I suppose there are other approaches too:

  • Store your data with forward error correction (FEC) so that you can detect and correct up to N bit errors, for your choice of N.
  • Store your data more than once in Amazon S3, perhaps across their US and European data centers (I think there's a new one in Singapore coming online soon too), with RAID-like redundancy so you can recover your data if some number of copies disappear or become corrupted (sketched below).
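
If you go the multiple-copies route, the idea is simply "write everywhere, read whichever copy you can get". A rough sketch using the boto3 SDK as an illustration; the bucket names and regions are placeholders, and in practice you would combine this with the hash check above:

    import boto3

    # Hypothetical replica buckets in two different regions.
    REPLICAS = [
        ("my-backup-us", "us-east-1"),
        ("my-backup-eu", "eu-west-1"),
    ]

    def put_everywhere(key, data):
        # Write the same object to every replica bucket; fail loudly if any write fails.
        for bucket, region in REPLICAS:
            s3 = boto3.client("s3", region_name=region)
            s3.put_object(Bucket=bucket, Key=key, Body=data)

    def get_any(key):
        # Return the object from the first replica that answers; try the next on error.
        for bucket, region in REPLICAS:
            s3 = boto3.client("s3", region_name=region)
            try:
                return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            except Exception:
                continue
        raise RuntimeError("no replica could serve %s" % key)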

It really depends on just how valuable the data you're storing is to you, and how much risk you're willing to accept.

Greg Hewgill
That would just tell me that a problem occurred; I wouldn't have my data.
stuck
I've edited my answer to include more ideas for risk mitigation.
Greg Hewgill
It'd be most awesome to know what Amazon is doing. Anyone out there from Amazon?
stuck
I'm disappointed that this answer is going to take my bounty points. It doesn't answer the question at all. The question isn't how to guard against bit flips; it's whether Amazon guards against bit flips.
stuck
@Chris Gray: Not to worry, my answer won't get your bounty points because you started the bounty *after* I posted my answer. However, you will still forfeit the points you put up for bounty. From the faq: "The highest voted answer created after the bounty started with at least 2 upvotes will be automatically accepted."
Greg Hewgill
A: 

I see your question from two points of view: a theoretical one and a practical one.

From a theoretical point of view, yes, you should be concerned - and not only about bit flipping, but about several other possible problems. In particular, section 11.5 of the customer agreement says that Amazon

MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, WHETHER EXPRESS, IMPLIED, STATUTORY OR OTHERWISE WITH RESPECT TO THE SERVICE OFFERINGS. [...] WE AND OUR LICENSORS DO NOT WARRANT THAT THE SERVICE OFFERINGS WILL FUNCTION AS DESCRIBED, WILL BE UNINTERRUPTED OR ERROR FREE, OR FREE OF HARMFUL COMPONENTS, OR THAT THE DATA YOU STORE WITHIN THE SERVICE OFFERINGS WILL BE SECURE OR NOT OTHERWISE LOST OR DAMAGED.

Now, in practice, I'd not be concerned. If your data is lost, you'll blog about it, and (although they might not face any legal action) their business will be pretty much over.

On the other hand, it depends on how vital your data is. Suppose you were rolling your own storage in your own data center(s): how would you plan for disaster recovery there? If your answer is "I'd just keep two copies in two different racks", then use the same technique with Amazon, perhaps keeping two copies in two different datacenters (since you wrote that you are not interested in how to protect against bit flips, I'm only providing a trivial example here).

Davide
A: 

There are two ways of reading your question:

  1. "Is Amazon S3 perfect?"
  2. "How do I handle the case where Amazon S3 is not perfect?"

The answer to (1) is almost certainly "no". They might have lots of protection to get close, but there is still the possibility of failure.

That leaves (2). The fact is that devices fail, sometimes in obvious ways and other times in ways that appear to work but give an incorrect answer. To deal with this, many databases use a per-page CRC to ensure that a page read from disk is the same as the one that was written. This approach is also used in modern filesystems (for example ZFS, which can write multiple copies of a page, each with a CRC, to handle RAID controller failures; I have seen ZFS correct single-bit errors from a disk by reading a second copy. Disks are not perfect.)
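
As a toy illustration of the per-page CRC idea (not how any real database or ZFS actually lays out its pages), each fixed-size page gets a checksum appended on write and verified on read:

    import struct
    import zlib

    PAGE_SIZE = 4096  # assumes each payload fits in one page

    def write_page(f, data):
        # Pad the payload to the page size and append its CRC32.
        page = data.ljust(PAGE_SIZE, b"\x00")
        f.write(page + struct.pack("<I", zlib.crc32(page) & 0xFFFFFFFF))

    def read_page(f):
        # Read one page and refuse to return it if the stored CRC doesn't match.
        page = f.read(PAGE_SIZE)
        (stored,) = struct.unpack("<I", f.read(4))
        if zlib.crc32(page) & 0xFFFFFFFF != stored:
            raise IOError("page failed CRC check; fall back to another copy")
        return page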

In general you should have a check to verify that your system is operating as you expect. Using a hash function is a good approach. What you do when you detect a failure depends on your requirements. Storing multiple copies is probably the best approach (and certainly the easiest), because it protects you from site failures, connectivity failures and even vendor failures (by choosing a second vendor), rather than just adding redundancy to the data itself as FEC does.
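
A small sketch of that "multiple copies plus a hash check" combination; the list of copies and the get_copy callback are hypothetical stand-ins for however you store and fetch the data:

    import hashlib

    def fetch_verified(copies, expected_sha256, get_copy):
        # Try each stored copy in turn and return the first whose hash matches
        # the value recorded at write time.
        for location in copies:
            data = get_copy(location)
            if hashlib.sha256(data).hexdigest() == expected_sha256:
                return data
        raise RuntimeError("every copy failed verification")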

janm