views: 1145
answers: 7

Hi

We just added an autoupdater to our software and got some bug reports saying that the autoupdate wouldn't complete properly because the downloaded file's SHA-1 checksum didn't match. We're hosted on Amazon S3...

So either something is wrong with my code or something is wrong with S3.

I reread my code for anything suspicious and wrote a simple script that downloads the file and checks its checksum, and I did indeed get a few errors once in a while (1 out of 40 downloads yesterday). Today it seems okay.
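
For reference, the test script did roughly the following (a simplified sketch; the URL and expected digest below are placeholders, not the real ones):

    import hashlib
    import urllib.request

    # Placeholder values: the real object URL and published checksum go here.
    URL = "https://mybucket.s3.amazonaws.com/updates/myapp.zip"
    EXPECTED_SHA1 = "0123456789abcdef0123456789abcdef01234567"

    def download_and_check(url, expected_sha1):
        # Download the whole object and compare its SHA-1 digest to the expected value.
        data = urllib.request.urlopen(url).read()
        digest = hashlib.sha1(data).hexdigest()
        return digest == expected_sha1, digest

    mismatches = 0
    for attempt in range(40):
        ok, digest = download_and_check(URL, EXPECTED_SHA1)
        if not ok:
            mismatches += 1
            print("mismatch on attempt", attempt, "- got", digest)
    print(mismatches, "mismatches out of 40 downloads")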

Have you experienced this kind of problem? Is there some kind of workaround?

Extra info: the tests were run from Japan.

+4  A: 

Other than the downtime a few weeks ago, none that I've heard of.
They did a good job, considering the one time it went down was because of an obscure server error that cascaded throughout the cloud. They were very open about it and resolved it as soon as they found out (it happened during a weekend, IIRC).

So they are pretty reliable. My advice is to double-check your code, and bring it up with Amazon support if it is still a problem.

paan
+1  A: 

I agree, quadruple-checking your code would be a good idea. I'm not saying it can't happen, but I don't believe I have ever seen it, and I've used S3 a fair bit now. I have, however, mishandled exceptions and connection breaks a few times and ended up with pieces that didn't match what I was expecting.

I would be pretty surprised if they actually send bad data, but, as always, anything is possible.

jsight
A: 

Rather than bad data, I think what I got was a 403 error. If I just try again it's usually OK.

And I agree: I've seen a lot of reports of Amazon being totally down, but nobody talking about a "sometimes my access is refused" error, so I guess there might be an error on my side. I've just set up logging on Amazon.

Anyway, thank you! I'll follow your advice and stop blaming "the other guy".

poulejapon
A: 

I occasionally get unexpected 404 errors on GETs for objects that appear in a preceding LIST but are new to the bucket, and other miscellaneous errors (e.g. a 403 on my access ID and secret key), but nothing catastrophic.

My code runs server-side, so I've put in some robust error handling and logging. I think this is a wise thing to do any time you have one server on the net communicating with another server. :P

Stu Thompson
+2  A: 

Amazon's S3 will occasionally fail with errors during uploads or downloads -- generally "500 Internal Server Error" responses. The error rate is normally pretty low, but it can spike if the service is under heavy load. The error rate is never 0%, so even at the best of times the occasional request will fail.

Are you checking the HTTP response code in your autoupdater? If not, you should check that your download succeeded (HTTP 200) before you perform a checksum. Ideally, your app should retry failed downloads, because transient errors are an unavoidable "feature" of S3 that clients need to deal with.
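
As a rough sketch, the download step could look something like this (illustration only; the URL, expected digest, and retry count are assumptions, not your actual setup):

    import hashlib
    import time
    import urllib.error
    import urllib.request

    def fetch_with_retries(url, expected_sha1, max_attempts=5):
        # Retry transient S3 failures (e.g. 500s), and only accept the
        # download once the checksum matches.
        for attempt in range(1, max_attempts + 1):
            try:
                # urlopen raises HTTPError for non-2xx responses, so reaching
                # read() means the download itself succeeded.
                data = urllib.request.urlopen(url).read()
                if hashlib.sha1(data).hexdigest() == expected_sha1:
                    return data
                # Checksum mismatch: treat it like a transient failure and retry.
            except urllib.error.HTTPError as err:
                if err.code < 500:
                    raise  # 4xx errors won't go away by retrying
            time.sleep(2 ** attempt)  # simple exponential backoff
        raise RuntimeError("download failed after %d attempts" % max_attempts)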

It is worth noting that if your clients are getting 500 errors, you will probably not see any evidence of these in the S3 server logs. These errors seem to occur before the request reaches the service's logging component.

James Murty
+1  A: 

Never heard of a problem during download. That's weird. I get TONS of 500 Internal Server Error messages when uploading. That's why I have a daemon that uploads while the user is doing something else.

It doesn't seem to be something in your code; maybe there really is something wrong with S3 (or with the connection between S3 and Japan).

You can try firing up an EC2 server and running the test from there (traffic between EC2 and S3 won't cost any money, so use as much as you want!) and see if you get errors. If you do, then you're out of luck and S3 isn't for you :)

Good luck!

gilm
+1  A: 

OK, this is all a bit old now, but for reference: I've just been running a data migration of several gigabytes of data from an EC2 server directly into S3. I'm getting 500 errors about every 10 minutes or so, representing an error rate of about 1% of PUTs. So, yes, S3 does have a problem with 500 errors.

Haven't done much in the way of GETs, though, so I can't comment on those.
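
For what it's worth, I just wrapped each PUT in a retry loop, roughly like this (put_object here stands in for whatever upload call your S3 client provides; it isn't a specific library function):

    import time

    def put_with_retries(put_object, bucket, key, data, max_attempts=5):
        # put_object is whatever function your S3 client exposes for uploads;
        # the point is just to retry the ~1% of PUTs that fail with a 500.
        last_error = None
        for attempt in range(1, max_attempts + 1):
            try:
                return put_object(bucket, key, data)
            except Exception as err:      # in real code, catch your client's error type
                last_error = err
                time.sleep(2 ** attempt)  # back off a little more each time
        raise last_error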