I fired up a sample application that uses Amazon S3 for image hosting, and managed to coax it into working. The application is hosted at github.com. It lets you create users with a profile photo. When you upload the photo, the web application stores it on Amazon S3 instead of your local file system. (Very important if you host at heroku.com, where the local file system is ephemeral.)

However, when I did a "view source" on the page in the browser, I noticed that the URL of the picture was an Amazon S3 URL in the S3 bucket that I assigned to the app. I cut & pasted the URL and was able to view the picture in the same browser, and in another browser in which I had no open sessions to my web app or to Amazon S3.

Is there any way that I could restrict access to that URL (and image) so that it is accessible only to browsers that are logged into my applications?

Most of the information I found about Amazon ACLs talks only about access for the owner, for groups of users authenticated with Amazon or Amazon S3, or for everybody anonymously.

EDIT: Update, July 7, 2010

Amazon has just announced more ways to restrict access to S3 objects and buckets. Among other things, you can now restrict access to an S3 object by qualifying the HTTP referrer. This looks interesting... I can't wait until they update their developer documentation.
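For illustration, here is a minimal sketch of such a referrer-based bucket policy, applied with the boto3 Python SDK. The bucket name and site URL are hypothetical, and note that the Referer header is sent by the browser and is trivial to spoof, so this is a hotlinking deterrent rather than real access control:

```python
import json

import boto3  # assumes AWS credentials are configured in the environment

s3 = boto3.client("s3")

# Hypothetical bucket and site names. The aws:Referer condition key is real.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowGetsFromMyPages",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-app-bucket/*",
            # Only allow GETs whose Referer header matches the app's pages.
            "Condition": {"StringLike": {"aws:Referer": "http://myapp.com/*"}},
        }
    ],
}

s3.put_bucket_policy(Bucket="my-app-bucket", Policy=json.dumps(policy))
```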

A: 

I think the best you can do is what drop.io does. While the data is in principle accessible to anyone, you give it a large and random URL. Anyone who knows the URL can access it, but your application controls who gets to see the URL.

Kind of security through obscurity.

You can think of it as a password included in the URL. This means that if you are serious about security, you have to treat the URL as confidential information. You also have to make sure that these links do not leak to search engines.

It is also tricky to revoke access rights. The only thing you can do is invalidate a URL and assign a new one.
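As a concrete sketch of this approach in Python (the bucket name and URL layout are assumptions, not drop.io's actual scheme), the "large and random URL" is just a long token from a cryptographically secure source:

```python
import secrets

# 32 random bytes = 256 bits of entropy, URL-safe base64 encoded.
# Treat the token as confidential: it is effectively a password.
token = secrets.token_urlsafe(32)

# Hypothetical URL layout; bucket name and path are assumptions.
url = f"https://my-app-bucket.s3.amazonaws.com/{token}/photo.jpg"
print(url)
```

To "revoke" access, you would copy the object to a fresh token and delete the old key, which is exactly the invalidate-and-reassign step described above.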

Thilo
+1  A: 

S3 is a separate service and does not know about your sessions.

The generic solution is to recognize the benefits and security properties of assigning each asset a separate, unique, very long, and random key, which forms part of the URL to that asset. If you so choose, you can even assign a key with 512 effective bits of randomness, and that URL will remain unguessable for a very long time.

  • Because someone who at time t has access to an asset can simply copy the asset for future reference, it makes sense to permit that person to know the URL and access the asset at any time.
  • Likewise, since that person can simply download the asset and distribute it to others, it makes sense to permit that person to distribute the URL to others to whom he would otherwise simply have distributed the asset itself.
  • Since all such access is read-only, and since writes are restricted to the website servers, there is no risk of malicious "hacking" from anyone who has this access.

You have to determine if this is sufficient security. If it isn't, then maybe S3 isn't for you, and maybe you need to store your images as binary columns in your database and cache them in memcached, which you can do on Heroku.
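A minimal sketch of the upload side of that scheme in Python with boto3 (the bucket name and helper function are hypothetical; 64 random bytes give the 512 bits of entropy mentioned above):

```python
import secrets

import boto3  # assumes AWS credentials are configured in the environment

s3 = boto3.client("s3")
BUCKET = "my-app-bucket"  # hypothetical bucket name


def store_asset(data: bytes, content_type: str) -> str:
    """Upload an asset under an unguessable key and return its URL."""
    key = secrets.token_urlsafe(64)  # 64 random bytes = 512 bits of entropy
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=data,
        ContentType=content_type,
        ACL="public-read",  # readable, but only by someone who knows the key
    )
    return f"https://{BUCKET}.s3.amazonaws.com/{key}"
```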

Justice
@Justice - Thank you for a complete and very well-reasoned answer. It is exactly the reasoning that I needed for using S3. The assets stored on S3 are not super-critical as far as privacy goes, and the URLs are only available to logged-in users. I suppose I have to do some kind of salted hash to generate the random number.
Jay Godse
+2  A: 

For files where privacy actually matters, we handle this as follows:

  • Files are stored with a private ACL, meaning that only an authorized agent can download (or upload) them
  • To access a file, we link to http://myapp.com/download/{s3-path}, where download corresponds to a controller (in the MVC sense)
  • ACLs are implemented as appropriate so that only logged-in users can access that controller/action
  • That controller downloads the file using the API, then streams it out to the user with the correct MIME type, cache headers, file size, etc. (see the sketch after this list)
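A minimal sketch of such a controller in Python, using Flask and boto3 as stand-ins (the original app is presumably in another framework; the bucket name, route, and login check are hypothetical):

```python
import boto3
from flask import Flask, Response, abort, session

app = Flask(__name__)
app.secret_key = "change-me"  # required for sessions; placeholder only
s3 = boto3.client("s3")
BUCKET = "my-app-private-bucket"  # hypothetical; objects have a private ACL


@app.route("/download/<path:s3_path>")
def download(s3_path):
    # Stand-in for the app's real login/authorization check.
    if not session.get("user_id"):
        abort(403)
    obj = s3.get_object(Bucket=BUCKET, Key=s3_path)
    return Response(
        obj["Body"].iter_chunks(),  # stream instead of buffering the whole file
        content_type=obj["ContentType"],
        headers={"Content-Length": str(obj["ContentLength"])},
    )
```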

Using this method, you end up using a lot more bandwidth than you need, but you still save on storage. For us this works out, because we tend to run out of storage much more quickly than bandwidth.

For files where privacy only sort of matters, we generate a random hash that we use for the URL. This is basically security through obscurity, and you have to be careful that your hash is sufficiently difficult to guess.

  "However, when I did a 'view source' on the page in the browser, I noticed that the URL of the picture was an Amazon S3 URL in the S3 bucket that I assigned to the app. I cut & pasted the URL and was able to view the picture in the same browser, and in another browser in which I had no open sessions to my web app or to Amazon S3."

Keep in mind that this is no different than any image stored elsewhere in your document root. You may or may not need the kind of security you're looking for.

notJim
If you *really* need the ACLs, this is definitely how to do it. However, on Heroku, and depending on the access patterns for these assets, this strategy will force you to "crank your dynos" much faster than otherwise.
Justice
Justice: I'm not sure that it's any worse than it would be to store the file locally and stream it out through your application, though. If you wanted to lock the files down in any non-trivial way, streaming through the application is basically the only solution. Of course, few applications have that kind of requirement. I'm also used to working in a dedicated server environment, so maybe my advice is not as applicable to Heroku.
notJim
@notJim - Thank you for a very complete answer.
Jay Godse