Hi guys, I know this has been asked before, but there is still no clear answer. My problem is that I built a file upload script for GAE and only found out afterwards that you can store files of only up to approx. 1 MB in the datastore. You can stop me right here if you can tell me that enabling billing makes the 1 MB limit history, but I doubt it.

I need to be able to upload up to 20 MB per file, so I thought maybe I could use Amazon's S3. Any ideas on how to accomplish this?

I was told to use a combination of GAE + Ec2 and S3 but I have no idea how this would work.

Thanks, Max

+3  A: 

Google App Engine and EC2 are competitors. They do the same thing, although GAE provides an environment for your app to run in with strict language restrictions, while EC2 provides you with a virtual machine (think VMware) on which to host your application.

S3, on the other hand, is a raw storage API. You can use a SOAP or REST API to access it. If you want to stick with GAE, you can simply use the Amazon S3 Python Library to make REST calls from Python to S3.

You will, of course, have to pay for usage on S3. It's amazing how granular their billing is. When getting started, I was literally charged 4 cents one month.

Serapth
Okay, so far so good. But if I have a 20MB file and I use the Amazon S3 Python Library to send that file to S3...won't GAE kill the process because it takes longer than 30 seconds?
mistero
To be honest, I don't really know GAE's limitations. I only looked at it briefly, and its flaws were far too apparent and limiting for my particular uses. Apart from the fact that a free edition is available, I see very little to recommend it.
Serapth
I'm pretty sure he already knew all this - and it's not what he was asking.
Nick Johnson
A: 

Some Google App Engine + S3 links:

Previous related post... 10mb limit.

This link demonstrates small file uploads. I haven't found an example of large uploads yet...

This link shows a different approach, (with a fix for a known issue)

John Weldon
+6  A: 

From the Amazon S3 documentation:

  1. The user opens a web browser and accesses your web page.

  2. Your web page contains an HTTP form that contains all the information necessary for the user to upload content to Amazon S3.

  3. The user uploads content directly to Amazon S3.

GAE prepares and serves the web page, a speedy operation. Your user uploads to S3, a lengthy operation, but that is between your user's browser and Amazon; GAE is not involved.

Part of the S3 protocol is a *success_action_redirect*, that lets you tell S3 where to aim the browser in the event of a successful upload. That redirect can be to GAE.
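
To make the steps above concrete, here is a minimal sketch of how the GAE side might build the hidden form fields before serving the page. It is not taken from the answer or from Amazon's documentation. The function name `build_s3_post_fields`, its parameters, and the example values are made up for illustration. It assumes the older Signature Version 2 style of browser-based POST (a Base64-encoded JSON policy signed with HMAC-SHA1); newer S3 regions require Signature Version 4, which uses different field names. It is written for current Python 3, not the Python runtime GAE offered at the time.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone


def build_s3_post_fields(access_key, secret_key, bucket, key_prefix,
                         redirect_url, max_bytes=20 * 1024 * 1024,
                         valid_for=timedelta(hours=1), now=None):
    """Return hidden form fields for a browser-based POST upload to S3.

    Hypothetical helper: assumes the legacy (Signature Version 2) scheme.
    """
    now = now or datetime.now(timezone.utc)
    expiration = (now + valid_for).strftime("%Y-%m-%dT%H:%M:%S.000Z")

    # The policy lists what S3 should accept; anything else is rejected.
    policy = {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            {"acl": "private"},
            {"success_action_redirect": redirect_url},
            # Server-side size limit, enforced by S3 rather than the form.
            ["content-length-range", 0, max_bytes],
        ],
    }
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8"))

    # The signature covers the Base64 policy, so the browser cannot edit it.
    signature = base64.b64encode(
        hmac.new(secret_key.encode("utf-8"), policy_b64, hashlib.sha1).digest()
    )

    return {
        "key": key_prefix + "${filename}",
        "AWSAccessKeyId": access_key,
        "acl": "private",
        "success_action_redirect": redirect_url,
        "policy": policy_b64.decode("ascii"),
        "signature": signature.decode("ascii"),
    }
```

The page would render these as hidden inputs in a `multipart/form-data` form that posts to the bucket's endpoint, with the file input as the last field. The `content-length-range` condition is what would hold uploads to 20 MB here, since S3 checks it against the signed policy.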

Thomas L Holaday
Ok sounds great and I will definitely do it like this. How would you go ahead if you wanted to store information about the file on GAE Data Store? Like the user who stored it and the mime-type for example?
mistero
Store that information when the user requests the 'redirect' page. You can do a HEAD request on the newly-uploaded file to fetch the metadata, if necessary.
Nick Johnson
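
A rough sketch of the redirect side, for illustration only: when S3 follows `success_action_redirect`, it appends `bucket`, `key`, and `etag` as query string parameters, so the GAE handler for the redirect page could pull them out roughly like this. The function name `parse_s3_redirect` and the example URL are hypothetical. Since anyone can request the redirect URL directly, these values should be treated as a hint and confirmed (for example with the HEAD request suggested above) before being written to the datastore.

```python
from urllib.parse import parse_qs, urlparse


def parse_s3_redirect(url):
    """Extract the upload details S3 appends to the redirect URL.

    Hypothetical helper; raises KeyError if a parameter is missing.
    """
    query = parse_qs(urlparse(url).query)
    return {
        "bucket": query["bucket"][0],
        "key": query["key"][0],
        # S3 sends the ETag wrapped in double quotes; drop them.
        "etag": query["etag"][0].strip('"'),
    }


if __name__ == "__main__":
    example = ("https://example.appspot.com/uploaded"
               "?bucket=my-bucket&key=uploads%2Fphoto.jpg"
               "&etag=%22abc123%22")
    print(parse_s3_redirect(example))
```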
What about security issues? I mean, there is no way to validate the data in the form (except with AJAX) before it is submitted to S3, right? So if I set the maximum file size within the form as Amazon suggests, couldn't anyone just write their own form and upload to my bucket? And the metadata I add within the form could also easily be modified...
mistero
Never mind. I found the signed policy document ;)! Thanks so much for your help, guys! I am new to Stack Overflow, but this is amazing!
mistero
+1  A: 

For future reference, Google added support for large file uploads (up to 50 MB). The new feature was released as part of the Blobstore API and is discussed here.

notnoop
[That link](http://code.google.com/appengine/docs/python/blobstore/overview.html#Quotas_and_Limits) states *maximum object size: 2 gigabytes*. I don't know when it changed, but it's still good news :)
voyager
+1  A: 

Thomas L Holaday's answer is the correct answer, I suppose. Anyway, just in case, here's a link to Amazon Web Services SDK for App Engine (Java), which you can use e.g. to upload files from App Engine to Amazon S3. (Edit: Oh, just noticed -- excepting S3) http://apetresc.wordpress.com/2010/06/22/introducing-the-gae-aws-sdk-for-java/

Written by Adrian Petrescu. From his web site:

[It is] a version of the Amazon Web Services SDK for Java that will run from inside of Google App Engine. This wouldn’t work if you simply included the JAR that AWS provides directly into GAE’s WAR, because GAE’s security model doesn’t allow the Apache Commons HTTP Client to create the sockets and low-level networking primitives it requires to establish an HTTP connection; instead, Google requires you to make all connections through its URLFetch utility

LeoMaheo
Thanks for the shout-out ;)
Adrian Petrescu
Also, I should note that there's a decent chance it will work with S3 as well. Give it a try by removing the filter from the build.xml file and testing it out. I'd be curious what the results are.
Adrian Petrescu