I am a bit confused. I wrote a standalone Java app, and now I want to deploy it on the web with Google App Engine (GAE), partly to learn about GAE along the way. My application reads data from a file, stores it in memory, processes it, and then stores the results in memory or in a file. I understand that I now need to store the results in the GAE datastore, which is fine. So I could run my program on my own computer, write the results to a file, and then use GAE to upload all the results to the datastore so users can query them. However, is there a way to move the entire process into the GAE application? That is, the application would read the data from a file, do the processing (using the application server's memory rather than my computer's; it needs at least 4 GB of RAM), and when it's done (which might take 1 to 2 hours), write everything to the GAE datastore. It would be an internal "offline" process with no users involved.

I'm a bit confused because Google doesn't say anything about a memory quota.

Thanks!

+6  A: 

You will not be able to do your offline processing the way you envision. There is a limit to how much memory your app can use, but that is not the main problem. All processing in App Engine is done in request handlers; in other words, any action you want your app to perform has to be written as if it were handling a web request. Each of these handlers is limited to 30 seconds of running time, and if your process tries to run longer, it will be shut down. App Engine is optimized for serving web requests, not for heavy computation.

That said, you may be able to break your computation into chunks that each finish within 30 seconds, storing intermediate results in the datastore or memcache. You could then use a cron job or the task queue (both described in the App Engine docs) to keep calling your processing handler until the data crunching is done.
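A minimal plain-Java sketch of that chunking pattern, not App Engine code. Assumptions: the `HashMap` stands in for the datastore or memcache, the `while` loop in `main` stands in for the task queue or cron job re-invoking the handler, and `ChunkedJob`, `handleRequest`, the per-request item budget, and the sum-of-squares "work" are all hypothetical placeholders. A real handler would more likely stop when it approaches the request deadline rather than after a fixed item count, and would enqueue a follow-up task instead of looping.

```java
import java.util.*;

public class ChunkedJob {
    // Stand-in for the datastore/memcache: keeps progress between "requests".
    static final Map<String, Long> store = new HashMap<>();

    // Hypothetical request handler: resumes from the saved cursor, processes at
    // most `budget` items, saves progress, and returns true if work remains
    // (i.e. the caller should schedule another request).
    static boolean handleRequest(long[] data, int budget) {
        long cursor = store.getOrDefault("cursor", 0L);
        long sum = store.getOrDefault("partialSum", 0L);
        int end = (int) Math.min(data.length, cursor + budget);
        for (int i = (int) cursor; i < end; i++) {
            sum += data[i] * data[i]; // placeholder for the real per-item work
        }
        store.put("cursor", (long) end);
        store.put("partialSum", sum);
        return end < data.length;
    }

    public static void main(String[] args) {
        long[] data = new long[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        int requests = 0;
        boolean more = true;
        // Stand-in for the task queue: keep invoking the handler until it reports completion.
        while (more) {
            more = handleRequest(data, 3);
            requests++;
        }
        System.out.println("requests=" + requests + " result=" + store.get("partialSum"));
    }
}
```

With 10 items and a budget of 3 per request, this takes 4 "requests" and prints `requests=4 result=385`. The key design point is that every bit of state needed to resume lives outside the handler, so any single request can be killed and retried without losing earlier chunks.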

In summary, yes, it may be possible to do what you want, but it might not be worth the trouble. If you want to do computationally intensive work, look into other cloud solutions like Amazon's EC2 or Hadoop.

Peter Recore
This explains things. Thanks a lot! By the way, I prefer GAE since it's free for my needs, while as far as I understand, Amazon charges you from the start. I also considered Hadoop, and Amazon + Hadoop is probably a good solution, but Hadoop alone doesn't help me since it's just software and I still need the infrastructure. Right now Hadoop isn't a priority anyway, because computation time is not a big concern. I think I will do the computation offline on my computer, save all the results to CSV files, and then upload them to the GAE datastore.
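A minimal sketch of the "save the results to CSV" step, assuming each result is a row of strings. The class and method names (`CsvExport`, `writeCsv`, `escape`) and the sample rows are hypothetical. Fields are quoted following RFC 4180 conventions (a comma, double quote, or line break forces quoting, and embedded quotes are doubled), which most CSV importers accept. The upload into the datastore is not covered here.

```java
import java.io.*;
import java.util.*;

public class CsvExport {
    // Quote a field RFC 4180 style if it contains a comma, quote, or line break;
    // embedded double quotes are doubled.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Write a header row and then one row per result, each terminated by CRLF.
    static void writeCsv(PrintWriter out, List<String> header, List<List<String>> rows) {
        writeRow(out, header);
        for (List<String> row : rows) {
            writeRow(out, row);
        }
        out.flush();
    }

    private static void writeRow(PrintWriter out, List<String> fields) {
        StringJoiner line = new StringJoiner(",");
        for (String f : fields) {
            line.add(escape(f));
        }
        out.write(line.toString());
        out.write("\r\n");
    }

    public static void main(String[] args) {
        // Hypothetical results from the offline run.
        List<List<String>> rows = List.of(
                List.of("alpha", "42"),
                List.of("beta, gamma", "7"));
        writeCsv(new PrintWriter(System.out), List.of("name", "score"), rows);
    }
}
```

For a real run you would pass a `PrintWriter` wrapping a `FileWriter` instead of `System.out`, and close it when done.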
You should try stax.net. I'm using it now and it works just fine.
mnml
@mnml - the poster wanted a free solution. stax.net does not list prices yet, nor does it say whether there will be a free option once it leaves beta.
Peter Recore
As far as I know, it will stay as it is; prices only go up if you want to add more instances. That's what they told me. Just trying to help here :) I don't think Google App Engine is a good solution for what he wants to do.
mnml
Ah, that's good to know. They should probably add that to their FAQ.
Peter Recore
Thanks a lot, guys. I'd never heard of stax.net before; I'll check it out.