views:

574

answers:

7

I'm building a Lifestreaming app that will involve pulling down lots of feeds for lots of users, and performing data-mining, and machine learning algorithms on the results. GAE's load balanced and scalable hosting sounds like a good fit for a system that could eventually be moving around a LOT of data, but it's lack of cron jobs is a nuisance. Would I be better off using Django on a co-loc and dealing with my own DB scaling?

+4  A: 

It might change when they offer paid plans, but as it stands, App Engine is not good for CPU intensive apps. It is designed to scale to handle a large number of requests, not necessarily a large amount of calculation per request. I am running into this issue with fairly minor calculations, and I fear I may have to start looking elsewhere as my data set grows.

Chris Marasti-Georg
A: 

No. If you need to pull lots of things down, App Engine isn't going to work so well. You can use it as a front end by putting your data in their store after doing your offline preprocessing, but you can't do much in the ~1 second time you have per request without doing some really crazy things.

Your app would likely be better off on your own hosting.

Patrick
A: 

Pulling feeds or doing calculations won't be a problem. But you'll soon have to pay for your account. App engine includes Django, except you'll need to work with some adaptors for the model part. It will surely save you from maintenance headaches.

Vasil
+1  A: 

If you're app solely relies on Django, then App Engine is a good bet. However, if you ever need to add C-enhanced libraries, you're up a creek. App Engine doesn't support things like PIL or ReportLab, which use C to speed up processing times. I'm only mentioning this because you may want to use C to speed up some of your routines in the long run.

If you decide to use a co-loc, check out WebFaction.com. They have great Django/Python support and they have no issue with you using the aforementioned lirbaries.

Huuuze
+1  A: 

Take a look at Slice Host: They sell xen based virtualized server instances starting at $20.00 / month...

We’re just like you. Sick of oversold, underperforming, ancient hosting companies. We took matters into our own hands. We built a hosting company for people who know their stuff. Give us a box, give us bandwidth, give us performance and we get to work. Fast machines, RAID-10 drives, Tier-1 bandwidth and root access. Managed with a customized Xen VPS backend to ensure that your resources are protected and guaranteed.

It's great for starting a project on and scaling it out WITHOUT incurring the costs of a managed provider or colo.

gahooa
+3  A: 

While I can not answer your question directly, my experience of building Microupdater (a news aggregator collecting a few hundred feeds on AppEngine) may give you a little insight.

  • Fetching feeds. Fetching lots of feeds by cron jobs (it was the only solution until SDK 1.2.5) is not efficient and scalable, which has lower limit on job frequency (say 1 min, so you could only fetch at most 60 feeds hourly). And with latest SDK 1.2.5, there is XMPP API, which I have not implemented yet. The best promising approach would be PubSubHubBub, of which you offer an callback url and HubBub will notify you new entries in real-time. And there is an demo implementation on AppEngine, which you can play around.

  • Parsing feeds. You may already know that parsing feeds is cpu-intensive. I use Universal Feed Parser by Mark Pilgrim, when parsing a large feed (say a public google reader topic), AppEngine may fail to process all entries. My dashboard have a lot of these CPU-limit warnings. But it may result in my incapability to optimize the code yet.

Totally said, AppEngine is not yet an ideal platform for lifestream app, but that may change in future.

Juvenn Woo
I'm sorry, StackOverFlow does not allow me to post *more* than one link, since I'm new here. So you need to google out the remaining 4 links.
Juvenn Woo
+2  A: 

(This is obviously pretty old, responding just because it still comes up really high in related Google queries...)

I just started using AppEngine and haven't been using it for tons of external requests. But I do know that the info above is probably a lot less valid now, and might not even still stand. They relaxed the limits quite a bit since September 08 - check Aral Balkan's blog for his initial complaint about the above, and later developments.

Ben