views: 411
answers: 2

Hi,

There have been quite a few occasions recently when app engine appears to run more slowly. To some degree that's understandable given the architecture of their cloud platform. I'm not talking about new server instances - just requests to warm servers. I'm also referring only to CPU, not the datastore API, though I do wonder about that as well.

It seems that during these slow periods I get a lot more yellow warnings on my requests, saying I am using a lot of CPU. Requests certainly take longer to complete during these periods. What concerns me is that during these slow periods my billable CPU seems to go up.

So to be clear - when app engine is fast, a request might complete in 100ms. In a slow period, the same request might take more than 1s. Same URI, same caching, same processing path, same datastore, same indexes - much more CPU. The yellow warnings, as I understand it, refer to billable CPU usage, and there are many more of them when app engine is slower.

This seems to set up a bizarre situation where my app costs more to run when app engine performance is worse. That means google makes more money the more poorly the platform performs (up to the point where it fails or customers leave). Maybe I've got the situation all wrong and it doesn't work like that - but if it does, then as a customer the incentives and balances there are all wrong. That's not to intimate any wrongdoing on google's part - just that the relationship between those two things doesn't seem right.

It almost seems like google's algorithm goes something like: 'Give a processing job to a CPU, start the watch, stop it when the job returns, and that's the billable CPU figure' - i.e. it doesn't measure actual CPU work at all. Surely that time should be divided by the number of jobs executing concurrently, plus some extra to cover the additional context switching. I'm sure that stuff is hard to measure - perhaps that's the reason.
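
To make the concern concrete, here's a toy calculation of my own (purely illustrative - I have no idea what google's actual accounting is):

    // Toy comparison of two ways to turn wall time into "billable CPU".
    // The numbers and the share-based formula are purely illustrative.
    public class BillingSketch {
        public static void main(String[] args) {
            double wallTimeMs = 1000;        // request took 1s end-to-end on a busy instance
            int concurrentJobs = 10;         // jobs sharing the CPU during that second
            double switchingOverhead = 1.1;  // +10% fudge factor for context switching

            // Naive: bill the whole wall time the job was on the machine.
            double naiveBillableMs = wallTimeMs;

            // Fairer: bill roughly this job's share of the CPU.
            double shareBillableMs = (wallTimeMs / concurrentJobs) * switchingOverhead;

            System.out.printf("naive: %.0fms, share-based: %.0fms%n",
                    naiveBillableMs, shareBillableMs);
        }
    }

If billing works like the naive line, my bill is ten times higher on a busy instance for exactly the same work.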

I guess you could argue it's fair to pay more when app engine is in high demand, but that makes budgeting close to impossible - you can't generate stats like '100 users cost me $1 a day', because the figure could change for a whole host of reasons, including app engine onboarding more customers than the infrastructure can realistically handle. If google over-subscribes app engine, then all customers pay more - another relationship that doesn't sound right. Surely google's per-customer costs should go down as they onboard more customers and those customers use more resources, based on economies of scale.

Should I expect two identical requests in my app to cost me roughly the same amount each time they run - regardless of how much wall-time app engine takes to actually complete them? Have I misunderstood how this works? If I haven't, is there a reason why I shouldn't be worried about it in the long term? Is there some documentation which makes this situation clearer? Cheers,

Colin

A: 

Yes, this is true. It is a bummer. It also takes them over a second to start up my Java application (which I was billed for) every time they decide my site is in low demand and doesn't need the resources.

I ended up using a cron job to auto-ping my site every minute to keep it warm. Doing all that wasted work actually made my bill cheaper: there was no startup time, just lots of 2ms pings.
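
For anyone wanting to try it, the setup looks roughly like the sketch below. The /ping path and PingServlet name are placeholders, not what I actually used, but it shows the two pieces: a trivial handler that does almost no work, and a cron.xml entry that hits it every minute.

    // PingServlet.java - a minimal "keep warm" handler (illustrative only).
    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class PingServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // Do as little as possible - the point is just to keep an instance
            // resident, so this should stay in the low-millisecond range.
            resp.setContentType("text/plain");
            resp.getWriter().write("ok");
        }
    }

    <!-- war/WEB-INF/cron.xml - schedule the ping every minute -->
    <cronentries>
      <cron>
        <url>/ping</url>
        <description>keep one instance warm</description>
        <schedule>every 1 minutes</schedule>
      </cron>
    </cronentries>

The servlet also needs the usual web.xml mapping to /ping.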

bwawok
Understood re. the cold starts - as noted at the top though, I was just referring to warm servers. I'm only interested in the performance vs. cost question - I don't want this to become a thread for a host of other gripes. Do you know for sure that google has set up app engine to be more expensive when it is slower, or are you referring to anecdotal observations as I am? Ta.
hawkettc
+2  A: 

It would be more complicated, but they could change the billing algorithm to be a function of load. Or perhaps they could normalize the CPU measurements based on the performance of similar calls in the past.

I agree that this presents problems for the developers.

Greg
Agreed - they could also have a standard set of requests that they run themselves. Decide what the standard CPU use of those requests is, then run them at regular intervals. If the result is 2x normal, divide everyone's billable CPU by 2 for that period. Given the scale of their product, that should average out pretty well statistically if they choose a representative set of standard requests. Can't really see why this would be that tough to implement (and it's roughly what you were saying, I think). Thx.
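
Something like this toy sketch is what I have in mind (the names and numbers are made up - it's not anything google has said they do):

    // Sketch of normalising billable CPU against a standard benchmark request.
    public class CpuNormaliser {
        private final double baselineBenchmarkMs;  // long-run cost of the standard request

        public CpuNormaliser(double baselineBenchmarkMs) {
            this.baselineBenchmarkMs = baselineBenchmarkMs;
        }

        /** Scale a raw CPU measurement by how slow the platform is right now. */
        public double normalise(double measuredCpuMs, double currentBenchmarkMs) {
            // If the benchmark is running at 2x its normal cost, halve everyone's
            // billable CPU for that period.
            double slowdown = currentBenchmarkMs / baselineBenchmarkMs;
            return measuredCpuMs / slowdown;
        }

        public static void main(String[] args) {
            CpuNormaliser n = new CpuNormaliser(100);    // benchmark normally costs 100ms
            System.out.println(n.normalise(1000, 200));  // slow period: 1000ms billed as 500.0
        }
    }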
hawkettc
I guess the converse would hold as well - if app engine were running particularly fast, everyone's CPU measurements would be scaled up. Given that google states the free quota is based on serving x million requests, it's surprising they aren't using this approach to normalise everything.
hawkettc
I know the GAE engineers are quite proud of their platform and are working hard to fix this. But I can't resist... the irony of course is, "You may have heard that here at Google we're obsessed with speed" and the fact that Google now uses site speed in their ranking algorithm. I wonder if they handicap the sites running on GAE. :) Sorry, couldn't resist.
Greg