Here are the requirements at a very high level:

  • We are going to distribute desktop agents (or browser plugins) to collect certain information from a large number of users (thousands now, possibly millions down the road).

  • These agents collect data and periodically upload it to a server app.

  • The server app will allow analyzing the collected data (filtering, sorting, etc., based on 4-5 attributes) and summarizing it in the form of charts.

  • We should also be able to export some of the collected data (CSV or PDF).
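As a rough illustration of the analysis and export requirements, here is a minimal in-memory Python sketch. The record fields (`agent_id`, `country`, `os`, `browser`, `collected_at`) and the helper names are hypothetical; a real deployment would likely push the filtering and sorting into datastore queries rather than doing them on Python lists.

```python
import csv
import io
from collections import Counter

# Hypothetical record shape: the question mentions filtering/sorting on
# 4-5 attributes, so these field names are illustrative assumptions.
FIELDS = ["agent_id", "country", "os", "browser", "collected_at"]


def filter_records(records, **criteria):
    """Keep records whose attributes equal every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def sort_records(records, key):
    """Sort records by one attribute (stable, so ties keep their order)."""
    return sorted(records, key=lambda r: r[key])


def summarize(records, attribute):
    """Count records per value of one attribute: the data behind a chart."""
    return Counter(r[attribute] for r in records)


def export_csv(records):
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

For example, `summarize(filter_records(rows, country="IN"), "browser")` would give per-browser counts for one country, ready to feed into a chart.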

We are looking for a platform to host the server app. GAE seems attractive because of its low administrative cost and its scalability (as the user base grows, the platform will handle the scale... hopefully!).

Is GAE a viable option for us?

One important consideration is that the volume uploaded by an agent can sometimes exceed 50 MB per upload cycle. Some users will also be in places with very slow Internet connections. Apparently GAE limits how long a request can last, so an upload (transferring data from an agent to the server) may take longer than 30 seconds. How would one handle such a situation?
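One common way to keep each request short on slow links, assuming you control the agent, is to split a large upload into fixed-size chunks and reassemble them on the server; a failed chunk can then be retried on its own instead of restarting a 50 MB transfer. The sketch below is plain Python with an in-memory store standing in for the datastore; `CHUNK_SIZE`, `split_into_chunks` and `ChunkStore` are hypothetical names, and the 1 MB size is an arbitrary choice.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB per request: an arbitrary, illustrative value


def split_into_chunks(upload_id, payload, chunk_size=CHUNK_SIZE):
    """Agent side: cut one large upload into small, independently sent pieces."""
    total = max(1, -(-len(payload) // chunk_size))  # ceiling division
    return [
        {
            "upload_id": upload_id,
            "index": i,
            "total": total,
            "data": payload[i * chunk_size:(i + 1) * chunk_size],
        }
        for i in range(total)
    ]


class ChunkStore:
    """Server side: in-memory stand-in for persistent storage (hypothetical)."""

    def __init__(self):
        self._parts = {}  # upload_id -> {index: bytes}

    def receive(self, chunk):
        """Store one chunk; return the full payload once every chunk has arrived."""
        parts = self._parts.setdefault(chunk["upload_id"], {})
        parts[chunk["index"]] = chunk["data"]  # a re-sent chunk just overwrites
        if len(parts) < chunk["total"]:
            return None
        del self._parts[chunk["upload_id"]]
        return b"".join(parts[i] for i in range(chunk["total"]))
```

Chunks may arrive in any order and duplicates are harmless, which suits agents that retry after a dropped connection.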

Thanks!

A: 

The time of the upload is not considered part of the script execution time, so no worries there.

Google App Engine is very good at performing a vast number of small jobs, but not at complex, long-running background jobs (because of the 30-second request limit plus an even shorter database connection time limit). So GAE would probably be a very good platform for GATHERING the data but not for actually ANALYZING it. You would probably want to separate these two.

Andris
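A minimal sketch of one way to do that separation, assuming the analysis runs on a separate machine: the collection side only stores records, and the analysis side pulls them out in small pages, one short request per page, so no single request approaches the time limit. `fetch_page` and `export_all` are hypothetical; the list slicing stands in for a paged datastore query that resumes from a cursor.

```python
def fetch_page(records, cursor, page_size):
    """Return (page, next_cursor); next_cursor is None when nothing is left.

    Hypothetical stand-in for one short, paged query on the collection side.
    """
    page = records[cursor:cursor + page_size]
    next_cursor = cursor + len(page)
    return page, (next_cursor if next_cursor < len(records) else None)


def export_all(records, page_size=500):
    """Analysis side: pull every record in small pages, one request per page."""
    cursor, rows, pages = 0, [], 0
    while cursor is not None:
        page, cursor = fetch_page(records, cursor, page_size)
        rows.extend(page)
        pages += 1
    return rows, pages
```

Each call to `fetch_page` does a bounded amount of work, so the heavy lifting (searching, sorting, charting millions of rows) happens on the machine running `export_all` rather than inside a time-limited request.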
Andris, thanks for your help. Now it is clear to me that the upload part is not an issue. The analysis part is going to be an issue for sure. If we end up collecting, let's say, a few million records, searching within them might take longer than 30 seconds. How can I separate the collection and analysis parts? Do you mean transferring the data to EC2 or something like that for analysis? I'd appreciate your clarification!
greppz
I am doing something similar, but on a smaller scale (0.5 million records daily). If you know all of your analysis logic in advance, you can maintain some additional summary tables in parallel so that you do not need to look at the detail data. If that sounds good, I can give you an example.
Manjoor
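The summary-table idea can be sketched as follows, assuming the reporting dimensions are known in advance: each incoming record also increments a counter keyed by (day, attribute, value), so a report reads a handful of summary rows instead of scanning millions of detail rows. The in-memory `SummaryTables` class is hypothetical and stands in for separate summary entities in the datastore; each record is assumed to carry a `day` field.

```python
from collections import defaultdict


class SummaryTables:
    """Keep pre-aggregated counts next to the detail rows (in-memory sketch)."""

    def __init__(self, dimensions):
        # Attributes to pre-aggregate; assumed to be known in advance.
        self.dimensions = dimensions
        self.detail = []                # raw records, rarely read for reports
        self.counts = defaultdict(int)  # (day, attribute, value) -> count

    def add(self, record):
        """Store the detail row and bump one counter per summarized attribute."""
        self.detail.append(record)
        for attr in self.dimensions:
            self.counts[(record["day"], attr, record[attr])] += 1

    def report(self, day, attr):
        """Read only the small summary table, never the detail rows."""
        return {
            value: n
            for (d, a, value), n in self.counts.items()
            if d == day and a == attr
        }
```

The trade-off is a little extra work on every write in exchange for reports whose cost depends on the number of distinct values, not on the number of records collected.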