views:

131

answers:

1

I have been experimenting with GAE for the last two months.

I save records to Bigtable by uploading a CSV file.

My test file is 300 KB.

Here is what I found.

Local system

  • Upload takes less than 1 second
  • 2500 records are processed in 3 seconds

On Google Sandbox

  • Upload takes 5-7 seconds.

  • Processing the file gives a timeout.

  • Only 60-180 records get saved.

My questions are:

  1. Why does it take so much time?
  2. Is there a way to reduce this time?
  3. Google counts this processing towards CPU usage. They do not disclose the hardware, so what CPU do they use internally? I mean, do I get a CPU equivalent to or faster than a PIII?

Edited in response to @Drew Sears's answer.

What I am doing at present:

  1. Upload the file to GAE.
  2. Read the uploaded bytes as a stream, count the lines, and save the file into Bigtable.
  3. There is a unique field, id, on my Record.
  4. Then I create the queue tasks:

int batches = linesCount / 50;
Queue queue = QueueFactory.getQueue("test-queue");

for (int i = 0; i < batches; i++)
{
    int startIdx = i * 50;
    queue.add(TaskOptions.Builder.url("/TestQueue")
            .param("id", id.toString())
            .param("startIdx", String.valueOf(startIdx))
            .param("totRec", String.valueOf(50))
    );
}

int remainder = linesCount % 50;
if (remainder > 0)
{
    int startIdx = batches * 50;
    queue.add(TaskOptions.Builder.url("/TestQueue")
            .param("id", id.toString())
            .param("startIdx", String.valueOf(startIdx))
            .param("totRec", String.valueOf(remainder))
    );
}
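The batch arithmetic can be checked in isolation; this is a minimal sketch in plain Java with no App Engine dependencies (the class and method names are mine, used only for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplit {
    // Split linesCount lines into (startIdx, totRec) pairs of at most
    // batchSize lines each, mirroring the queue-task parameters above.
    static List<int[]> split(int linesCount, int batchSize) {
        List<int[]> batches = new ArrayList<>();
        int full = linesCount / batchSize;
        for (int i = 0; i < full; i++) {
            batches.add(new int[] { i * batchSize, batchSize });
        }
        int remainder = linesCount % batchSize;
        if (remainder > 0) {
            batches.add(new int[] { full * batchSize, remainder });
        }
        return batches;
    }

    public static void main(String[] args) {
        // 2500 lines divide evenly into 50 batches of 50.
        System.out.println(split(2500, 50).size());      // prints 50
        // 130 lines give two full batches plus a remainder of 30.
        System.out.println(split(130, 50).get(2)[1]);    // prints 30
    }
}
```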

The task-processing servlet reads the file from storage, processes the records from startIdx through startIdx + totRec, and closes the file.

+2  A: 

This is really not a great way to test App Engine's scalability.

  1. If it's taking you 7 seconds to post 300KB, the bottleneck is almost certainly your upstream bandwidth, not Google's downstream bandwidth, or anything to do with App Engine. I routinely get much faster upload speeds.
  2. If you want requests to finish faster, minimize your RPC calls. Every datastore get, put, or query is a round-trip to an external server. If you're looping over hundreds of rows and doing a put inside each loop iteration, you're incurring a massive amount of unnecessary overhead. Save all of your entities using one datastore put and you will get much faster results. Guido's AppStats framework is a great tool for finding RPC optimization opportunities.
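In Java, the low-level datastore API supports the same batch save through DatastoreService.put(Iterable&lt;Entity&gt;); here is a minimal sketch (the Record kind and data property are illustrative, not from the asker's code):

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import java.util.ArrayList;
import java.util.List;

public class BulkSave {
    // Persist all CSV lines with a single datastore RPC.
    static void saveAll(List<String> csvLines) {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

        // Build all entities in memory first...
        List<Entity> records = new ArrayList<Entity>();
        for (String line : csvLines) {
            Entity record = new Entity("Record"); // kind name is illustrative
            record.setProperty("data", line);
            records.add(record);
        }

        // ...then save them in one batch put() -- one RPC instead of one per row.
        datastore.put(records);
    }
}
```

If you are on JDO rather than the low-level API, PersistenceManager.makePersistentAll(collection) plays the same role.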
Drew Sears
+1 for mentioning the perils of doing a separate put() for each row
Peter Recore
I can minimize RPC requests, but how can I reduce datastore requests? I have to save 3k records, and that needs 3k datastore puts (or makePersistent() calls in my case). Is there a bulk save method?
Manjoor
Same thing. Each datastore request is an RPC call. Yes, the datastore lets you store multiple entities in one call. In Python this is just db.put() with a list of entities; I don't know what the syntax would be in Java.
Drew Sears