views:

156

answers:

2

I'm using the new experimental taskqueue for java appengine and I'm trying to create tasks that aggregate statistics in my datastore. I'm trying to count the number of UNIQUE values within all the entitities (of a certain type) in my datastore. More concretely, say entity of type X has a field A. I want to count the NUMBER of unique values of A in my datastore.

My current approach is to create a task which queries for the first 10 entities of type X, creating a hashtable to store the unique values of A in, then passing this hashtable to the next task as the payload. This next task will count the next 10 entities and so on and so forth until I've gone through all the entities. During the execution of the last task, I'll count the number of keys in my hashtable (that's been passed from task to task all along) to find the total number of unique values of A.

This works for a small number of entities in my data store. But I'm worried that this hashtable will get too big once I have a lot of unique values. What is the maximum allowable size for the payload of an appengine task?????

Can you suggest any alternative approaches?

Thanks.

+4  A: 

According to the docs, the maximum task object size is 10K.

Jonathan Feinberg
+1 for cold hard facts.
Lucas McCoy
does object size = payload size?
aloo
You need to serialize your object somehow. That's the payload. If you expect it to be more than 10k, you can use the deferred library's trick of serializing the key of a datastore entity containing the actual data.
Nick Johnson
+1  A: 

"Can you suggest any alternative approaches?".

Create an entity for each unique value, by constructing a key based on the value and using Model.get_or_insert. Then Query.count up the entities in batches of 1000 (or however many you can count before your request times out - more than 10), using the normal paging tricks.

Or use code similar to that given in the docs for get_or_insert to keep count as you go - App Engine transactions can be run more than once, so a memcached count incremented in the transaction would be unreliable. There may be some trick around that, though, or you could keep the count in the datastore provided that you aren't doing anything too unpleasant with entity parents.

Steve Jessop