I'm using the new experimental taskqueue for java appengine and I'm trying to create tasks that aggregate statistics in my datastore. I'm trying to count the number of UNIQUE values within all the entitities (of a certain type) in my datastore. More concretely, say entity of type X has a field A. I want to count the NUMBER of unique values of A in my datastore.
My current approach is to create a task which queries for the first 10 entities of type X, creating a hashtable to store the unique values of A in, then passing this hashtable to the next task as the payload. This next task will count the next 10 entities and so on and so forth until I've gone through all the entities. During the execution of the last task, I'll count the number of keys in my hashtable (that's been passed from task to task all along) to find the total number of unique values of A.
This works for a small number of entities in my data store. But I'm worried that this hashtable will get too big once I have a lot of unique values. What is the maximum allowable size for the payload of an appengine task?????
Can you suggest any alternative approaches?
Thanks.