I'm using the Python GAE SDK.
I have some processing that needs to be done on 6000+ instances of MyKind. It is too slow to be done in a single request, so I'm using the task queue. If I make a single task process only one entity, then it should take only a few seconds.
The documentation says that only 100 tasks can be added in a "batch". (What do they mean by that? In one request? In one task?)
So, assuming that "batch" means "request", I'm trying to figure out the best way to create a task for each entity in the datastore. What do you think?
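Under that assumption, here's a minimal sketch of how I'd imagine the batch-adding works: as far as I can tell, Queue.add() accepts a list of up to 100 Task objects per call. (The /process_one worker URL and the offset parameter are placeholders I made up.)

    from google.appengine.api import taskqueue

    def add_in_batches(tasks):
        """Add tasks to the default queue in chunks of 100 (the batch limit)."""
        queue = taskqueue.Queue('default')
        for i in range(0, len(tasks), 100):
            queue.add(tasks[i:i + 100])

    # '/process_one' is a placeholder worker URL; each task handles one entity.
    tasks = [taskqueue.Task(url='/process_one', params={'offset': n})
             for n in range(500)]
    add_in_batches(tasks)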
It's easier if I can assume that the order of MyKind will never change. (The processing will never actually change the MyKind instances - it only creates new instances of other types.) I could just make a bunch of scheduling tasks, giving each one an offset at which to start, spaced less than 100 apart. Each of those tasks could then create the individual tasks that do the actual processing, roughly like the sketch below.
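A rough sketch of that two-level scheme, under the fixed-ordering assumption (the /schedule and /process worker URLs are made up, and I'm pretending the total entity count is already known):

    from google.appengine.api import taskqueue

    BATCH = 100  # stay under the 100-tasks-per-batch limit

    def schedule_all(total):
        """Level 1: enqueue one scheduling task per block of BATCH entities."""
        for offset in range(0, total, BATCH):
            taskqueue.add(url='/schedule',
                          params={'offset': offset,
                                  'count': min(BATCH, total - offset)})

    def handle_schedule(offset, count):
        """Level 2: fan out one processing task per entity in this block."""
        tasks = [taskqueue.Task(url='/process', params={'offset': offset + i})
                 for i in range(count)]
        taskqueue.Queue('default').add(tasks)  # at most 100 tasks, one batch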
But what if there are so many entities that the original request can't add all the necessary scheduling tasks? That makes me think I need a recursive solution: each task looks at the range it is given. If the range contains only one entity, the task processes it. Otherwise, it subdivides the range into further tasks.
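Something like this is what I have in mind; process_entity is a stand-in for the actual per-entity work, and /subdivide is a made-up worker URL:

    from google.appengine.api import taskqueue

    def process_entity(offset):
        pass  # stand-in for the real per-entity processing

    def handle_subdivide(start, end):
        """Worker for the half-open offset range [start, end).

        A one-element range is processed directly; a larger range is
        split in half and both halves are re-enqueued."""
        if end - start == 1:
            process_entity(start)
            return
        mid = (start + end) // 2
        taskqueue.add(url='/subdivide', params={'start': start, 'end': mid})
        taskqueue.add(url='/subdivide', params={'start': mid, 'end': end})

Since each level halves the range, 6000+ entities only need about 13 levels of subdivision, so the scheduling overhead seems manageable.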
If I can't count on using offsets and limits to identify entities (because their ordering isn't guaranteed to be constant), maybe I could just use their keys? But then I could be sending thousands of keys around, which seems unwieldy.
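For the key-based variant, a keys-only query would at least avoid fetching the full entities while building the task payloads. Sketch only: MyKind here is a stand-in for the real model, and /process is again a placeholder URL.

    from google.appengine.api import taskqueue
    from google.appengine.ext import db

    class MyKind(db.Model):
        pass  # stand-in for the real model

    def schedule_by_key():
        """Fan out one task per entity, identified by its datastore key."""
        keys = MyKind.all(keys_only=True).fetch(10000)
        tasks = [taskqueue.Task(url='/process', params={'key': str(key)})
                 for key in keys]
        queue = taskqueue.Queue('default')
        for i in range(0, len(tasks), 100):  # respect the 100-task batch limit
            queue.add(tasks[i:i + 100])

    # In the worker, the entity comes back via db.get(db.Key(key_string)).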
Am I going down the right path here, or is there another design I should consider?