What is the difference between these 2 pieces of code?

query=Location.all(keys_only=True)
while query.count()>0:
  db.delete(query.fetch(5))

# --

while True:
  query=Location.all(keys_only=True)
  if not query.count():
    break
  db.delete(query.fetch(5))

They both work.

A: 

In the second one, query is reassigned on every iteration. I don't know whether that is actually needed here (I don't use Google App Engine). To replicate this behaviour, the first one would have to look like this:

query=Location.all(keys_only=True)
while query.count()>0:
  db.delete(query.fetch(5))
  query=Location.all(keys_only=True)

In my opinion, the first style is much more readable than the second one.

Femaref
+6  A: 

Logically, these two pieces of code do exactly the same thing - they delete every Location entity, 5 at a time.

The first piece of code is better both in terms of style and (slightly) in terms of performance. (The query itself does not need to be rebuilt in each loop).

However, this code is not as efficient as it could be. It has several problems:

  1. You use count() but do not need to. It would be more efficient to simply fetch the entities, and then test the results to see if you got any.

  2. You are making more round-trips to the datastore than you need to. Each count(), fetch(), and delete() call must go to the datastore and back. These round-trips are slow, so you should try to minimize them. You can do this by fetching more entities in each loop.

Example:

q = Location.all(keys_only=True)
results = q.fetch(500)
while results:
    db.delete(results)
    results = q.fetch(500)

Edit: Have a look at Nick's answer below - he explains why this code's performance can be improved even more by using query cursors.

David Underhill
Thanks for pointing out count() is an unnecessary overhead.
.count() doesn't actually fetch the keys - but it does require the datastore backend to iterate over them all.
Nick Johnson
Also, bulk deletes are limited to 500 keys.
Nick Johnson
I've updated my answer to fix the bit about `count()` and the 500 key limit on `db.delete()`.
David Underhill
Very interesting that `db.delete()` is limited to 500 keys. After some experimentation, it seems like `db.put()` is also limited to 500 keys, and `db.get()` is limited to 1,000 keys. This would be worth including in the [documentation](http://code.google.com/appengine/docs/python/datastore/functions.html) - I don't see these limits spelled out there, or in the method stubs in the code.
David Underhill
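A minimal sketch of how the limits discussed in these comments might be respected when you already hold a long list of keys. The 500 figure comes from the comments above, not from documented constants; `MAX_KEYS_PER_DELETE` and `chunked` are hypothetical names introduced for illustration, and plain integers stand in for datastore keys so the snippet runs without App Engine:

```python
# Limit reported in the comments above; treat it as an assumption,
# not a documented constant.
MAX_KEYS_PER_DELETE = 500


def chunked(keys, size=MAX_KEYS_PER_DELETE):
    """Yield successive lists of at most `size` keys."""
    keys = list(keys)
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


# Stand-in integer "keys"; in App Engine code each chunk would be
# passed to db.delete(chunk) instead of being measured.
sizes = [len(chunk) for chunk in chunked(range(1200))]
print(sizes)
```

Each chunk stays within the reported limit, so one call per chunk should be accepted.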
A: 

It's too bad you can't do this in Python.

query=Location.all(keys_only=True)
while locations=query.fetch(5):
  db.delete(locations)

Like in the other P language

while(@row=$sth->fetchrow_array){
  do_something();
}

You can, after a fashion - see my answer.
Nick Johnson
+1  A: 

Here's a solution that's neater, but that you may or may not consider a hack:

q = Location.all(keys_only=True)
for batch in iter(lambda: q.fetch(500), []):
  db.delete(batch)
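
For readers unfamiliar with the two-argument form of `iter` used above, here is a small sketch that runs without App Engine. `make_fetcher` is a hypothetical stand-in for the query: its `fetch` removes the items it returns, which roughly mimics fetching a batch and then deleting it.

```python
# Stand-in for a datastore query (hypothetical helper): hands out
# items in batches and returns an empty list once nothing is left.
def make_fetcher(items):
    remaining = list(items)

    def fetch(limit):
        batch = remaining[:limit]
        del remaining[:limit]  # simulate the batch being deleted
        return batch

    return fetch


fetch = make_fetcher(range(7))

# iter(callable, sentinel) calls the callable repeatedly and stops
# as soon as the returned value equals the sentinel ([] here).
batches = list(iter(lambda: fetch(3), []))
print(batches)
```

The loop ends on the first empty batch, which is why the sentinel has to be `[]` rather than `None`.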

One gotcha, however, is that as you delete more and more, the backend is forced to skip over the 'tombstoned' entities to find the next ones that aren't deleted. Here's a more efficient solution that uses cursors:

q = Location.all(keys_only=True)
results = q.fetch(500)
while results:
  db.delete(results)
  q = Location.all(keys_only=True).with_cursor(q.cursor())
  results = q.fetch(500)
Nick Johnson
+1 for deleting with a cursor - I had no idea that the backend was stuck skipping over tombstoned entities. I'll be applying this idea to some of my code - thanks :). I think this is the sort of advice that would be really helpful to have in an article on the App Engine site. As far as I can tell, it isn't mentioned in the docs or in the articles on the datastore (but it is quite possible I just can't find it). What do you think?
David Underhill
I'm fairly sure it's mentioned in one of the datastore articles, but I can't locate it right now. Note that, per the nature of Bigtable, the tombstone records only persist until that tablet is next compacted (which will occur faster if you're doing lots of operations on it), so it's not a permanent condition, naturally.
Nick Johnson
In your second example, shouldn't `q.cursor()` be `results.cursor()`?
tomlog
No - results is an array. But I'm not preserving the query and I should be - I'll fix that.
Nick Johnson