ansaurus

Question

Iterating over a large Django queryset while the data is changing elsewhere

Answer 1

+2 A:

The following will do the job for you in Django 1.1, no loop necessary:

from django.db.models import F
Book.objects.all().update(activity=F('views')*4)

You can have a more complicated calculation too:

for book in Book.objects.all().iterator():
    Book.objects.filter(pk=book.pk).update(activity=book.calculate_activity())

Both these options have the potential to leave the activity field out of sync with the rest, but I assume you're ok with that, given that you're calculating it in a cron job.

Will Hardy 2010-01-20 20:42:42

There is actually a lot of processing going on, including other fields. I just simplified it for the sake of this question. That would be a fine solution, but I've got a lot going on there that I don't think I can just inline.

Samuel Clay 2010-01-20 21:01:49

I think you may have to live with the fact that 'activity' could be out of sync with other fields, which I assume you're already fine with, given that you're using a cron to update 'activity'. I've updated the answer to handle more complicated processing, while avoiding overwriting the object.

Will Hardy 2010-01-21 06:27:52

Added `.iterator()`, as mentioned by Alex.

Will Hardy 2010-01-21 08:53:43

Answer 2

+1 A:

No matter how you solve this, beware of transaction-related issues. E.g. default transaction isolation level is set to REPEATABLE READ, at least for MySQL backend. This, plus the fact that both Django and db backend work in a specific autocommit mode with an ongoing transaction means, that even if you use (very nice) whrde suggestion, value of `views' could be no longer valid. I could be wrong here, but feel warned.

Tomasz Zielinski 2010-01-21 00:22:07

Answer 3

+1 A:

In addition to what others have said if you are iterating over a large queryset you should use iterator():

Book.objects.filter(stuff).order_by(stuff).iterator()

this will cause Django to not cache the items as it iterates (which could use a ton of memory for a large result set).

Alex Gaynor 2010-01-21 07:03:01

.itertor() works great, but is there any way to "lock" the database so I can perform a read and then a subsequent write in an atomic fashion? Will this require dropping down to raw SQL, where it certainly is possible, or does Django provide a hook for this?

Samuel Clay 2010-01-21 18:33:36

ansaurus

tags:

views:

answers:

Iterating over a large Django queryset while the data is changing elsewhere

related questions