views:

262

answers:

3

I think I've read somewhere that Django's ORM lazily loads objects. Let's say I want to update a large set of objects (say 500,000) in a batch-update operation. Would it be possible to simply iterate over a very large QuerySet, loading, updating and saving objects as I go?

Similarly, if I wanted to allow a paginated view of all of these thousands of objects, could I use the built-in pagination facility, or would I have to manually run a window over the data set with a query each time because of the size of the QuerySet of all objects?

A: 

When I benchmarked this for my current project, with a dataset of 2.5M records, I found that execution time with Django was about 10 times higher, and memory consumption of the process about 25 times higher.

And that was only reading and counting, without performing any update/insert queries.

Try investigating this question for yourself; a benchmark isn't hard to write and execute.

Vestel
Sorry, 10 times and 25 times more than what? Straight SQL queries?
Joe
A full read of the data, plus performing some processing on it, took 10 times more time and 25 times more memory when I was using the Django ORM than when I was using raw SQL queries and manipulating the retrieved data as a Python list.
Vestel
-1. This is a meaningless statistic. Plus, it depends on how you were using the ORM - `len(queryset)` for example can be massively less efficient than `queryset.count()`.
Daniel Roseman
+1  A: 

If the batch update is possible with a single SQL query, then I think there will be no major difference between raw SQL queries and the Django ORM. But if the update actually requires loading each object, processing its data, and then updating it, you can either use the ORM or write your own SQL and run an update query on each processed record; the overhead depends entirely on the code logic.

The built-in pagination facility runs a LIMIT/OFFSET query (if you are using it correctly), so I don't think there is major overhead in the pagination either.

ranedk
Thanks. No, the update isn't possible in a single query.
Joe
+3  A: 

If you evaluate a queryset of 500,000 results, all of them will get cached in memory, which is big. Instead, you can use the `iterator()` method on the queryset, which will return results as requested, without huge memory consumption.

Also, use `update()` and `F()` objects to do simple batch updates in a single query.

Dmitry Shevchenko