I see a lot of answers like this one:

Printing a list of persons with more than one home each home with more than one

I have tried that answer with similar models, and it seems like a terribly inefficient way of doing this. Each iteration seems to make a separate query, sometimes resulting in thousands of queries to the database. I realize that you can cache the querysets, but it still seems very wrong. So the question is: do you use that method? If not, how do you do it?

+4  A: 

You can use the select_related() queryset method to reduce the number of database queries. You can also specify the depth, so in the given example, if the telephone number model had additional foreign key relationships, you would use select_related(depth=2) to avoid selecting further "levels" of related entities.
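To make the difference concrete, here is a minimal sketch of the two access patterns in plain SQL. It uses Python's built-in sqlite3 module rather than Django so that it runs on its own, and the person/home tables, the CountingDB wrapper, and both function names are made up for illustration. The first function mimics what a loop over related objects does without select_related() (one extra query per row); the second fetches the same data with a single JOIN, which is roughly the kind of statement select_related() is meant to produce for a foreign key.

```python
import sqlite3

# Hypothetical schema: each home points at its owner through a foreign key.
SCHEMA = """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE home (
        id INTEGER PRIMARY KEY,
        address TEXT,
        person_id INTEGER REFERENCES person(id)
    );
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO home VALUES
        (1, '1 Elm St', 1), (2, '2 Oak Ave', 1), (3, '3 Pine Rd', 2);
"""


class CountingDB:
    """Thin wrapper that counts statements, standing in for a database monitor."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(SCHEMA)
        self.queries = 0

    def run(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def homes_lazy(db):
    """One query for the homes, then one more per home to look up its owner."""
    rows = []
    for address, person_id in db.run("SELECT address, person_id FROM home ORDER BY id"):
        (name,) = db.run("SELECT name FROM person WHERE id = ?", (person_id,))[0]
        rows.append((name, address))
    return rows


def homes_joined(db):
    """The same data fetched in a single statement with a JOIN."""
    return db.run(
        "SELECT person.name, home.address "
        "FROM home JOIN person ON person.id = home.person_id "
        "ORDER BY home.id"
    )


if __name__ == "__main__":
    lazy_db, joined_db = CountingDB(), CountingDB()
    print(homes_lazy(lazy_db), "queries:", lazy_db.queries)
    print(homes_joined(joined_db), "queries:", joined_db.queries)
```

With three homes the lazy version issues four statements and the joined version issues one; the gap grows linearly with the number of rows, which is where the "thousands of queries" come from.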

ozan
Hmm, I guess the book I read didn't mention that you could select specific related models. I was running into a problem where using select_related() would also pull up 4 or 5 other manytomany relationships that I didn't need. Time to read djangoproject.com before asking questions! Thank you.
Matt
+6  A: 

This is a very good question, and one not limited to Django's ORM framework.

I always feel it's important to remember some of the problems that an object-relational mapping (ORM) framework solves:

  • Object-oriented CRUD: If the rest of the application is based on strong object-oriented principles, accessing data persistence using objects makes the code just that much more coherent, internally consistent, and sometimes shorter.

  • Persistence layer encapsulation: An ORM provides a clear layer in your application for DB access. It encapsulates all the functions needed to read/write data in one spot, the epitome of the so-called DRY (don't repeat yourself) principle. This makes a few things much easier: model changes, because all the DB-facing select and insert/update code is in one spot rather than scattered throughout the app; security, because all DB access goes through one location; and testing, because it's easy to mock out your data models and access if they are clearly delineated.

  • SQL security: While it's easy to secure raw SQL use against injection attacks and such, it's even easier if you have an ORM framework as a single point of DB-contact that does it for you so you never have to think about it.

Notice that speed is not on the list. An ORM is a level of indirection between your code and the database. We certainly hold ORM designers responsible for writing a framework that produces good SQL statements, but an ORM is meant to provide code- and architecture-level efficiency, not executional efficiency. A developer who has read a basic book on SQL will always be able to get better performance talking directly to the DB.

There are certainly strategies to counter this, and in Django those are select_related() as ozan has mentioned, and site/view/miscellaneous caching, but they won't give you the same performance as a direct SQL statement. Because of this, I would never use an ORM framework that does not provide some mechanism for issuing a raw SQL statement on those occasions when I need speed. For example, I often resort to raw SQL when generating a large report out of the database that joins many tables; the ORM way can take minutes, the SQL way can take seconds.
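As a rough illustration of the report case, the sketch below pushes the join and the aggregation into one hand-written SQL statement instead of walking objects one at a time. It uses the standard-library sqlite3 module so it runs standalone; the customer/orders tables and the sales_report name are hypothetical. In a Django project the equivalent escape hatches would be django.db.connection.cursor() or the raw() manager method, which are not shown here.

```python
import sqlite3

# Hypothetical reporting schema; totals are stored as integer cents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        total_cents INTEGER
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 1000), (2, 1, 1550), (3, 2, 725);
""")

# One statement does the join, the grouping, and the arithmetic in the database.
REPORT_SQL = """
    SELECT customer.name, COUNT(orders.id), SUM(orders.total_cents)
    FROM customer
    JOIN orders ON orders.customer_id = customer.id
    GROUP BY customer.id, customer.name
    ORDER BY customer.name
"""


def sales_report(connection):
    """Return (name, order_count, total_cents) rows from a single query."""
    return connection.execute(REPORT_SQL).fetchall()


if __name__ == "__main__":
    for name, order_count, total_cents in sales_report(conn):
        print(f"{name}: {order_count} orders, {total_cents / 100:.2f} total")
```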

Having said that, I never start by worrying about each individual query. My advice for anyone coming to an ORM layer is: don't nanny the ORM's database access. Write your application or module, and then profile it, tweaking those areas that truly need the performance boost, or using caching/select_related to reduce the overall DB-chattiness of your application.
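For the "profile it first" step, one possible starting point is the standard library's cProfile. The sketch below is only an illustration: slow_view is a made-up stand-in for whatever code path you suspect, and profile_call is a hypothetical helper, not part of Django.

```python
import cProfile
import io
import pstats


def slow_view():
    """Hypothetical stand-in for a view or report whose cost we want to measure."""
    return sum(i * i for i in range(100_000))


def profile_call(func, top=5):
    """Run func under cProfile; return its result and a text report of the top entries."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_view)
    print(report)
```

The report lists the calls with the highest cumulative time, which gives a data-driven shortlist of the places worth rewriting with select_related(), caching, or raw SQL.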

Jarret Hardie
+1 solid answer. :)
Paolo Bergantino
Great answer! After playing around with select_related(), I decided to benchmark it against direct SQL. I'm sure that you won't be surprised to hear that over 1,000,000 requests, the direct SQL was 15 times faster. Even with select_related(), Django still wants to talk to the database a lot.
Matt
Hmmm... interesting! Got a blog? You'd have to phrase everything carefully to avoid being flamed, but the benchmark procedure and results would be very interesting reading for the community.
Jarret Hardie
(by which I mean the Django community)... kudos for doing the benchmarking as most people don't analyze their tools that carefully. Any contribution to the understanding of the framework is always welcome. Cheers!
Jarret Hardie
However, the amount of code is also significantly different: 3 lines of code compared to over 100 lines. Thank you very much for your wonderful advice. I guess it's time to close the database monitor and actually get some work done.
Matt
No, I don't have a blog. I probably shouldn't have said that. I just started playing with Django, so I may still be missing something here. Have any book recommendations? The book that I just bought is out of date and does not cover django's ORM in detail.
Matt
Can you please post the SQL that django is generating with select_related()? If your raw SQL is 15 times faster, either there's a bug or you're doing it wrong. It would be good to also post this on the django users groups to bring it to Malcolm's attention.
ozan
Well, before I do that, let me post what I have so you guys can see if I am doing it correctly. Would it be best to make another question, or to edit this question and add it here?
Matt
My vote is add it to this question.
Jarret Hardie
I agree, add it to this one
ozan