tags:

views:

511

answers:

3

Alright, I have a Django view, like this:

@render_to('home/main.html')
def login(request):
    # also tried Client.objects.select_related().all()
    clients = Client.objects.all()
    return {'clients':clients}

And I have a template, main.html, like this:

<ul>
{% for client in clients %}
<li>{{ client.full_name }}</li>
    <ul>
    {% for pet in client.pets.all %}
        <li>{{ pet.full_name }}</li>
    {% endfor %}
    </ul>
{% endfor %}
</ul>

I also print out all the queries in sql_queries at the bottom of my base template. When I run this view, the following queries are made:

SELECT `home_client`.`id`, ... FROM `home_client`;
SELECT `home_pet`.`id`, ... FROM `home_pet` WHERE `home_pet`.`client_id` = 1;
SELECT `home_client`.`id`, ... FROM `home_client` WHERE `home_client`.`id` = 1;
SELECT `home_client`.`id`, ... FROM `home_client` WHERE `home_client`.`id` = 1;
SELECT `home_pet`.`id`, ... FROM `home_pet` WHERE `home_pet`.`client_id` = 2;
SELECT `home_client`.`id`, ... FROM `home_client` WHERE `home_client`.`id` = 2;

My question is, why are all these queries being made? Shouldn't it just be 1 query to retrieve all the clients and a query per client to retrieve all the pets from each client? I have 2 clients in the home_client table, so it should be 3 queries total. Most troubling of all is that queries 3 and 4 are 100% identical. I don't want to "prematurely optimize" or anything but I do want to make sure Django isn't being wildly inefficient. Any help on this would be appreciated. Thanks.

+7  A: 

Django uses a cache. The RDBMS uses a cache. Don't prematurely optimize the queries.

You can play with bulk queries in your view function instead of one-at-a-time queries in your template.

@render_to('home/main.html')
def login(request):
    # Query all clients 
    clients = Client.objects.all()
    # Assemble an in-memory table of pets
    pets = collections.defaultdict(list)
    for p in Pet.objects.all():
        pets[pet.client].append(p)
    # Create clients and pets tuples
    clientsPetTuples = [ (c,pets[c]) for c in clients ]
    return {'clientPets': clientsPetTuples}

However, you don't seem to have any evidence that your template is the slowest part of your application.

Further, this trades off giant memory use against SQL use. Until you have measurements that prove that your template queries are actually slow, you shouldn't be over thinking the SQL.

Don't worry about the SQL until you have evidence.

S.Lott
by bulk queries do you mean doing the select_related() in the view?
Paolo Bergantino
Thanks for the sample. I know premature optimization is evil, I was mostly wondering why it was happening, wanted to make sure it wasn't my fault somehow.
Paolo Bergantino
It's not your fault until you *measure* the performance. After you have measured and proven that the queries are slow, then it's your fault. Until you measure, you don't really *know* anything.
S.Lott
+3  A: 

try using Client.objects.all().select_related()

This will automagically also cache related models in a single database query.

TokenMacGuy
+3  A: 

Does Client 1 have 2 Pets and Client 2 have 1 Pet?

If so, that would indicate to me that Pet.full_name or something else you're doing in the Pet display loop is trying to access its related Client's details. Django's ORM doesn't use an identity map, so accessing the Client foreign key from any of your Pet objects would require hitting the database again to retrieve that Client.

P.S. select_related won't have any effect on the data you're using in this scenario as it only follows foreign-key relationships, but the pet-to-client relationship is many-to-one.

Update: if you want to avoid having to change the logic in Pet.full_name or having to perform said logic in the template instead for this case, you could alter the way you get a handle on each Client's Pets in order to prefill the ForeignKey cache with for each Pet with its Client:

class Client(models.Model):
    # ...
    def get_pets(self):
        for pet in self.pets.all():
            setattr(pet, '_client_cache', self)
            yield pet

...where the 'client' part of '_client_cache' is whatever attribute name is used in the Pet class for the ForeignKey to the Pet's Client. This takes advantage of the way Django implements access to ForeignKey-related objects using its SingleRelatedObjectDescriptor class, which looks for this cache attribute before querying the database.

Resulting template usage:

{% for pet in client.get_pets %}
...
{% endfor %}
insin
Yes, the Pet only has a first name, the last name is from the client's last name. Is it a big deal that this is happening or does the Django cache and all that do a good job that I can sleep at night accessing the last name the way I am or am I better served piecing the full name myself instead?
Paolo Bergantino
Django does do some caching, but the cache for each Pet's Client is in the Pet instance. With what you're doing, you'd have (1 + number of clients + number of pets) queries for this page. With my suggestion in place, you'd end up with (1 + number of clients queries). With S.Lott's: 2 queries.
insin
Thank you for your help. Code sampled worked. I'll weigh my options from here.
Paolo Bergantino