Say I have the following models:

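from django.contrib.contenttypes import generic
from django.contrib.contenttypes.models import ContentType
from django.db import models

class Image(models.Model):
    # file_home is not shown in the question
    image = models.ImageField(max_length=200, upload_to=file_home)
    # content_type + object_id together point at the owning object
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey()

class Article(models.Model):
    text = models.TextField()
    images = generic.GenericRelation(Image)

class BlogPost(models.Model):
    text = models.TextField()
    images = generic.GenericRelation(Image)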

What's the most processor- and memory-efficient way to find all Articles that have at least one Image attached to them?

I've done this:

Article.objects.filter(pk__in=Image.objects.filter(content_type=ContentType.objects.get_for_model(Article)).values_list('object_id', flat=True))

This works, but besides being ugly, it takes forever.

I suspect there's a better solution using raw SQL, but that's beyond me. For what it's worth, the SQL generated by the above is as follows:

 SELECT `issues_article`.`id`, `issues_article`.`text` FROM `issues_article` WHERE `issues_article`.`id` IN (SELECT U0.`object_id` FROM `uploads_image` U0 WHERE U0.`content_type_id` = 26 ) LIMIT 21

EDIT: czarchaic's suggestion has much nicer syntax but even worse (slower) performance. The SQL generated by his query looks like the following:

SELECT DISTINCT `issues_article`.`id`, `issues_article`.`text`, COUNT(`uploads_image`.`id`) AS `num_images` FROM `issues_article` LEFT OUTER JOIN `uploads_image` ON (`issues_article`.`id` = `uploads_image`.`object_id`) GROUP BY `issues_article`.`id` HAVING COUNT(`uploads_image`.`id`) > 0  ORDER BY NULL LIMIT 21

EDIT: Hooray for Jarret Hardie! Here's the SQL generated by his should-have-been-obvious solution:

SELECT DISTINCT `issues_article`.`id`, `issues_article`.`text` FROM `issues_article` INNER JOIN `uploads_image` ON (`issues_article`.`id` = `uploads_image`.`object_id`) WHERE (`uploads_image`.`id` IS NOT NULL AND `uploads_image`.`content_type_id` = 26 ) LIMIT 21
A: 

I think your best bet would be to use aggregation:

from django.db.models import Count

Article.objects.annotate(num_images=Count('images')).filter(num_images__gt=0)
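
As a rough picture of what this annotation asks the database to do, here is a sketch that uses Python's sqlite3 module instead of Django and MySQL. The table and column names are taken from the generated SQL in the question; the sample rows, the BlogPost content type id of 27, and the content_type_id condition in the join are assumptions made for the illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables in the question.
# Rows are made up; 26 is the Article content type id from the
# generated SQL, 27 is a hypothetical id for BlogPost.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE issues_article (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE uploads_image (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER NOT NULL,
        object_id INTEGER NOT NULL
    );
    INSERT INTO issues_article (id, text) VALUES
        (1, 'two images'), (2, 'no images'), (3, 'one image');
    INSERT INTO uploads_image (content_type_id, object_id) VALUES
        (26, 1), (26, 1), (26, 3), (27, 2);
""")

# Same shape as the annotate() query: join, group per article,
# then keep only the groups whose image count is above zero.
rows = conn.execute("""
    SELECT a.id, COUNT(i.id) AS num_images
    FROM issues_article a
    LEFT OUTER JOIN uploads_image i
        ON a.id = i.object_id AND i.content_type_id = 26
    GROUP BY a.id
    HAVING COUNT(i.id) > 0
    ORDER BY a.id
""").fetchall()

print(rows)  # [(1, 2), (3, 1)]
```

Every article is grouped and counted before the HAVING clause throws away the zero counts, so the database does more work than a plain existence check needs. That may be part of why this form can run slower, though the actual cost depends on the backend and its indexes.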
czarchaic
Makes more sense, but actually runs slower! See the edit.
hanksims
+4  A: 

Thanks to generic relations, you should be able to query this structure using traditional query-set semantics for reverse relations:

Article.objects.filter(images__isnull=False)

This will produce duplicates for any Articles that are related to multiple Images, but you can eliminate that with the distinct() QuerySet method:

Article.objects.distinct().filter(images__isnull=False)
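
The duplicates come from the join itself, which can be seen without Django. A minimal sketch, assuming SQLite through Python's sqlite3 module in place of the MySQL backend from the question; table and column names are borrowed from the question's SQL, while the rows, the BlogPost content type id of 27, and the helper article_ids are invented for the illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables in the question.
# Rows are made up; 26 is the Article content type id from the
# question's SQL, 27 is a hypothetical id for BlogPost.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE issues_article (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE uploads_image (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER NOT NULL,
        object_id INTEGER NOT NULL
    );
    INSERT INTO issues_article (id, text) VALUES
        (1, 'two images'), (2, 'no images'), (3, 'one image');
    INSERT INTO uploads_image (content_type_id, object_id) VALUES
        (26, 1), (26, 1), (26, 3), (27, 2);
""")

JOIN_SQL = """
    SELECT {distinct} a.id
    FROM issues_article a
    INNER JOIN uploads_image i ON a.id = i.object_id
    WHERE i.content_type_id = 26
    ORDER BY a.id
"""

def article_ids(distinct):
    """Hypothetical helper: run the join with or without DISTINCT."""
    sql = JOIN_SQL.format(distinct="DISTINCT" if distinct else "")
    return [row[0] for row in conn.execute(sql)]

without_distinct = article_ids(distinct=False)
with_distinct = article_ids(distinct=True)

print(without_distinct)  # [1, 1, 3] -> article 1 appears once per image
print(with_distinct)     # [1, 3]
```

The inner join emits one row per matching image, so an article with two images shows up twice until DISTINCT collapses the result.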
Jarret Hardie
I think we have a winner! I'll post the generated SQL in another edit, just for completeness's sake. It ran lightning-quick, though. isnull=False ... sometimes the simplest things are just staring you right in the face.
hanksims
Thanks hanksims... hope it works out :-) I admit I haven't looked at the SQL, so definitely curious to see that.
Jarret Hardie