views:

161

answers:

2

I'm rather stumped about the best way to build a Django query that checks if all the elements of a ManyToMany field (or a list) are present in another ManyToMany field.

As an example, I have several Persons, who can have more than one Specialty. There are also Jobs that people can start, but they require one or more Specialties to be eligible to be started.

class Person(models.Model):
    name = models.CharField()
    specialties = models.ManyToManyField('Specialty')

class Specialty(models.Model):
    name = models.CharField()

class Job(models.Model):
    required_specialties = models.ManyToManyField('Specialty')

A person can start a job only if they have all the specialties that the job requires. So, again for the sake of example, we have three specialties:

  • Coding
  • Singing
  • Dancing

And I have a Job that requires the Singing and Dancing specialties. A person with Singing and Dancing specialties can start it, but another with Coding and Singing specialties cannot -- as the Job requires a Person who can both sing and dance.

So, now I need a way to find all jobs that a person can take on. This was my way to tackle it, but I'm sure there's a more elegant approach:

def jobs_that_person_can_start(person):
    # we start with all jobs
    jobs = Job.objects.all()
    # find all specialties that this person does not have
    specialties_not_in_person = Specialty.objects.exclude(name__in=[s.name for s in person.specialties])
    # and exclude jobs that require them
    for s in specialties_not_in_person:
        jobs = jobs.exclude(specialty=s)
    # the ones left should fill the criteria
    return jobs.distinct()

This is because using Job.objects.filter(specialty__in=person.specialties.all()) will return jobs that match any of the person's specialties, not all of them. Using this query, the job that requires Singing and Dancing would appear for the singing coder, which is not the desired output.

I'm hoping this example is not too convoluted. The reason I'm concerned about this is that the Specialties in the system will probably be a lot more, and looping over them doesn't seem like the best way to achieve this. I'm wondering if anyone could lend a scratch to this itch!

+2  A: 

I think you should look at using values_list to get the person's specialties

Replace:

[s.name for s in person.specialties]

with:

person.specialties.values_list('name', flat=True)

That will give you a plain list (ie. ['spec1', 'spec2', ...]) which you can use again. And the sql query used in the bg will also be faster because it will only select 'name' instead of doing a select * to populate the ORM objects

You might also get a speed improvement by filtering jobs that the person definately can NOT perform:

so replace:

jobs = Job.objects.all()

with (2 queries - works for django 1.0+)

person_specialties = person.specialties.values_list('id', flat=True)
jobs = Job.objects.filter(required_specialties__id__in=person_specialties)

or with (1 query? - works for django1.1+)

jobs = Job.objects.filter(required_specialties__in=person.specialties.all())

You may also get an improvement by using select_related() on your jobs/person queries (since they have a foreign key that you're using)

Jiaaro
with Django 1.1 you can simply do jobs = Job.objects.filter(required_specialties__in=person.specialties.all())
Evgeny
@Evgeny I thought about that, does it do the same query as what I have here? I wasn't sure
Jiaaro
the documentation says that it will for i beleive for 1.1, but for earlier do this: jobs = Job.objects.filter(required_specialties__in=person.specialties.all().query)
Evgeny
Wow -- many thanks! Sad there isn't a 'magic' way to do those, but your suggestions are a lifesaver here.
rlafuente
@Evgeny according to the django docs, some db backends (like mysql) are not very fast with the subquery method http://docs.djangoproject.com/en/dev/ref/models/querysets/#in
Jiaaro
@Jim, good to know thanks.
Evgeny
+1  A: 

Another Idea

Ok I guess I should have added this to the other answer, but when I started on it, it seemed like it was going to be a different direction haha

No need to iterate:

person_specialties = person.specialties.values_list('pk', flat=True)

non_specialties = Specialties.objects.exclude(pk__in=person_specialties)

jobs = Job.objects.exclude(required_specialties__in=non_specialties)

note: I don't know exactly how fast this is. You may be better off with my other suggestions.
Also: This code is untested

Jiaaro
This code works perfectly, and it's noticeably faster than my attempt. Many thanks for all your suggestions, i sure learned a lot here.
rlafuente
I'm marking this answer as accepted since it's exactly what my question was about; even though your other answer suggests many other performance improvements.
rlafuente
Thanks! Happy to help =)
Jiaaro