Hi all,

I have a problem with some code and I believe it is because of the expense of the queryset. I am looking for a much less expensive (in terms of time) way to do this.

log.info("Getting Users")
employees = Employee.objects.filter(is_active = True)
log.info("Have Users")

if opt.supervisor:
    if opt.hierarchical:
        people = getSubs(employees, " ".join(args))
    else:
        people = employees.filter(supervisor__name__icontains = " ".join(args))
else:
    log.info("Filtering Users")
    people = employees.filter(name__icontains = " ".join(args)) | \
         employees.filter(unix_accounts__username__icontains = " ".join(args))
    log.info("Filtered Users")

log.info("Processing data")

np = []
for person in people:
    unix, p4, bugz = "No", "No", "No"
    if len(person.unix_accounts.all()): unix = "Yes"
    if len(person.perforce_accounts.all()): p4 = "Yes"
    if len(person.bugzilla_accounts.all()): bugz = "Yes"
    if person.cell_phone != "": exphone = fixphone(person.cell_phone)
    elif person.other_phone != "": exphone = fixphone(person.other_phone)
    else: exphone = ""
    np.append({'name': person.name,
               'office_phone': fixphone(person.office_phone),
               'position': person.position,
               'location': person.location.description,
               'email': person.email,
               'functional_area': person.functional_area.name,
               'department': person.department.name,
               'supervisor': person.supervisor.name,
               'unix': unix, 'perforce': p4, 'bugzilla': bugz,
               'cell_phone': fixphone(exphone),
               'fax': fixphone(person.fax),
               'last_update': person.last_update.ctime()})

log.info("Have data")

Now this results in a log that looks like this:

19:00:55 INFO     phone       phone Getting Users
19:00:57 INFO     phone       phone Have Users
19:00:57 INFO     phone       phone Processing data
19:01:30 INFO     phone       phone Have data

As you can see, it's taking over 30 seconds just to iterate over the data. That is far too expensive. Can someone clue me in to a more efficient way to do this? I thought that applying the first filter would make things easier, but it seems to have no effect. I'm at a loss on this one.

Thanks

To be clear, this is about 1,500 employees -- not that many!

+3  A: 
  1. OR Q objects together instead of QuerySets.
  2. QuerySet.select_related()
  3. QuerySet.iterator()
  4. Use QuerySet.extra() to add IS NULL fields instead of the three len() calls in the loop (see the sketch below).
Ignacio Vazquez-Abrams
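
For reference, here is a rough sketch of how points 1, 2, and 4 could fit together, with `Count()` annotations standing in for the `extra()`/IS NULL idea; the field and relation names are taken from the question's loop, and none of this has been run against the real models:

from django.db.models import Count, Q

term = " ".join(args)

people = (
    Employee.objects
    .filter(is_active=True)
    # Point 4 (variant): count related rows in the main query instead of
    # calling len(person.xxx_accounts.all()) three times per person.
    # distinct=True keeps the counts correct when several reverse
    # relations are joined at once.
    .annotate(
        unix_count=Count('unix_accounts', distinct=True),
        p4_count=Count('perforce_accounts', distinct=True),
        bugz_count=Count('bugzilla_accounts', distinct=True),
    )
    # Point 1: OR the two conditions with Q objects in one filter() call.
    .filter(Q(name__icontains=term) |
            Q(unix_accounts__username__icontains=term))
    # Point 2: fetch the foreign keys used in the loop in the same query,
    # so person.location, person.department, etc. don't each hit the DB.
    .select_related('location', 'functional_area', 'department', 'supervisor')
)

np = []
# Point 3: iterator() streams rows instead of caching the whole result set.
for person in people.iterator():
    np.append({'name': person.name,
               'unix': "Yes" if person.unix_count else "No",
               'perforce': "Yes" if person.p4_count else "No",
               'bugzilla': "Yes" if person.bugz_count else "No",
               # ... remaining fields as in the original loop ...
               })

The annotations come before the filter so that the counts cover all of a person's accounts rather than only the ones matching the search term.
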
You hit everything there. Does ...filter(Q(..) | Q(..)) actually do anything different than queryset1 | queryset2 in this case?
istruble
@istruble: I believe `QuerySet.__or__()` does a `UNION` of some sort, but I could be wrong.
Ignacio Vazquez-Abrams
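
One way to check, assuming a Django shell with the question's models loaded, is to compare the SQL each spelling generates via str(queryset.query):

from django.db.models import Q

term = "smith"  # hypothetical search term, just for comparison

with_q = Employee.objects.filter(
    Q(name__icontains=term) | Q(unix_accounts__username__icontains=term)
)
with_or = (Employee.objects.filter(name__icontains=term) |
           Employee.objects.filter(unix_accounts__username__icontains=term))

# str(queryset.query) shows the SQL Django would run for each form,
# which makes it easy to see whether the two differ.
print(str(with_q.query))
print(str(with_or.query))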