This is part algorithm-logic question (how to do it), part implementation question (how to do it best!). I'm working with Django, so I'll frame it in those terms.

In Python, it's worth mentioning that the problem is somewhat related to the question "How do I use Python's itertools.groupby()?".

Suppose you're given two Django Model-derived classes:

from django.db import models

class Car(models.Model):
    mods = models.ManyToManyField('Mods')

and

from django.db import models

class Mods(models.Model):
   ...

How does one get a list of Cars, grouped by Cars with a common set of Mods?

I.e. I want to get a structure like so:

Cars_by_common_mods = [
  {'mods': {'a'}, 'cars': {'W1', 'W2'}},
  {'mods': {'a', 'b'}, 'cars': {'X1', 'X2', 'X3'}},
  {'mods': {'b'}, 'cars': {'Y1', 'Y2'}},
  {'mods': {'a', 'b', 'c'}, 'cars': {'Z1'}},
]

I've been thinking of something like:

def cars_by_common_mods():
  cars = Car.objects.all()

  mod_list = []      

  for car in cars:
    mod_list.append({ 'car': car, 'mods': list(car.mods.all()) })

  ret = []

  for key, mods_group in groupby(list(mods), lambda x: set(x.mods)):
    ret.append(mods_group)

  return ret

However, that doesn't work because (perhaps among other reasons) the groupby doesn't seem to group by the mods sets. I guess the mod_list has to be sorted to work with groupby. All to say, I'm confident there's something simple and elegant out there that will be both enlightening and illuminating.

Cheers & thanks!

+1  A: 

Check out regroup. It's only for templates, but I guess this kind of classification belongs to the presentation layer anyway.

Javier
Thanks for the reply. I looked at regroup, but the (unstated) problem is that there is more logic to be done after the initial groupings. It's a good tip, though; will see if I can design it around regroup.
Brian M. Hunt
+3  A: 

Have you tried sorting the list first? The algorithm you proposed should work, albeit with lots of database hits.

import itertools

cars = [
    {'car': 'X2', 'mods': [1,2]},
    {'car': 'Y2', 'mods': [2]},
    {'car': 'W2', 'mods': [1]},
    {'car': 'X1', 'mods': [1,2]},
    {'car': 'W1', 'mods': [1]},
    {'car': 'Y1', 'mods': [2]},
    {'car': 'Z1', 'mods': [1,2,3]},
    {'car': 'X3', 'mods': [1,2]},
]

cars.sort(key=lambda car: car['mods'])

cars_by_common_mods = {}
for k, g in itertools.groupby(cars, lambda car: car['mods']):
    cars_by_common_mods[frozenset(k)] = [car['car'] for car in g]

print(cars_by_common_mods)

Now, about those queries:

import collections
import itertools
from operator import itemgetter

from django.db import connection

cursor = connection.cursor()
cursor.execute('SELECT car_id, mod_id FROM someapp_car_mod ORDER BY 1, 2')
cars = collections.defaultdict(list)
for row in cursor.fetchall():
    cars[row[0]].append(row[1])

# Here's one I prepared earlier, which emulates the sample data we've been working
# with so far, but using the car id instead of the previous string.
cars = {
    1: [1,2],
    2: [2],
    3: [1],
    4: [1,2],
    5: [1],
    6: [2],
    7: [1,2,3],
    8: [1,2],
}

sorted_cars = sorted(cars.items(), key=itemgetter(1))
cars_by_common_mods = []
for k, g in itertools.groupby(sorted_cars, key=itemgetter(1)):
    cars_by_common_mods.append({'mods': k, 'cars': list(map(itemgetter(0), g))})

print(cars_by_common_mods)

# Which, for the sample data gives me (reformatted by hand for clarity)
[{'cars': [3, 5],    'mods': [1]},
 {'cars': [1, 4, 8], 'mods': [1, 2]},
 {'cars': [7],       'mods': [1, 2, 3]},
 {'cars': [2, 6],    'mods': [2]}]

Now that you've got your lists of car ids and mod ids, if you need the complete objects to work with, you could do a single query for each to get a complete list for each model and create a lookup dict for those, keyed by their ids - then, I believe, Bob is your proverbial father's brother.
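For what it's worth, that last step could look something like the sketch below. It uses plain dicts as stand-ins for the two lookup tables so it runs without a database; in Django I'd expect QuerySet.in_bulk() on each model to give you id-keyed dicts like these. The names car_lookup, mod_lookup and hydrate are made up for illustration, and the sample values just mirror the data used earlier.

```python
# Stand-ins for the id-keyed lookup dicts (hypothetical data; in Django these
# would hold model instances, e.g. from QuerySet.in_bulk()).
car_lookup = {1: 'X2', 2: 'Y2', 3: 'W2', 4: 'X1', 5: 'W1', 6: 'Y1', 7: 'Z1', 8: 'X3'}
mod_lookup = {1: 'a', 2: 'b', 3: 'c'}

# The grouped ids, as produced by the groupby step above.
cars_by_common_mods = [
    {'cars': [3, 5], 'mods': [1]},
    {'cars': [1, 4, 8], 'mods': [1, 2]},
    {'cars': [7], 'mods': [1, 2, 3]},
    {'cars': [2, 6], 'mods': [2]},
]


def hydrate(groups, cars, mods):
    """Swap ids for the full objects using the lookup dicts."""
    return [
        {'mods': [mods[m] for m in group['mods']],
         'cars': [cars[c] for c in group['cars']]}
        for group in groups
    ]


hydrated = hydrate(cars_by_common_mods, car_lookup, mod_lookup)
print(hydrated[1])  # {'mods': ['a', 'b'], 'cars': ['X2', 'X1', 'X3']}
```

That keeps the total at three queries (the join table plus one per model), however many groups there are.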

insin
+1  A: 

You have a few problems here.

You didn't sort your list before calling groupby, and this is required. From the itertools documentation:

Generally, the iterable needs to already be sorted on the same key function.
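A tiny illustration of what that sentence means in practice (the sample list is made up): groupby only merges adjacent items with equal keys, so the same key can show up more than once on unsorted input.

```python
from itertools import groupby

unsorted_keys = [1, 2, 1, 2]

# Without sorting, equal keys that are not adjacent land in separate groups.
fragmented = [(k, list(g)) for k, g in groupby(unsorted_keys)]

# After sorting, each key shows up exactly once.
grouped = [(k, list(g)) for k, g in groupby(sorted(unsorted_keys))]

print(fragmented)  # [(1, [1]), (2, [2]), (1, [1]), (2, [2])]
print(grouped)     # [(1, [1, 1]), (2, [2, 2])]
```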

Then, you don't copy each group returned by groupby into a list. Again, the documentation states:

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list:

groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
    groups.append(list(g))      # Store group iterator as a list
    uniquekeys.append(k)
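To see why the list() call matters, here is a small sketch with made-up data. It keeps the raw group iterators around and only reads them after groupby has moved on:

```python
from itertools import groupby

data = [1, 1, 2, 2, 3]

# Copy each group into a list while groupby is still positioned on it.
eager = [list(g) for _, g in groupby(data)]

# Keep only the iterators and read them after groupby has moved on.
lazy_groups = [g for _, g in groupby(data)]
late = [list(g) for g in lazy_groups]

print(eager)  # [[1, 1], [2, 2], [3]]
print(late)   # the earlier groups come back empty
```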

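The final mistake is using sets as keys. They don't work here: the comparison operators on sets test for subset and superset rather than defining a total order, so sorting by them doesn't reliably put equal sets next to each other. A quick fix is to cast them to sorted tuples (there could be a better solution, but I cannot think of it now).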

So, in your example, the last part should look like this:

sortMethod = lambda x: tuple(sorted(set(x.mods)))
sortedMods = sorted(list(mods), key=sortMethod)
for key, mods_group in groupby(sortedMods, sortMethod):
    ret.append(list(mods_group))
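One possible alternative, which sidesteps groupby rather than fixing it: frozenset is hashable, so it can serve as a dict key, and a defaultdict needs no sorting at all. This is only a sketch on made-up (name, mods) pairs, not on real model instances:

```python
from collections import defaultdict

# Hypothetical sample data: (car name, list of mod names).
cars = [
    ('X2', ['a', 'b']), ('Y2', ['b']), ('W2', ['a']), ('X1', ['b', 'a']),
    ('W1', ['a']), ('Y1', ['b']), ('Z1', ['a', 'b', 'c']), ('X3', ['a', 'b']),
]

groups = defaultdict(list)
for name, mods in cars:
    # frozenset is hashable and ignores order, so ['b', 'a'] and ['a', 'b']
    # end up under the same key.
    groups[frozenset(mods)].append(name)

print(groups[frozenset(['a', 'b'])])  # ['X2', 'X1', 'X3']
```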
DzinX
I return to this answer all the time. haha
Brian M. Hunt
+1  A: 

If performance is a concern (i.e. lots of cars on a page, or a high-traffic site), denormalization makes sense, and simplifies your problem as a side effect.

Be aware that denormalizing many-to-many relations might be a bit tricky though. I haven't run into any such code examples yet.
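For the sake of discussion, one way it might go is to store a canonical "signature" of each car's mods in an extra column and group on that. The sketch below only shows computing the signature in plain Python; the helper name mods_signature and the sample data are made up, and keeping such a column in sync (for instance from Django's m2m_changed signal) is the tricky part I'm assuming away here.

```python
def mods_signature(mod_ids):
    """Return a canonical string for a set of mod ids, e.g. '1,2'."""
    return ','.join(str(i) for i in sorted(set(mod_ids)))


# Hypothetical car id -> mod ids mapping, as it might come from the M2M table.
car_mods = {1: [2, 1], 2: [2], 3: [1], 4: [1, 2]}

# The value that would be stored in the denormalized column for each car.
signatures = {car_id: mods_signature(ids) for car_id, ids in car_mods.items()}

print(signatures[1] == signatures[4])  # True: same mods, different order
```

With such a column in place, cars with the same mods share the same value, so grouping reduces to ordering by a single field.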

akaihola
A: 

Thank you all for the helpful replies. I've been plugging away at this problem. A 'best' solution still eludes me, but I've some thoughts.

I should mention the statistics of the data set I'm working with. In 75% of the cases there will be one Mod. In 24% of the cases, two. In 1% of the cases there will be zero, or three or more. For every Mod, there is at least one unique Car, though a Mod may be applied to numerous Cars.

Having said that, I've considered (but not implemented) something like so:

class ModSet(models.Model):
  mods = models.ManyToManyField(Mod)

and change cars to

class Car(models.Model):
  modset = models.ForeignKey(ModSet)

It's trivial to group by Car.modset: I can use regroup, as suggested by Javier, for example. It seems a simpler and reasonably elegant solution; thoughts would be much appreciated.
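Outside of templates, I imagine the grouping would look roughly like this. It is an untested-against-Django sketch that uses (car name, modset id) tuples as stand-ins for Car rows; with real models I'd assume the sort comes from ordering the queryset by modset.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (car name, modset id) pairs, standing in for Car rows
# that carry a modset foreign key.
cars = [('X2', 2), ('Y2', 3), ('W2', 1), ('X1', 2), ('W1', 1)]

# Sort by the foreign key so groupby sees equal ids next to each other.
cars_sorted = sorted(cars, key=itemgetter(1))

cars_by_modset = {
    modset_id: [name for name, _ in group]
    for modset_id, group in groupby(cars_sorted, key=itemgetter(1))
}

print(cars_by_modset)  # {1: ['W2', 'W1'], 2: ['X2', 'X1'], 3: ['Y2']}
```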

Brian M. Hunt