views:

1499

answers:

6

I build a list of Django model objects by making several queries. Then I want to remove any duplicates, (all of these objects are of the same type with an auto_increment int PK), but I can't use set() because they aren't hashable.

Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.

A: 

If the order doesn't matter, use a dict.

ionut bizau
+4  A: 

Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.

That's exactly what I would do if you were locked into your current structure of making several queries. Then a simply dictionary.values() will return your list back.

If you have a little more flexibility, why not use Q objects? Instead of actually making the queries, store each query in a Q object and use a bitwise or ("|") to execute a single query. This will achieve your goal and save database hits.

Django Q objects

David Berger
Is this how you would you grab the objects when the foreign key ids are coming from multiple tables (kind of like an IN query with multiple parts)?
wsorenson
Really not so sure what you mean by that. I'm not sure how you need it or what the foreign key ids are referring to.
David Berger
I was able to use Q objects to retrieve the objects from multiple tables and spanning relationships.
wsorenson
+3  A: 

You can use a set if you add the __hash__ function to your model definition so that it returns the id (assuming this doesn't interfere with other hash behaviour you may have in your app):

class MyModel(models.Model):

    def __hash__(self):
        return self.pk
Jarret Hardie
A: 

Remove "duplicates" depends on how you define "duplicated".

If you want EVERY column (except the PK) to match, that's a pain in the neck -- it's a lot of comparing.

If, on the other hand, you have some "natural key" column (or short set of columns) than you can easily query and remove these.

master = MyModel.objects.get( id=theMasterKey )
dups = MyModel.objects.filter( fld1=master.fld1, fld2=master.fld2 )
dups.all().delete()

If you can identify some shorter set of key fields for duplicate identification, this works pretty well.


Edit

If the model objects haven't been saved to the database yet, you can make a dictionary on a tuple of these keys.

unique = {}
...
key = (anObject.fld1,anObject.fld2)
if key not in unique:
    unique[key]= anObject
S.Lott
+2  A: 

In general it's better to combine all your queries into a single query if possible. Ie.

q = Model.objects.filter(Q(field1=f1)|Q(field2=f2))

instead of

q1 = Models.object.filter(field1=f1)
q2 = Models.object.filter(field2=f2)

If the first query is returning duplicated Models then use distinct()

q = Model.objects.filter(Q(field1=f1)|Q(field2=f2)).distinct()

If your query really is impossible to execute with a single command, then you'll have to resort to using a dict or other technique recommended in the other answers. It might be helpful if you posted the exact query on SO and we could see if it would be possible to combine into a single query. In my experience, most queries can be done with a single queryset.

I was able to use the Q objects and distinct() -- thanks.
wsorenson
A: 

I use this one:

dict(zip(map(lambda x: x.pk,items),items)).values()
kostas