ansaurus

Question

Django models - how to filter out duplicate values by PK after the fact?

Answer 1

A:

If the order doesn't matter, use a dict.

ionut bizau 2009-04-13 16:24:46

Answer 2

+4 A:

Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.

That's exactly what I would do if you were locked into your current structure of making several queries. Then a simply dictionary.values() will return your list back.

If you have a little more flexibility, why not use Q objects? Instead of actually making the queries, store each query in a Q object and use a bitwise or ("|") to execute a single query. This will achieve your goal and save database hits.

Django Q objects

David Berger 2009-04-13 16:27:26

Is this how you would you grab the objects when the foreign key ids are coming from multiple tables (kind of like an IN query with multiple parts)?

wsorenson 2009-04-13 16:32:17

Really not so sure what you mean by that. I'm not sure how you need it or what the foreign key ids are referring to.

David Berger 2009-04-13 17:07:55

I was able to use Q objects to retrieve the objects from multiple tables and spanning relationships.

wsorenson 2009-04-13 18:39:19

Answer 3

+3 A:

You can use a set if you add the __hash__ function to your model definition so that it returns the id (assuming this doesn't interfere with other hash behaviour you may have in your app):

class MyModel(models.Model):

    def __hash__(self):
        return self.pk

Jarret Hardie 2009-04-13 16:58:21

Answer 4

A:

Remove "duplicates" depends on how you define "duplicated".

If you want EVERY column (except the PK) to match, that's a pain in the neck -- it's a lot of comparing.

If, on the other hand, you have some "natural key" column (or short set of columns) than you can easily query and remove these.

master = MyModel.objects.get( id=theMasterKey )
dups = MyModel.objects.filter( fld1=master.fld1, fld2=master.fld2 )
dups.all().delete()

If you can identify some shorter set of key fields for duplicate identification, this works pretty well.

Edit

If the model objects haven't been saved to the database yet, you can make a dictionary on a tuple of these keys.

unique = {}
...
key = (anObject.fld1,anObject.fld2)
if key not in unique:
    unique[key]= anObject

S.Lott 2009-04-13 17:44:12

Answer 5

+2 A:

In general it's better to combine all your queries into a single query if possible. Ie.

q = Model.objects.filter(Q(field1=f1)|Q(field2=f2))

instead of

q1 = Models.object.filter(field1=f1)
q2 = Models.object.filter(field2=f2)

If the first query is returning duplicated Models then use distinct()

q = Model.objects.filter(Q(field1=f1)|Q(field2=f2)).distinct()

If your query really is impossible to execute with a single command, then you'll have to resort to using a dict or other technique recommended in the other answers. It might be helpful if you posted the exact query on SO and we could see if it would be possible to combine into a single query. In my experience, most queries can be done with a single queryset.

2009-04-14 13:40:00

I was able to use the Q objects and distinct() -- thanks.

wsorenson 2009-04-14 17:30:07

Answer 6

A:

I use this one:

dict(zip(map(lambda x: x.pk,items),items)).values()

kostas 2010-02-22 19:08:06

ansaurus

tags:

views:

answers:

Django models - how to filter out duplicate values by PK after the fact?

related questions