
Hello,

Quick and very basic newbie question.

If I have a list of dictionaries that looks like this:

L = []
L.append({"value1": value1, "value2": value2, "value3": value3, "value4": value4})

Let's say there exist multiple entries whose value3 and value4 are identical to those of other nested dictionaries. How can I quickly and easily find and remove those duplicate dictionaries?

Preserving order is of no importance.

Thanks.

EDIT:

If there are five inputs, like this:

L = [{"value1": fssd, "value2": dsfds, "value3": abcd, "value4": gk},
    {"value1": asdasd, "value2": asdas, "value3": dafdd, "value4": sdfsdf},
    {"value1": sdfsf, "value2": sdfsdf, "value3": abcd, "value4": gk},
    {"value1": asddas, "value2": asdsa, "value3": abcd, "value4": gk},
    {"value1": asdasd, "value2": dskksks, "value3": ldlsld, "value4": sdlsld}]

The output should look like this:

L = [{"value1": fssd, "value2": dsfds, "value3": abcd, "value4": gk},
    {"value1": asdasd, "value2": asdas, "value3": dafdd, "value4": sdfsdf},
    {"value1": asdasd, "value2": dskksks, "value3": ldlsld, "value4": sdlsld}]
A: 
for dic in list: 
  for anotherdic in list:
    if dic != anotherdic:
      if dic["value3"] == anotherdic["value3"] or dic["value4"] == anotherdic["value4"]:
        list.remove(anotherdic)

Tested with

list = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
{"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
{"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
{"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
{"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

worked fine for me :)
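The loop above removes items from the same list it is iterating over, which can make Python skip elements, and it tests value3 or value4 rather than both. A minimal sketch of the same pairwise-comparison idea that avoids both issues by building a new list; the function name `dedupe_pairwise` is hypothetical, and it assumes every dict has both keys:

```python
def dedupe_pairwise(dicts):
    """Return a new list keeping the first dict for each (value3, value4) pair.

    Hypothetical helper: it compares each dict against the ones already kept,
    so it is O(n**2), but it never mutates the list being iterated over.
    """
    kept = []
    for d in dicts:
        # Keep d only if no kept dict matches it on BOTH value3 and value4.
        if not any(k["value3"] == d["value3"] and k["value4"] == d["value4"]
                   for k in kept):
            kept.append(d)
    return kept


L = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
     {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

result = dedupe_pairwise(L)
print([d["value1"] for d in result])  # ['fssd', 'asdasd', 'asdasd']
```

The original list L is left untouched, so you can rebind or slice-assign the result as you prefer.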

wallacer
+1  A: 

That's a list of one dictionary, but assuming there are more dictionaries in the list l:

l = [ldict for ldict in l if ldict.get("value3") != value3 or ldict.get("value4") != value4]

But is that what you really want to do? Perhaps you need to refine your description.

BTW, don't use list as a name since it is the name of a Python built-in.

EDIT: Assuming you started with a list of dictionaries, rather than a list of lists of one dictionary each, that should work with your example. It wouldn't work if either of the values were None, though, so something like this is better:

l = [ldict for ldict in l if not ( ("value3" in ldict and ldict["value3"] == value3) and ("value4" in ldict and ldict["value4"] == value4) )]

But it still seems like an unusual data structure.
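Note that this comprehension drops every dict that matches the given pair, including the first one, so on its own it does not keep one representative per pair. A minimal sketch wrapping the second comprehension in a function; `drop_pair` is a hypothetical name, and because of the `in` checks a dict missing either key can never match and is always kept:

```python
def drop_pair(dicts, value3, value4):
    """Return the dicts whose (value3, value4) does not equal the given pair.

    Hypothetical wrapper around the comprehension above. A dict missing
    either key never matches, even when asked to drop (None, None).
    """
    return [d for d in dicts
            if not (("value3" in d and d["value3"] == value3) and
                    ("value4" in d and d["value4"] == value4))]


L = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
     {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

remaining = drop_pair(L, "abcd", "gk")
print([d["value1"] for d in remaining])  # ['asdasd', 'asdasd']

# With .get() this dict would be dropped, since get() returns None for missing keys.
print(drop_pair([{"value1": "x"}], None, None))  # [{'value1': 'x'}]
```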

EDIT: no need to use explicit gets.

Also, there are always tradeoffs in solutions. Without more info and without actually measuring, it's hard to know which performance tradeoffs are most important for the problem. But, as the Zen sez: "Simple is better than complex".

Ned Deily
Hello Ned, thanks for your input. I have added an example of an INPUT and an OUTPUT for the same list, and I have also renamed the list in that example. Thanks.
Jonas
+2  A: 

You can use a temporary list to store the items of each dict you have already seen. The previous code was buggy because it removed items from the list inside the for loop.

(v,r) = ([],[])
for i in l:
    if ('value4', i['value4']) not in v and ('value3', i['value3']) not in v:
        r.append(i)
    v.extend(i.items())
l = r

Your test:

l = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
    {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
    {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
    {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
    {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

outputs

{'value4': 'gk', 'value3': 'abcd', 'value2': 'dsfds', 'value1': 'fssd'}
{'value4': 'sdfsdf', 'value3': 'dafdd', 'value2': 'asdas', 'value1': 'asdasd'}
{'value4': 'sdlsld', 'value3': 'ldlsld', 'value2': 'dskksks', 'value1': 'asdasd'}
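The `v` list above holds every (key, value) item seen so far, so each `in` test scans the whole list, and a dict is skipped if either its value3 or its value4 was seen before, not only when both match. A minimal sketch of the same first-seen-wins loop using a set of (value3, value4) tuples instead; the name `dedupe_first_seen` is hypothetical, and it assumes both keys are present and their values are hashable:

```python
def dedupe_first_seen(dicts):
    """Keep the first dict for each (value3, value4) pair, preserving order."""
    seen = set()      # (value3, value4) tuples already kept
    result = []
    for d in dicts:
        key = (d["value3"], d["value4"])
        if key not in seen:  # set membership is O(1) on average
            seen.add(key)
            result.append(d)
    return result


pairs = [{"value3": "a", "value4": "x"},
         {"value3": "a", "value4": "y"},
         {"value3": "a", "value4": "x"}]
print(dedupe_first_seen(pairs))
# [{'value3': 'a', 'value4': 'x'}, {'value3': 'a', 'value4': 'y'}]
```

On this input the `v`-list loop above would keep only the first dict, because the second one's value3 ('a') had already been seen, even though its value4 differs.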
ACoolie
Your output is not correct. Look at my example. Thanks anyhow for the attempt.
Jonas
+2  A: 

Here's one way:

keyfunc = lambda d: (d['value3'], d['value4'])

from itertools import groupby
giter = groupby(sorted(L, key=keyfunc), keyfunc)

L2 = [g[1].next() for g in giter]
print L2
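`itertools.groupby` only groups consecutive items with equal keys, which is why this answer sorts with the same keyfunc first. A small Python 3 sketch (where the built-in `next(group)` replaces the Python 2 `.next()` method) showing the difference on the question's data; the function name `first_per_key` is hypothetical:

```python
from itertools import groupby


def keyfunc(d):
    return (d["value3"], d["value4"])


def first_per_key(dicts):
    # Sort by the same key groupby uses, so equal keys become adjacent,
    # then take the first dict from each group.
    return [next(group) for _, group in groupby(sorted(dicts, key=keyfunc), key=keyfunc)]


L = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
     {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

# Unsorted: the first ('abcd', 'gk') dict and the two later ones land in separate groups.
print(len(list(groupby(L, key=keyfunc))))       # 4
print([d["value1"] for d in first_per_key(L)])  # ['fssd', 'asdasd', 'asdasd']
```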
ars
It looks like yours is correct and an hour earlier than Alex's.
hughdbrown
I guess it's easy to get missed once a question gets more than 5 or 6 answers. Probably helps to be in the first *or* last couple, I suspect. No biggie, but thanks for noting that. :)
ars
A: 

If I understand correctly, you want to discard matches that come later in the original list but do not care about the order of the resulting list, so:

(Tested with 2.5.2)

tempDict = {}
for d in L[::-1]:
    tempDict[(d["value3"],d["value4"])] = d
L[:] = tempDict.itervalues()
tempDict = None
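The same keep-the-first idea in Python 3, where `itervalues()` no longer exists: iterating in reverse means earlier dicts overwrite later ones under the same (value3, value4) key, so the first occurrence wins. A minimal sketch with the hypothetical name `dedupe_first_wins`:

```python
def dedupe_first_wins(dicts):
    # Walk the list backwards so that, for each (value3, value4) key,
    # the earliest dict in the original list is stored last and wins.
    by_key = {(d["value3"], d["value4"]): d for d in reversed(dicts)}
    return list(by_key.values())


L = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
     {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
     {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

L[:] = dedupe_first_wins(L)
print(sorted(d["value1"] for d in L))  # ['asdasd', 'asdasd', 'fssd']
```

The resulting list comes out in a different order from the input, which the question explicitly allows.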
Anon
Did you try running your code? It doesn't do what the OP asked for. A couple of questions: (1) why iterate through the list in reverse order? (2) why use (d["value3"],d["value4"]) as the key in your temporary dictionary? (3) why assign the current dictionary in the list during iteration as the value in your temporary dictionary?
hughdbrown
Hrm, it does what I interpreted the question to mean (though I was not sure about that interpretation), and it also matches his output, though not its order; he said preserving order was of no importance. My interpretation: when there is more than one dictionary with the same (value3, value4) pair, keep only the first such dictionary from the original list, and the resulting list of dicts does not have to be in the same order. So... (1) so that the first instance in the original list "wins" and is retained, (2) because I thought that pair was what had to be unique, and (3) because the dictionaries are the values I pull back out for the new list.
Anon
(In my test output, the dict items print in reverse order, and the list of dicts has them in a different order, but since he said "Preserving order is of no importance," that seemed within the parameters.)
Anon
Looking back over things, I stand by my interpretation. Order seems to be the only point of contention. Note that, if the OP's original data had, say, the instances of "abcd" replaced by "xkcd", the sort in Alex's answer (which rocks, as always) would also result in a different order. The question's random looking (and not even quoted) data gave no indication that its order was anything other than happenstance - again, particularly combined with "Preserving order is of no importance."
Anon
+3  A: 

In Python 2.6 or 3.*:

import itertools
import operator
import pprint

L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
    {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
    {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

getvals = operator.itemgetter('value3', 'value4')

L.sort(key=getvals)

result = []
for k, g in itertools.groupby(L, getvals):
    result.append(next(g))

L[:] = result
pprint.pprint(L)

Almost the same in Python 2.5, except you have to use g.next() instead of next(g) in the append.
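Both this answer and the previous one assign with L[:] = ... rather than L = .... Slice assignment replaces the contents of the existing list object, so any other name referring to that list sees the deduplicated result; plain assignment only rebinds the name. A small illustration with hypothetical names and toy data:

```python
L = [3, 1, 3, 2]
alias = L              # a second name for the same list object
L[:] = sorted(set(L))  # replace the contents in place
print(alias)           # [1, 2, 3]

M = [3, 1, 3, 2]
alias2 = M
M = sorted(set(M))     # rebinds M to a new list; alias2 still sees the old one
print(alias2)          # [3, 1, 3, 2]
```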

Alex Martelli
Thanks for this solution.
Jonas