I have a dict:

dic = {
 1: 'a', 
 2: 'a', 
 3: 'b', 
 4: 'a', 
 5: 'c', 
 6: 'd', 
 7: 'd', 
 8: 'a', 
 9: 'a'}

I want to remove duplicate values and keep just one key/value pair per value. As for which key is kept among the duplicated values, it may be the max, the min, or a randomly selected key from those duplicated items.

I do not want to use a key/value swap, since that cannot control which key is selected.

Take the value "a" for example:

 1: 'a', 
 2: 'a', 
 4: 'a', 
 8: 'a', 
 9: 'a'

The max key would give {9: 'a'}, the min would give {1: 'a'}, and the random selection would choose any one of them.

And if the keys are another kind of hashable value, for example strings, how can I do such a selection?

Can anyone share an idea?

Thanks!

A:
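import itertools as it
from operator import itemgetter

newdic = {}
# Sort (value, key) pairs by value only, so equal values end up adjacent
pairs = sorted(((v, k) for k, v in dic.items()), key=itemgetter(0))
# groupby needs key= here; without it each (value, key) tuple is its own group
for v, grp in it.groupby(pairs, key=itemgetter(0)):
    newdic[min(k for _, k in grp)] = v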

Or use other "selection" functions in lieu of min (which, of course, works fine even if the keys are strings: it will give you the "lexically first" key in that case).
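A minimal sketch of swapping in other selection functions, wrapping the groupby approach in a hypothetical helper `dedupe_values` whose `select` argument is any function that picks one key from a list (here `min`, `max`, or `random.choice`). It assumes the values are sortable, since `groupby` only merges adjacent items, and Python 3.7+ for the printed dict order.

```python
import itertools as it
import random
from operator import itemgetter

def dedupe_values(dic, select=min):
    """Keep one key per distinct value; `select` picks it from that value's keys.

    `dedupe_values` and `select` are illustrative names, not a library API.
    """
    newdic = {}
    # Sort by value only and group on the value alone
    pairs = sorted(((v, k) for k, v in dic.items()), key=itemgetter(0))
    for v, grp in it.groupby(pairs, key=itemgetter(0)):
        keys = [k for _, k in grp]
        newdic[select(keys)] = v
    return newdic

dic = {1: 'a', 2: 'a', 3: 'b', 4: 'a', 5: 'c', 6: 'd', 7: 'd', 8: 'a', 9: 'a'}
print(dedupe_values(dic, min))            # {1: 'a', 3: 'b', 5: 'c', 6: 'd'}
print(dedupe_values(dic, max))            # {9: 'a', 3: 'b', 5: 'c', 7: 'd'}
print(dedupe_values(dic, random.choice))  # one randomly chosen key per value

# String keys work the same way: min picks the lexically first key
sdic = {'x': 1, 'b': 1, 'm': 2}
print(dedupe_values(sdic, min))           # {'b': 1, 'm': 2}
```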

The one case in which the selection function needs some care is when the keys corresponding to the same value may be non-comparable (e.g., complex numbers, or, in Python 3, objects of different not-all-numeric types). Nothing a key= in the min won't cure;-).
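A sketch of that `key=` fix for keys that cannot be ordered directly, assuming a hypothetical `order_key` function that imposes an arbitrary but total order (type name first, then the key itself, with complex numbers compared as `(real, imag)`). The sort is done on the value alone, so the keys are never compared there either.

```python
import itertools as it
from operator import itemgetter

# Keys that cannot be ordered directly: complex numbers, and a mix of int and str
dic = {1+2j: 'a', 3j: 'a', 2: 'b', 'two': 'b'}

def order_key(k):
    """Illustrative total order: compare type names first, then within a type.

    Complex numbers are compared as (real, imag); other types by themselves.
    """
    if isinstance(k, complex):
        return (type(k).__name__, (k.real, k.imag))
    return (type(k).__name__, k)

newdic = {}
pairs = sorted(((v, k) for k, v in dic.items()), key=itemgetter(0))
for v, grp in it.groupby(pairs, key=itemgetter(0)):
    newdic[min((k for _, k in grp), key=order_key)] = v

print(newdic)  # {3j: 'a', 2: 'b'}
```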

Alex Martelli
Thanks Alex, I am still trying to find out how to randomly choose the duplicated key your way.
K. C
@K.C., `random.choice(list(grp))[1]` is simplest (there are of course algorithms with better big-O for this purpose, but there's no point deploying them unless your groups of duplicate keys grow into many thousands of cases per key;-).
Alex Martelli
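A sketch of where `random.choice(list(grp))[1]` fits in the groupby loop. Each item in `grp` is a `(value, key)` pair, so index `[1]` picks the key. It assumes sortable values, and the result differs from run to run.

```python
import itertools as it
import random
from operator import itemgetter

dic = {1: 'a', 2: 'a', 3: 'b', 4: 'a', 5: 'c', 6: 'd', 7: 'd', 8: 'a', 9: 'a'}

newdic = {}
pairs = sorted(((v, k) for k, v in dic.items()), key=itemgetter(0))
for v, grp in it.groupby(pairs, key=itemgetter(0)):
    # Choose one (value, key) pair at random and keep its key
    newdic[random.choice(list(grp))[1]] = v

print(newdic)  # 'a' and 'd' get a different random key on each run
```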
A: 

This will give you a randomly selected unique key:

In [29]: dic
Out[29]: {1: 'a', 2: 'a', 3: 'b', 4: 'a', 5: 'c', 6: 'd', 7: 'd', 8: 'a', 9: 'a'}

In [30]: dict((v,k) for k,v in dic.iteritems())
Out[30]: {'a': 9, 'b': 3, 'c': 5, 'd': 7}

In [31]: dict((v,k) for k,v in dict((v,k) for k,v in dic.iteritems()).iteritems())
Out[31]: {3: 'b', 5: 'c', 7: 'd', 9: 'a'}
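A key/value swap can still control which key survives: in a dict comprehension, a later assignment to the same key overwrites an earlier one, so sorting the items first decides the winner. A sketch in Python 3 syntax (`items()` instead of `iteritems()`), assuming the keys are mutually comparable and Python 3.7+ for the printed order:

```python
dic = {1: 'a', 2: 'a', 3: 'b', 4: 'a', 5: 'c', 6: 'd', 7: 'd', 8: 'a', 9: 'a'}

# Ascending key order: the last (largest) key seen for each value wins
by_value_max = {v: k for k, v in sorted(dic.items())}
max_keys = {k: v for v, k in by_value_max.items()}

# Descending key order: the smallest key seen for each value wins
by_value_min = {v: k for k, v in sorted(dic.items(), reverse=True)}
min_keys = {k: v for v, k in by_value_min.items()}

print(max_keys)  # {9: 'a', 3: 'b', 5: 'c', 7: 'd'}
print(min_keys)  # {1: 'a', 6: 'd', 5: 'c', 3: 'b'}
```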
unutbu
Can you explain why iteritems() returns a random key?
K. C
@Registered: Python dicts are unordered. Thus, the order in which the key-value pairs are emitted from dic.iteritems() is undetermined. I should have said "undetermined" rather than "random".
unutbu
@Registered: I missed the fact that you requested a method that does not use a key-value swap. Sorry -- that's exactly what I did above. I'll leave this up for you to read, then delete in a day or so.
unutbu
@~unutbu, you do not have to delete it; it is a good approach I did not know before.
K. C
A:

You could build a reverse dictionary where the values are lists of all the keys from your initial dictionary. Using this, you could then do whatever you want: min, max, random, alternating min and max, or anything else.

from collections import defaultdict

d = defaultdict(list)
for k, v in dic.items():
    d[v].append(k)

print(dict(d))
# {'a': [1, 2, 4, 8, 9], 'b': [3], 'c': [5], 'd': [6, 7]}
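A sketch of the selection step on top of that reverse dictionary: one dict comprehension per selection function. Unlike the groupby approach, this only needs the values to be hashable, not sortable; the keys must be comparable for `min`/`max`. Printed order assumes Python 3.7+.

```python
import random
from collections import defaultdict

dic = {1: 'a', 2: 'a', 3: 'b', 4: 'a', 5: 'c', 6: 'd', 7: 'd', 8: 'a', 9: 'a'}

# Reverse index: value -> list of every key that maps to it
d = defaultdict(list)
for k, v in dic.items():
    d[v].append(k)

# Keep one key per value, chosen by the selection function
min_keys = {min(keys): v for v, keys in d.items()}
max_keys = {max(keys): v for v, keys in d.items()}
random_keys = {random.choice(keys): v for v, keys in d.items()}

print(min_keys)  # {1: 'a', 3: 'b', 5: 'c', 6: 'd'}
print(max_keys)  # {9: 'a', 3: 'b', 5: 'c', 7: 'd'}
```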
tom10
Very easy to understand, given that the values are hashable. Thanks!
K. C