views:

896

answers:

9

Hello,

I receive a dictionary as input, and want to return a list of keys for which the dictionary values are unique in the scope of that dictionary.

I will clarify with an example. Say my input is dictionary a, constructed as follows:

a = dict()
a['cat'] =      1
a['fish'] =     1
a['dog'] =      2  # <-- unique
a['bat'] =      3
a['aardvark'] = 3
a['snake'] =    4  # <-- unique
a['wallaby'] =  5
a['badger'] =   5

The result I expect is ['dog', 'snake'].

There are obvious brute force ways to achieve this, however I wondered if there's a neat Pythonian way to get the job done.

+4  A: 

Note that this actually is a bruteforce:

l = a.values()
b = [x for x in a if l.count(a[x]) == 1]
Bartosz Radaczyński
it will not output ['dog', 'snake']
Anurag Uniyal
Isn't l.count('dog') zero? l is [3, 3, 2, 1, 4, 5, 1, 5] on my system.
Paul Stephenson
ok, i see that cobbal has already corrected the code. thanks.
Bartosz Radaczyński
A: 

You could do something like this (just count the number of occurrences for each value):

def unique(a):
    from collections import defaultdict
    count = defaultdict(lambda: 0)
    for k, v in a.iteritems():
        count[v] += 1
    for v, c in count.iteritems():
        if c <= 1:
            yield v
Alex Morega
This is yielding values (2, 4) when it should yield keys ('dog', 'snake').
John Machin
I find `defaultdict(int)` to be a bit more clear than `defaultdict(lambda:0)`. Since a default dict of almost any other type will simply use the type name.
S.Lott
Ah, yielding wrong value, sorry.
Alex Morega
+8  A: 

i think efficient way if dict is too large would be

countMap = {}
for k, v in a.iteritems():
    countMap[v] = countMap.get(v,0) + 1
uni = [ k for k, v in a.iteritems() if countMap[v] == 1]
Anurag Uniyal
It would be prettier with collections.defaultdict(int), IMO
Ryan Ginstrom
yes, but I would leave it so that people know what we use to do when there were no defaultdicts
Anurag Uniyal
WASTEFUL: does `for k, v in a.iteritems():` but doesn't use k!!!
John Machin
+3  A: 
>>> b = []
>>> import collections
>>> bag = collections.defaultdict(lambda: 0)
>>> for v in a.itervalues():
...     bag[v] += 1
...
>>> b = [k for (k, v) in a.iteritems() if bag[v] == 1]
>>> b.sort() # optional
>>> print b
['dog', 'snake']
>>>
John Machin
collections.defaultdict(int) will also work
Ryan Ginstrom
@Ryan: True but `lambda: 0` is more explicit than `int` ... AFAICT, until defaultdict arrived [2.5], the number of persons who knew that int() produced 0 [since 2.2] instead of an exception was < epsilon and the number who had a use for that knowledge was even smaller :-)
John Machin
A: 

Use nested list comprehensions!

print [v[0] for v in 
           dict([(v, [k for k in a.keys() if a[k] == v])
                     for v in set(a.values())]).values()
       if len(v) == 1]
Greg Bacon
I don't see how using list comprehensions in this way is a win. For me, this just makes the solution harder to comprehend (no pun intended). Readability is key and this solution is just not that readable IMO.
Bryan Oakley
Rax asked for "a neat Pythonian way to get the job done," as opposed to "obvious" solutions to an otherwise trivial problem.
Greg Bacon
(1) Use `k in a` instead of `k in a.keys()` (2) Use `whatever.itervalues()` instead of `whatever.values()` (3) The dict(yadda yadda) part is building the already-overkill inverse of `a` inefficiently (4) It's neither neat nor Python(ic|ian) ... but it's certainly not obvious! (5) Count the number of responders whose first effort at the so-called trivial problem was a stuff-up.
John Machin
-1 Inefficient O(N^2), complex, unreadable
Tom Leys
This `solution` can be edited (using only the delete key!) to get rid of building the inverse; still O(N^2) though: `print [v[0] for v in [[k for k in a if a[k] == v] for v in set(a.values())] if len(v) == 1]`
John Machin
A: 

Here's another variation.

>>> import collections
>>> inverse= collections.defaultdict(list)
>>> for k,v in a.items():
...     inverse[v].append(k)
... 
>>> [ v[0] for v in inverse.values() if len(v) == 1 ]
['dog', 'snake']

I'm partial to this because the inverted dictionary is such a common design pattern.

S.Lott
You want [ v[0] for k,v ...] in the last line to get ['dog', 'snake'] as requested.
Paul Stephenson
(1) Instead of .items(), use .iteritems(). (2) The last line extracts the key needlessly; should be `[v[0] for v in inverse.itervalues() if len(v) == 1` (3) In any case building the whole inverted dict is overkill.
John Machin
+3  A: 

Here is a solution that only requires traversing the dict once:

def unique_values(d):
    seen = {} # dict (value, key)
    result = set() # keys with unique values
    for k,v in d.iteritems():
        if v in seen:
            result.discard(seen[v])
        else:
            seen[v] = k
            result.add(k)
    return list(result)
Rick Copeland
If a value occurs 3 times, you will try to remove a non-existent element from `result` ... docs say """remove(elem) Remove element elem from the set. Raises KeyError if elem is not contained in the set."""
John Machin
Right you are! I have corrected it to use discard() instead.
Rick Copeland
+2  A: 

A little more verbose, but does need only one pass over a:

revDict = {}
for k, v in a.iteritems():
  if v in revDict:
     revDict[v] = None
  else:
     revDict[v] = k

[ x for x in revDict.itervalues() if x != None ]

( I hope it works, since I can't test it here )

Juergen
Doesn't work if one of the dictionary keys is None. For example if a is {None: 1} the output should be [None] but the above code will produce []. Also: `x is not None` is preferable to `x != None`.
John Machin
Thanks for the comment! You are totally right. In praxis, it seldom happens that None is used ... but even then, one could create some DummyObject: "Dummy = object()" instead of using None.
Juergen
+2  A: 

What about subclassing?

class UniqueValuesDict(dict):

    def __init__(self, *args):
        dict.__init__(self, *args)
        self._inverse = {}

    def __setitem__(self, key, value):
        if value in self.values():
            if value in self._inverse:
                del self._inverse[value]
        else:
            self._inverse[value] = key
        dict.__setitem__(self, key, value)

    def unique_values(self):
        return self._inverse.values()

a = UniqueValuesDict()

a['cat'] =      1
a['fish'] =     1
a[None] =       1
a['duck'] =     1
a['dog'] =      2  # <-- unique
a['bat'] =      3
a['aardvark'] = 3
a['snake'] =    4  # <-- unique
a['wallaby'] =  5
a['badger'] =   5

assert a.unique_values() == ['dog', 'snake']
fabrizioM
This has the advantage of a smaller memory footprint, but you end up doing an O(N) search every time you set an item, so it's liable to be a lot slower than the dictionary tabulation method. Also, I think you could use a set for _inverse instead of a dict.
Ryan Ginstrom
Another problem: the OP placed no constraints on how the contents of the dict were obtained. So one would expect that `del a['bat']; print a.unique_values()` would cause `aardvark` to appear in the output but sadly it doesn't and fixing that would require even more convolutions and __double__underscores__ :-(
John Machin