Hello,

Newbie question here, so please bear with me.

Let's say I have a dictionary looking like this:

a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

I want all values that are equal to each other to be moved into another dictionary.

matched = {"2323232838": ("first/dir", "hello.txt"),
           "3434221":    ("first/dir", "hello.txt"),
           "32232334":   ("first/dir", "hello.txt")}

And the remaining unmatched items should look like this:

remainder = {"2323221383": ("second/dir", "foo.txt"),
             "324234324":  ("third/dir", "dog.txt")}

Thanks in advance, and if you provide an example, please comment it as much as possible.

+1  A: 

Iterating over a dictionary is no different from iterating over a list in Python; the loop variable takes each key in turn:

for key in dic:
    print("dic[%s] = %s" % (key, dic[key]))

This will print all of the keys and values of your dictionary.

Avihu Turzion
While you're right, this was handled in the comments, and doesn't answer his question, which was deducible.
Triptych
+1  A: 

I assume that your unique id will be the key.
Probably not very beautiful, but returns a dict with your unique values:

>>> dict_ = {'1': ['first/dir', 'hello.txt'],
'3': ['first/dir', 'foo.txt'], 
'2': ['second/dir', 'foo.txt'], 
'4': ['second/dir', 'foo.txt']}  
>>> dict((v[0]+"/"+v[1],k) for k,v in dict_.iteritems())
{'second/dir/foo.txt': '4', 'first/dir/hello.txt': '1', 'first/dir/foo.txt': '3'}

I've seen you updated your post:

>>> a
{'324234324': ('third/dir', 'dog.txt'), 
'2323221383': ('second/dir', 'foo.txt'), 
'3434221': ('first/dir', 'hello.txt'), 
'2323232838': ('first/dir', 'hello.txt'), 
'32232334': ('first/dir', 'hello.txt')}
>>> dict((v[0]+"/"+v[1],k) for k,v in a.iteritems())
{'second/dir/foo.txt': '2323221383', 
'first/dir/hello.txt': '32232334', 
'third/dir/dog.txt': '324234324'}
buster
that's not what OP has asked for at all.
SilentGhost
Neither is yours. The OP had a different version in the beginning, which confused me. Triptych's version seems to be alright, though.
buster
+8  A: 

The code below will result in two variables, matched and remainder. matched is a list of dictionaries, one for each set of matching values in the original dictionary. remainder will be, as in your example, a dictionary containing all the unmatched items.

Note that in your example, there is only one set of matching values: ('first/dir', 'hello.txt'). If there were more than one set, each would have a corresponding entry in matched.

import itertools

# Original dict
a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Convert dict to sorted list of items
a = sorted(a.items(), key=lambda x:x[1])

# Group by value of tuple
groups = itertools.groupby(a, key=lambda x:x[1])

# Pull out matching groups of items, and combine items
# with no matches back into a single dictionary
remainder = []
matched   = []

for key, group in groups:
    group = list(group)
    if len(group) == 1:
        remainder.append(group[0])
    else:
        matched.append(dict(group))

remainder = dict(remainder)

Output:

>>> matched
[
  {
    '3434221':    ('first/dir', 'hello.txt'), 
    '2323232838': ('first/dir', 'hello.txt'), 
    '32232334':   ('first/dir', 'hello.txt')
  }
]

>>> remainder
{
  '2323221383': ('second/dir', 'foo.txt'), 
  '324234324':  ('third/dir', 'dog.txt')
}
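To illustrate the note about more than one set of matching values, here is a minimal sketch that wraps the same sort-then-groupby steps in a function and runs it on made-up data. The function name split_by_value and the sample dictionary are assumptions for illustration, not part of the original answer.

```python
import itertools

def split_by_value(d):
    """Return (matched, remainder) using the sort + groupby approach."""
    # groupby only merges adjacent items, so sort by value first
    items = sorted(d.items(), key=lambda kv: kv[1])
    matched, remainder = [], {}
    for value, group in itertools.groupby(items, key=lambda kv: kv[1]):
        group = list(group)
        if len(group) == 1:
            # A value seen once goes to the remainder
            remainder.update(group)
        else:
            # A value seen several times gets its own dictionary
            matched.append(dict(group))
    return matched, remainder

# Hypothetical data with two sets of duplicated values
sample = {"1": ("first/dir", "hello.txt"),
          "2": ("second/dir", "foo.txt"),
          "3": ("first/dir", "hello.txt"),
          "4": ("second/dir", "foo.txt"),
          "5": ("third/dir", "dog.txt")}

matched, remainder = split_by_value(sample)
```

Here matched should hold two dictionaries (keys "1" and "3", then keys "2" and "4"), and remainder should hold only key "5".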

As a newbie, you're probably being introduced to a few unfamiliar concepts in the code above. Here are some links:

Triptych
Nice. I can see now that I misunderstood the question with my answer. Anyway, looks good to me :)
buster
Thank you, I will need to read up on groups, but that's all good, thanks a million. Also thanks for editing my question!
Note, len(group) is 1 should read len(group) == 1. While the identity test ("is") works here in CPython due to small integer caching, it's a bad habit to get into. You want an equality test.
Ned Deily
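The difference between the two tests in the comment above can be shown without relying on integer caching. This is a small sketch using lists, chosen because two separately built lists are always distinct objects; the variable names are illustrative only.

```python
# Two separately built lists with the same contents
first = list("abc")
second = list("abc")

same_value = first == second    # equality: compares the contents
same_object = first is second   # identity: checks for the very same object

print(same_value, same_object)  # True False
```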
A: 

If you know what value you want to filter out:

known_tuple = 'first/dir','hello.txt'
b = {k:v for k, v in a.items() if v == known_tuple}

Then a would become:

a = dict(a.items() - b.items())

This is py3k notation, but I'm sure something similar can be implemented in legacy versions. If you don't know what the known_tuple is, then you'd need to find it out first, for example like this:

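c = list(a.values())
# Remove one occurrence of each distinct value;
# only the repeated values are left in c
for i in set(c):
    c.remove(i)
known_tuple = c[0]  # raises IndexError if no value is repeated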
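If more than one value may be repeated, a variation is to count every value first and split on the count. This is a sketch using collections.Counter rather than the approach above; the names counts, matched and remainder are assumptions, and all repeated values end up together in a single matched dictionary.

```python
from collections import Counter

a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Count how many keys share each value (the tuples are hashable)
counts = Counter(a.values())

# Keep items whose value occurs more than once / exactly once
matched = {k: v for k, v in a.items() if counts[v] > 1}
remainder = {k: v for k, v in a.items() if counts[v] == 1}
```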
SilentGhost
No, it can very well be "third/dir", "something.txt", i don't know.
+4  A: 

What you're asking for is called an "Inverted Index" -- the distinct items are recorded just once with a list of keys.

>>> from collections import defaultdict
>>> a = {"2323232838": ("first/dir", "hello.txt"),
...      "2323221383": ("second/dir", "foo.txt"),
...      "3434221": ("first/dir", "hello.txt"),
...      "32232334": ("first/dir", "hello.txt"),
...      "324234324": ("third/dir", "dog.txt")}
>>> invert = defaultdict( list )
>>> for key, value in a.items():
...     invert[value].append( key )
... 
>>> invert
defaultdict(<type 'list'>, {('first/dir', 'hello.txt'): ['3434221', '2323232838', '32232334'], ('second/dir', 'foo.txt'): ['2323221383'], ('third/dir', 'dog.txt'): ['324234324']})

The inverted dictionary has the original values associated with a list of 1 or more keys.

Now, to get your revised dictionaries from this.

Filtering:

>>> [ invert[multi] for multi in invert if len(invert[multi]) > 1 ]
[['3434221', '2323232838', '32232334']]
>>> [ invert[uni] for uni in invert if len(invert[uni]) == 1 ]
[['2323221383'], ['324234324']]

Expanding:

>>> [ (i,multi) for multi in invert if len(invert[multi]) > 1 for i in invert[multi] ]
[('3434221', ('first/dir', 'hello.txt')), ('2323232838', ('first/dir', 'hello.txt')), ('32232334', ('first/dir', 'hello.txt'))]
>>> dict( (i,multi) for multi in invert if len(invert[multi]) > 1 for i in invert[multi] )
{'3434221': ('first/dir', 'hello.txt'), '2323232838': ('first/dir', 'hello.txt'), '32232334': ('first/dir', 'hello.txt')}

A similar (but simpler) treatment works for the items which occur once.
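One way that simpler treatment might look, as a sketch written as a script rather than an interactive session: a value that occurs once has exactly one key in its list, so no inner loop is needed. The name remainder is taken from the question; the dictionary comprehension form is an assumption and needs Python 2.7 or later.

```python
from collections import defaultdict

a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Build the inverted index: value -> list of keys
invert = defaultdict(list)
for key, value in a.items():
    invert[value].append(key)

# A single-key list means the value was unique in the original dict
remainder = {keys[0]: value for value, keys in invert.items() if len(keys) == 1}
```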

S.Lott
Huh, very simple, gotta use the python standard lib. more, thanks for this.
Ah, nice, too. It's amazing what you can do with simple standard calls :)
buster