Hello,

I have a list of dicts, and I want to compare each dict in that list against the dicts in a result list: if it is not there yet, add it; if it is already there, increment a counter associated with that dict.

At first I wanted to use the solution described at http://stackoverflow.com/questions/1692388/python-list-of-dict-if-exists-increment-a-dict-value-if-not-append-a-new-dict but I got an error because a dict cannot be used as a key in another dict.
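For reference, the error can be reproduced with a few lines. This is only a minimal sketch; the names `counts`, `record` and `error_message` are illustrative and do not come from the linked question.

```python
# Minimal reproduction: a dict cannot be used as a key of another dict.
counts = {}
record = {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd1'}

try:
    counts[record] = 1
    error_message = None
except TypeError as exc:
    # Dicts are mutable and define no hash, so they are rejected as keys.
    error_message = str(exc)

print(error_message)
```

The exact wording of the message varies between Python versions, but it always reports that the dict is unhashable.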

So the data structure I opted for is a list where each entry is a dict and an int:

r = [[{'src': '', 'dst': '', 'cmd': ''}, 0]]

The original dataset (that should be compared to the resulting dataset) is a list of dicts:

d1 = {'src': '192.168.0.1',
      'dst': '192.168.0.2',
      'cmd': 'cmd1'}
d2 = {'src': '192.168.0.1',
      'dst': '192.168.0.2',
      'cmd': 'cmd2'}
d3 = {'src': '192.168.0.2',
      'dst': '192.168.0.1',
      'cmd': 'cmd1'}
d4 = {'src': '192.168.0.1',
      'dst': '192.168.0.2',
      'cmd': 'cmd1'}
o = [d1, d2, d3, d4]

The result should be:

r = [[{'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd1'}, 2],
     [{'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd2'}, 1],
     [{'src': '192.168.0.2', 'dst': '192.168.0.1', 'cmd': 'cmd1'}, 1]]

What is the best way to accomplish this? I have a few code examples, but none of them is really good and most do not work correctly.

Thanks for any input on this!

UPDATE

The final code after Tamás's comments is:

from collections import namedtuple, defaultdict
DataClass = namedtuple("DataClass", "src dst cmd")
d1 = DataClass(src='192.168.0.1', dst='192.168.0.2', cmd='cmd1')
d2 = DataClass(src='192.168.0.1', dst='192.168.0.2', cmd='cmd2')
d3 = DataClass(src='192.168.0.2', dst='192.168.0.1', cmd='cmd1')
d4 = DataClass(src='192.168.0.1', dst='192.168.0.2', cmd='cmd1')
ds = d1, d2, d3, d4
r = defaultdict(int)
for d in ds:
    r[d] += 1
print("list to compare")
for d in ds:
    print(d)
print("result after merge")
for k, v in r.items():
    print("%s: %s" % (k, v))
+1  A: 

Well, if your original dicts contain only src, dst and cmd, you can use named tuples instead. Named tuples are hashable, so they can be used as keys in a dict.

from collections import namedtuple

DataClass = namedtuple("DataClass", "src dst cmd")
d1 = DataClass(src='192.168.0.2', dst='192.168.0.1', cmd='cmd1')

(Sorry for the silly class name; since I don't know what your dicts represent, I couldn't come up with a better one.) You can even create DataClass instances from dicts:

d1 = DataClass(**d1_as_dict)

At this point, your main counting loop simplifies to this:

from collections import defaultdict, namedtuple

r = defaultdict(int)
for obj in [d1, d2, d3, d4]:
    r[obj] += 1
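Putting these pieces together for the data in the question might look like the sketch below. It assumes every dict has exactly the keys src, dst and cmd; the names `counts` and `result` are illustrative. The last step uses the standard namedtuple method `_asdict()` to get back to the list of [dict, count] pairs the question asks for.

```python
from collections import defaultdict, namedtuple

DataClass = namedtuple("DataClass", "src dst cmd")

# The original list of dicts from the question.
o = [
    {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd1'},
    {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd2'},
    {'src': '192.168.0.2', 'dst': '192.168.0.1', 'cmd': 'cmd1'},
    {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd1'},
]

# Count occurrences, using the hashable named tuple as the key.
counts = defaultdict(int)
for d in o:
    counts[DataClass(**d)] += 1

# Convert back to the [dict, count] layout requested in the question.
result = [[dict(key._asdict()), n] for key, n in counts.items()]

for entry in result:
    print(entry)
```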

If, for some reason, you are stuck with Python <= 2.5, there is a drop-in namedtuple replacement class here.

Tamás
Thanks for reminding me of namedtuple. I did not know it was hashable.
lmnt
+1  A: 

The namedtuple is an excellent idea, if applicable. But if you want to stick with dicts, that is also of course possible, just substantially less efficient. For example:

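def addadict(r, newd):
    # Linear scan: look for an entry whose dict equals newd.
    for i, (d, count) in enumerate(r):
        if d == newd:
            r[i] = [d, count + 1]
            break
    else:
        # The loop finished without break, so newd is not in r yet.
        r.append([newd, 1])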
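
The extra cost comes from scanning the whole of r for every new dict. If all the dict values are hashable (they are strings in the question), one possible alternative is to keep the dicts but index them by `frozenset(d.items())`. This is a different technique from the loop above, shown only as a sketch; the function name `count_dicts` is hypothetical.

```python
def count_dicts(dicts):
    """Count equal dicts; return a list of [dict, count] pairs."""
    index = {}    # frozenset of items -> position in result
    result = []
    for d in dicts:
        # Hashable only as long as every value in d is hashable.
        key = frozenset(d.items())
        if key in index:
            result[index[key]][1] += 1
        else:
            index[key] = len(result)
            result.append([d, 1])
    return result


o = [
    {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd1'},
    {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd2'},
    {'src': '192.168.0.2', 'dst': '192.168.0.1', 'cmd': 'cmd1'},
    {'src': '192.168.0.1', 'dst': '192.168.0.2', 'cmd': 'cmd1'},
]
r = count_dicts(o)
```

The pairs come out in first-seen order, and each lookup is a single dict access instead of a scan over r.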
Alex Martelli