
Ok, so I have a list of dicts:

[{'name': 'johnny', 'surname': 'smith', 'age': 53},
 {'name': 'johnny', 'surname': 'ryan', 'age': 13},
 {'name': 'jakob', 'surname': 'smith', 'age': 27},
 {'name': 'aaron', 'surname': 'specter', 'age': 22},
 {'name': 'max', 'surname': 'headroom', 'age': 108},
]

and I want the 'frequency' of the items within each column. So for this I'd get something like:

{'name': {'johnny': 2, 'jakob': 1, 'aaron': 1, 'max': 1}, 
'surname': {'smith': 2, 'ryan': 1, 'specter': 1, 'headroom': 1}, 
'age': {53: 1, 13: 1, 27: 1, 22: 1, 108: 1}}

Any modules out there that can do stuff like this?

A:

collections.defaultdict from the standard library to the rescue:

from collections import defaultdict

LofD = [{'name': 'johnny', 'surname': 'smith', 'age': 53},
 {'name': 'johnny', 'surname': 'ryan', 'age': 13},
 {'name': 'jakob', 'surname': 'smith', 'age': 27},
 {'name': 'aaron', 'surname': 'specter', 'age': 22},
 {'name': 'max', 'surname': 'headroom', 'age': 108},
]

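def counters():
  # factory for the inner dicts: a value seen for the first time starts at 0
  return defaultdict(int)

def freqs(LofD):
  # a column name seen for the first time gets a fresh counter dict
  r = defaultdict(counters)
  for d in LofD:
    for k, v in d.items():
      r[k][v] += 1
  # convert the defaultdicts back to plain dicts for display
  return dict((k, dict(v)) for k, v in r.items())

print(freqs(LofD))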

emits

{'age': {27: 1, 108: 1, 53: 1, 22: 1, 13: 1}, 'surname': {'headroom': 1, 'smith': 2, 'specter': 1, 'ryan': 1}, 'name': {'jakob': 1, 'max': 1, 'aaron': 1, 'johnny': 2}}

as desired (order of keys apart, of course -- it's irrelevant in a dict).

Alex Martelli
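A quick way to confirm that only the key order differs from the desired result: the minimal Python 3 sketch below checks the computed frequencies against the question's expected dict (with the `27: 1.` typo corrected). It is a variant of the answer above, not its exact code: the named `counters` factory is swapped for a lambda and the final conversion uses a dict comprehension. `rows` and `expected` are illustrative names for the question's data and desired output.

```python
from collections import defaultdict

rows = [
    {'name': 'johnny', 'surname': 'smith', 'age': 53},
    {'name': 'johnny', 'surname': 'ryan', 'age': 13},
    {'name': 'jakob', 'surname': 'smith', 'age': 27},
    {'name': 'aaron', 'surname': 'specter', 'age': 22},
    {'name': 'max', 'surname': 'headroom', 'age': 108},
]

def freqs(rows):
    # each new column name gets its own int counter on first access
    r = defaultdict(lambda: defaultdict(int))
    for d in rows:
        for k, v in d.items():
            r[k][v] += 1
    # plain dicts so the result prints without defaultdict reprs
    return {k: dict(v) for k, v in r.items()}

expected = {
    'name': {'johnny': 2, 'jakob': 1, 'aaron': 1, 'max': 1},
    'surname': {'smith': 2, 'ryan': 1, 'specter': 1, 'headroom': 1},
    'age': {53: 1, 13: 1, 27: 1, 22: 1, 108: 1},
}

# dict equality compares keys and values, not insertion order
print(freqs(rows) == expected)  # True
```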
A:

This?

from collections import defaultdict
# listOfDicts is the list of dicts from the question
fq = { 'name': defaultdict(int), 'surname': defaultdict(int), 'age': defaultdict(int) }
for row in listOfDicts:
    for field in fq:
        fq[field][row[field]] += 1
print(fq)
S.Lott
A:
items = [{'name': 'johnny', 'surname': 'smith', 'age': 53},
         {'name': 'johnny', 'surname': 'ryan', 'age': 13},
         {'name': 'jakob', 'surname': 'smith', 'age': 27},
         {'name': 'aaron', 'surname': 'specter', 'age': 22},
         {'name': 'max', 'surname': 'headroom', 'age': 108}]

global_dict = {}

for item in items:
    for key, value in item.items():
        # dict.has_key() exists only in Python 2 (removed in Python 3)
        if not global_dict.has_key(key):
            global_dict[key] = {}

        if not global_dict[key].has_key(value):
            global_dict[key][value] = 0

        global_dict[key][value] += 1

print(global_dict)

Simplest solution and actually tested.

zdmytriv
That's probably how I would've eventually done it; I had never heard of collections.defaultdict.
dochead
How's it simpler to duplicate the "if not has_key" logic that collections.defaultdict embodies? This is how I'd have done it in 1.5.2 (before we added the simpler and faster idiom `if key not in global_dict` in 2.0), but "compatible with archaic versions" doesn't equate to "simple";-).
Alex Martelli
Simplest for newbies :)
zdmytriv
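For comparison, here is a minimal Python 3 sketch of the same explicit loop written with the `key not in dict` idiom mentioned in the comment above, which also runs on Python 3 where `has_key()` no longer exists. Using `dict.get` with a default of 0 for the inner count is an additional swap, not something from either post; `rows` is an illustrative name for the question's data.

```python
rows = [
    {'name': 'johnny', 'surname': 'smith', 'age': 53},
    {'name': 'johnny', 'surname': 'ryan', 'age': 13},
    {'name': 'jakob', 'surname': 'smith', 'age': 27},
    {'name': 'aaron', 'surname': 'specter', 'age': 22},
    {'name': 'max', 'surname': 'headroom', 'age': 108},
]

global_dict = {}
for item in rows:
    for key, value in item.items():
        # `in` replaces has_key() for the membership test
        if key not in global_dict:
            global_dict[key] = {}
        column = global_dict[key]
        # get() returns 0 the first time a value is seen in this column
        column[value] = column.get(value, 0) + 1

print(global_dict)
```

The logic is the same as the has_key version; only the membership test and the zero default change.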
A:

New in Python 3.1 (and 2.7): the collections.Counter class:

mydict=[{'name': 'johnny', 'surname': 'smith', 'age': 53},
 {'name': 'johnny', 'surname': 'ryan', 'age': 13},
 {'name': 'jakob', 'surname': 'smith', 'age': 27},
 {'name': 'aaron', 'surname': 'specter', 'age': 22},
 {'name': 'max', 'surname': 'headroom', 'age': 108},
]

import collections
newdict = {}

# assumes every dict has the same keys as the first one
for key in mydict[0].keys():
    column = [row[key] for row in mydict]
    newdict[key] = dict(collections.Counter(column))

print(newdict)

outputs:

{'age': {27: 1, 108: 1, 53: 1, 22: 1, 13: 1}, 
'surname': {'headroom': 1, 'smith': 2, 'specter': 1, 'ryan': 1}, 
'name': {'jakob': 1, 'max': 1, 'aaron': 1, 'johnny': 2}}
Tim Pietzcker
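The Counter loop above takes the column names from the first dict only, so a key that appears only in later rows would be missed, and a row lacking one of those keys would raise KeyError. A minimal sketch that avoids both by nesting a `Counter` inside a `defaultdict` (a combination of the two standard-library tools used in these answers, not code from any single post; `rows`, `counts`, and `result` are illustrative names):

```python
from collections import Counter, defaultdict

rows = [
    {'name': 'johnny', 'surname': 'smith', 'age': 53},
    {'name': 'johnny', 'surname': 'ryan', 'age': 13},
    {'name': 'jakob', 'surname': 'smith', 'age': 27},
    {'name': 'aaron', 'surname': 'specter', 'age': 22},
    {'name': 'max', 'surname': 'headroom', 'age': 108},
]

# one Counter per column, created the first time that column name is seen
counts = defaultdict(Counter)
for row in rows:
    for key, value in row.items():
        counts[key][value] += 1

# plain dicts for display, same shape as the question's desired output
result = {key: dict(counter) for key, counter in counts.items()}
print(result)

# Counters can also report the most frequent value per column
print(counts['name'].most_common(1))  # [('johnny', 2)]
```

Keeping the Counters (rather than converting them straight away) leaves methods such as `most_common` available for later queries.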