I have a list of dictionaries:

people = [{"name": "Roger", "city": "NY", "age": 20, "sex": "M"},
          {"name": "Dan", "city": "Boston", "age": 20, "sex": "M"},
          {"name": "Roger", "city": "Boston", "age": 21, "sex": "M"},
          {"name": "Dana", "city": "Dallas", "age": 30, "sex": "F"}]

I want to catalogue them by a chosen set of keys, for example:

field = ("sex", "age")

I need a function catalogue(field, people) that gives me:

{ "M": 
      { 20: [{"name": "Roger", "city": "NY", "age": 20, "sex": "M"},
             {"name": "Dan", "city": "Boston", "age": 20, "sex": "M"}],
        21: [{"name": "Roger", "city": "Boston", "age": 21, "sex": "M"}]
      },
  "F":
      { 30: [{"name": "Dana", "city": "Dallas", "age": 30, "sex": "F"}] }
}

When len(field) == 1 it's simple. I want to be able to do something like this:

c = catalogue(field, people)
for (sex, sex_value) in c.iteritems():
   for (age, age_value) in sex_value.iteritems():
       for person in age_value:  # age_value is a list of person dicts
           print sex, age, person["name"]
+8  A: 

Recursively, using itertools.groupby:

import itertools, operator

def catalog(fields,people):
    cur_field = operator.itemgetter(fields[0])
    groups = itertools.groupby(sorted(people, key=cur_field),cur_field)
    if len(fields)==1:
        return dict((k,list(v)) for k,v in groups)
    else:
        return dict((k,catalog(fields[1:],v)) for k,v in groups)

Test:

import pprint
pprint.pprint(catalog(('sex','age'), people))
{'F': {30: [{'age': 30, 'city': 'Dallas', 'name': 'Dana', 'sex': 'F'}]},
 'M': {20: [{'age': 20, 'city': 'NY', 'name': 'Roger', 'sex': 'M'},
            {'age': 20, 'city': 'Boston', 'name': 'Dan', 'sex': 'M'}],
       21: [{'age': 21, 'city': 'Boston', 'name': 'Roger', 'sex': 'M'}]}}
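
A note on why the `sorted` call in the function above matters: `itertools.groupby` only merges adjacent items with equal keys, so on unsorted input the same key can come back as several groups, and the later ones would overwrite the earlier ones when collected into a `dict`. A small sketch in Python 3 syntax (the `rows` data is made up for illustration):

```python
import itertools
import operator

# Hypothetical sample data, deliberately not sorted by "sex".
rows = [{"name": "Roger", "sex": "M"},
        {"name": "Dana", "sex": "F"},
        {"name": "Dan", "sex": "M"}]

by_sex = operator.itemgetter("sex")

# Without sorting, "M" shows up as two separate groups.
unsorted_keys = [k for k, _ in itertools.groupby(rows, by_sex)]

# After sorting by the same key, each value forms exactly one group.
sorted_keys = [k for k, _ in itertools.groupby(sorted(rows, key=by_sex), by_sex)]

print(unsorted_keys)  # ['M', 'F', 'M']
print(sorted_keys)    # ['F', 'M']
```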
Jimmy
Never ever ever ever use `import *`.
Mike Graham
:) alright, fixed
Jimmy
One useful note: you can create the lookup function with the operator.itemgetter factory function, i.e. replace the first line with `cur_field = operator.itemgetter(fields[0])`. This looks a bit nicer and is also slightly faster.
Brian
@Brian: thanks :) answer changed.
Jimmy
A: 
import pprint
people = [{"name": "Roger", "city": "NY", "age": 20, "sex": "M"},
          {"name": "Dan", "city": "Boston", "age": 20, "sex": "M"},
          {"name": "Roger", "city": "Boston", "age": 21, "sex": "M"},
          {"name": "Dana", "city": "Dallas", "age": 30, "sex": "F"}]
fields = ("sex", "age")
result = {}
for person in people:
    tempdict = result
    for field in fields[:-1]:
        if person[field] in tempdict:
            tempdict = tempdict[person[field]]
        else:
            t = tempdict
            tempdict = {}
            t[person[field]] = tempdict
    key = person[fields[-1]]
    if key in tempdict:
        tempdict[key].append(person)
    else:
        tempdict[key] = [person]

pprint.pprint(result)

This seems to do the job.
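
The same loop can probably be shortened with `dict.setdefault`, which returns the existing value for a key or stores and returns a default. This is only a sketch in Python 3 syntax; wrapping it in a function named `catalogue` that takes `(fields, people)` is an assumption made to match the question:

```python
def catalogue(fields, people):
    """Group people into nested dicts keyed by each field in turn."""
    result = {}
    for person in people:
        level = result
        # Walk down (or create) one nested dict per leading field.
        for field in fields[:-1]:
            level = level.setdefault(person[field], {})
        # The last field maps to the list of matching records.
        level.setdefault(person[fields[-1]], []).append(person)
    return result


people = [{"name": "Roger", "city": "NY", "age": 20, "sex": "M"},
          {"name": "Dan", "city": "Boston", "age": 20, "sex": "M"},
          {"name": "Roger", "city": "Boston", "age": 21, "sex": "M"},
          {"name": "Dana", "city": "Dallas", "age": 30, "sex": "F"}]

result = catalogue(("sex", "age"), people)
for sex, by_age in result.items():
    for age, persons in by_age.items():
        for person in persons:
            print(sex, age, person["name"])
```

Unlike the `groupby` approach, this needs no sorting and keeps the records in their original order within each group.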

Xavier Combelle
A: 

Not optimal (it could be improved using defaultdict, for instance, but I only had Python 2.4 installed on my machine), but it does the job:

def catalogue(dicts, criteria):
    if not criteria:
        return dicts

    criterion, rest = criteria[0], criteria[1:]

    cat = {}
    for d in dicts:
        reducedDict = dict(d)
        del reducedDict[criterion]

        if d[criterion] in cat:
            cat[d[criterion]].append(reducedDict)
        else:
            cat[d[criterion]] = [reducedDict]

    retDict = {}
    for key, val in cat.items():
        retDict[key] = catalogue(val, rest)

    return retDict

print catalogue(people, ("sex", "age"))
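
For reference, a sketch of how the `defaultdict` variant mentioned above might look. It assumes Python 3 syntax (`collections.defaultdict` itself has been available since Python 2.5) and keeps the same behaviour as the code above, including dropping the grouped key from each record:

```python
from collections import defaultdict


def catalogue(dicts, criteria):
    # No criteria left: return the remaining records unchanged.
    if not criteria:
        return dicts

    criterion, rest = criteria[0], criteria[1:]

    # defaultdict(list) removes the "is the key already there?" branch.
    cat = defaultdict(list)
    for d in dicts:
        # Copy the record without the key currently being grouped on.
        reduced = {k: v for k, v in d.items() if k != criterion}
        cat[d[criterion]].append(reduced)

    # Recurse into each group with the remaining criteria.
    return {key: catalogue(val, rest) for key, val in cat.items()}


people = [{"name": "Roger", "city": "NY", "age": 20, "sex": "M"},
          {"name": "Dan", "city": "Boston", "age": 20, "sex": "M"},
          {"name": "Roger", "city": "Boston", "age": 21, "sex": "M"},
          {"name": "Dana", "city": "Dallas", "age": 30, "sex": "F"}]

result = catalogue(people, ("sex", "age"))
print(result)
```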
jellybean