Hi folks,

Here is my problem: I have a list of Python dictionaries, all of the same form, that represent the rows of a database table, something like this:

[ {'ID': 1,
   'NAME': 'Joe',
   'CLASS': '8th',
   ... },
  {'ID': 1,
   'NAME': 'Joe',
   'CLASS': '11th',
   ... },
  ...]

I have already written a function to get the unique values for a particular field in this list of dictionaries, which was trivial. That function implements something like:

select distinct NAME from ...

However, I want to be able to get the distinct combinations of several fields, similar to:

select distinct NAME, CLASS from ...

I am finding this to be non-trivial. Is there an algorithm or a built-in Python function to help me with this quandary?

Before you suggest loading the CSV files into a SQLite table or something similar, that is not an option for the environment I'm in, and trust me, that was my first thought.

+7  A: 

If you want it as a generator:

def select_distinct(dictionaries, keys):
    seen = set()  # value tuples already yielded
    for d in dictionaries:
        # build the tuple of values for the requested keys, in order
        v = tuple(d[k] for k in keys)
        if v not in seen:
            seen.add(v)
            yield v

If you want the result in some other form (e.g., a list instead of a generator), it's not hard to alter this: .append to an initially empty result list instead of yielding, and return the result list at the end.

To be called, of course, as

for values_tuple in select_distinct(thedicts, ('NAME', 'CLASS')):
    ...

or the like.
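The list-returning variant mentioned above can be sketched as follows. This is only an illustration: the name select_distinct_list and the sample rows are made up here, and the values are assumed to be hashable.

```python
def select_distinct_list(dictionaries, keys):
    """Return the distinct value tuples for `keys`, in first-seen order."""
    seen = set()
    result = []
    for d in dictionaries:
        v = tuple(d[k] for k in keys)
        if v not in seen:
            seen.add(v)
            result.append(v)  # append instead of yield
    return result


# Hypothetical sample rows, modelled on the question's example
rows = [
    {'ID': 1, 'NAME': 'Joe', 'CLASS': '8th'},
    {'ID': 1, 'NAME': 'Joe', 'CLASS': '11th'},
    {'ID': 1, 'NAME': 'Joe', 'CLASS': '8th'},
]

print(select_distinct_list(rows, ('NAME', 'CLASS')))
# [('Joe', '8th'), ('Joe', '11th')]
```

Like the generator, this keeps rows in the order they first appear.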

Alex Martelli
A: 

distinct_list = list(set([(d['NAME'], d['CLASS']) for d in row_list]))

where row_list is the list of dicts you have.
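One thing to keep in mind with the one-liner is that a set does not keep the rows in their original order, and the field names are hard-coded. A possible sketch that takes the field names as a parameter and keeps first-seen order is shown below; the function name distinct_rows and the sample data are made up for illustration, and the order guarantee assumes Python 3.7 or later, where dicts preserve insertion order.

```python
from operator import itemgetter


def distinct_rows(row_list, keys):
    """Distinct value tuples for `keys`, in first-seen order.

    Assumes at least two keys: with a single key, itemgetter returns
    the bare value rather than a 1-tuple.
    """
    get = itemgetter(*keys)
    # dict.fromkeys drops duplicates while keeping insertion order
    return list(dict.fromkeys(get(d) for d in row_list))


# Hypothetical sample rows, modelled on the question's example
row_list = [
    {'ID': 1, 'NAME': 'Joe', 'CLASS': '8th'},
    {'ID': 1, 'NAME': 'Joe', 'CLASS': '11th'},
    {'ID': 1, 'NAME': 'Joe', 'CLASS': '8th'},
]

print(distinct_rows(row_list, ('NAME', 'CLASS')))
# [('Joe', '8th'), ('Joe', '11th')]
```

As with the set-based version, the field values need to be hashable.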

Florian Diesch
A: 
from collections import defaultdict

example = [ {'ID': 1, 'NAME': 'Joe','CLASS': '8th'},
            {'ID': 1, 'NAME': 'Joe', 'CLASS': '11th'}]

fields = ('ID', 'CLASS')

mydict = defaultdict(list)

for dictio in example:
    for field in fields:
        mydict[field].append(dictio[field])

print(mydict)

Under Python 3, this gives:

defaultdict(<class 'list'>, {'ID': [1, 1], 'CLASS': ['8th', '11th']})
joaquin