views:

1683

answers:

4

Ok, I'm stuck, need some help from here on...

If I've got a main dictionary like this:

data = [ {"key1": "value1", "key2": "value2", "key1": "value3"},  
{"key1": "value4", "key2": "value5", "key1": "value6"}, 
{"key1": "value1", "key2": "value8", "key1": "value9"} ]

Now, I need to go through that dictionary already to format some of the data, ie:

for datadict in data:  
    for key, value in datadict.items():  
    ...filter the data...

Now, how would I in that same loop somehow (if possible... if not, suggest alternatives please) check for values of certain keys, and if those values match my presets then I would add that whole list to another dictionary, thus effectively creating smaller dictionaries as I go along out of this main dictionary based on certain keys and values?

So, let's say I want to create a sub-dictionary with all the lists in which key1 has value of "value1", which for the above list would give me something like this:

subdata = [ {"key1": "value1", "key2": "value2", "key1": "value3"},  
{"key1": "value1", "key2": "value8", "key1": "value9"} ]
+1  A: 

The answer is too simple, so I guess we are missing some information. Anyway:

result = []
for datadict in data:
    for key, value in datadict.items():
        thefiltering()

    if datadict.get('matchkey') == 'matchvalue':
        result.append(datadict)

Also, you "main dictionary" is not a dictionary but a list. Just wanted to clear that up.

Lennart Regebro
+6  A: 

Here is a not so pretty way of doing it. The result is a generator, but if you really want a list you can surround it with a call to list(). Mostly it doesn't matter.

The predicate is a function which decides for each key/value pair if a dictionary in the list is going to cut it. The default one accepts all. If no k/v-pair in the dictionary matches it is rejected.

def filter_data(data, predicate=lambda k, v: True):
    for d in data:
         for k, v in d.items():
               if predicate(k, v):
                    yield d


test_data = [{"key1":"value1", "key2":"value2"}, {"key1":"blabla"}, {"key1":"value1", "eh":"uh"}]
list(filter_data(test_data, lambda k, v: k == "key1" and v == "value1"))
# [{'key2': 'value2', 'key1': 'value1'}, {'key1': 'value1', 'eh': 'uh'}]
Skurmedel
"not so pretty"? Disagree. This is very nice.
S.Lott
Thank you :). I tend to think stair case functions like that are ugly.
Skurmedel
@Skurmedel: Your function is elegant and it's easy to see how it does the job in simple steps; it saves the readers having to parse a complicated one-liner in their heads.
John Machin
Wow, that's pretty much exactly what I was looking for... and I have to disagree on the 'not so pretty' comment too.
Crazy Serb
+2  A: 

Net of the issues already pointed out in other comments and answers (multiple identical keys can't be in a dict, etc etc), here's how I'd do it:

def select_sublist(list_of_dicts, **kwargs):
    return [d for d in list_of_dicts 
            if all(d.get(k)==kwargs[k] for k in kwargs)]

subdata = select_sublist(data, key1='value1')
Alex Martelli
A: 

Inspired by the answer of Skurmedal, I split this into a recursive scheme to work with a database of nested dictionaries. In this case, a "record" is the subdictionary at the trunk. The predicate defines which records we are after -- those that match some (key,value) pair where these pairs may be deeply nested.

def filter_dict(the_dict, predicate=lambda k, v: True):
    for k, v in the_dict.iteritems():
        if isinstance(v, dict) and _filter_dict_sub(predicate, v):
            yield k, v

def _filter_dict_sub(predicate, the_dict):
    for k, v in the_dict.iteritems():
        if isinstance(v, dict) and filter_dict_sub(predicate, v):
            return True
        if predicate(k, v):
            return True
    return False

Since this is a generator, you may need to wrap with dict(filter_dict(the_dict)) to obtain a filtered dictionary.