views:

143

answers:

3

Hi all... I'm trying to wrap my brain around this but it's not flexible enough.

In my Python script I have a dictionary of dictionaries of lists. (Actually it gets a little deeper but that level is not involved in this question.) I want to flatten all this into one long list, throwing away all the dictionary keys.

Thus I want to transform

{1: {'a': [1, 2, 3], 'b': [0]},
 2: {'c': [4, 5, 1], 'd': [3, 8]}}

to

[1, 2, 3, 0, 4, 5, 1, 3, 8]

I could probably set up a map-reduce to iterate over items of the outer dictionary to build a sublist from each subdictionary and then concatenate all the sublists together.

But that seems inefficient for large data sets, because of the intermediate data structures (sublists) that will get thrown away. Is there a way to do it in one pass?

Barring that, I would be happy to accept a two-level implementation that works... my map-reduce is rusty!

Update: For those who are interested, below is the code I ended up using.

Note that although I asked above for a list as output, what I really needed was a sorted list; i.e. the output of the flattening could be any iterable that can be sorted.

def genSessions(d):
    """Given the ipDict, return an iterator that provides all the sessions,
    one by one, converted to tuples."""
    for uaDict in d.itervalues():
        for sessions in uaDict.itervalues():
            for session in sessions:
                yield tuple(session)

...

# Flatten dict of dicts of lists of sessions into a list of sessions.
# Sort that list by start time
sessionsByStartTime = sorted(genSessions(ipDict), key=operator.itemgetter(0))
# Then make another copy sorted by end time.
sessionsByEndTime = sorted(sessionsByStartTime, key=operator.itemgetter(1))

Thanks again to all who helped.

[Update: replaced nthGetter() with operator.itemgetter(), thanks to @intuited.]

+7  A: 

I hope you realize that any order you see in a dict is accidental -- it's there only because, when shown on screen, some order has to be picked, but there's absolutely no guarantee.

Net of ordering issues among the various sublists getting catenated,

[x for d in thedict.itervalues()
   for alist in d.itervalues()
   for x in alist]

does what you want without any inefficiency nor intermediate lists.

Alex Martelli
@Alex, thanks. This is exactly what I was looking for... short and efficient. I'm planning on sorting the flattened list afterwards. My sample output was not intended to imply a defined order, but it was good of you to check.
LarsH
+1  A: 

A recursive function may work:

def flat(d, out=[]):
 for val in d.values():
  if isinstance(val, dict):
    flat(d, out)
  else:
    out+= val

If you try it with :

>>> d = {1: {'a': [1, 2, 3], 'b': [0]}, 2: {'c': [4, 5, 6], 'd': [3, 8]}}
>>> out = []
>>> flat(d, out)
>>> print out
[1, 2, 3, 0, 4, 5, 6, 3, 8]

Notice that dictionaries have no order, so the list is in random order.

You can also return out (at the end of the loop) and don't call the function with a list argument.

def flat(d, out=[]):
 for val in d.values():
  if isinstance(val, dict):
    flat(d, out)
  else:
    out+= val
 return out

call as:

my_list = flat(d)
Arrieta
@Arrieta, thanks for this answer. I learned some good things from it! Like using += (aka .append()) to concatenate an accumulating list in-place and so not build temporary intermediate lists.
LarsH
+2  A: 

edit: re-read the original question and reworked answer to assume that all non-dictionaries are lists to be flattened.

In cases where you're not sure how far down the dictionaries go, you would want to use a recursive function. @Arrieta has already posted a function that recursively builds a list of non-dictionary values.

This one is a generator that yields successive non-dictionary values in the dictionary tree:

def flatten(d):
    """Recursively flatten dictionary values in `d`.

    >>> hat = {'cat': ['images/cat-in-the-hat.png'],
    ...        'fish': {'colours': {'red': [0xFF0000], 'blue': [0x0000FF]},
    ...                 'numbers': {'one': [1], 'two': [2]}},
    ...        'food': {'eggs': {'green': [0x00FF00]},
    ...                 'ham': ['lean', 'medium', 'fat']}}
    >>> set_of_values = set(flatten(hat))
    >>> sorted(set_of_values)
    [1, 2, 255, 65280, 16711680, 'fat', 'images/cat-in-the-hat.png', 'lean', 'medium']
    """
    try:
        for v in d.itervalues():
            for nested_v in flatten(v):
                yield nested_v
    except AttributeError:
        for list_v in d:
            yield list_v

The doctest passes the resulting iterator to the set function. This is likely to be what you want, since, as Mr. Martelli points out, there's no intrinsic order to the values of a dictionary, and therefore no reason to keep track of the order in which they were found.

You may want to keep track of the number of occurrences of each value; this information will be lost if you pass the iterator to set. If you want to track that, just pass the result of flatten(hat) to some other function instead of set. Under Python 2.7, that other function could be collections.Counter. For compatibility with less-evolved pythons, you can write your own function or (with some loss of efficiency) combine sorted with itertools.groupby.

intuited
@intuited: Thank you for the informative answer. The data structure I'm working from is regular, but I am glad to learn about a recursive approach for the sake of generality. For set(), it appears that the elements must be hashable, which is not true in my case: the elements are themselves lists, as I alluded to in the original question. However I might be converting them to tuples since they won't need to be mutable anymore.
LarsH
@intuited: although I already accepted Alex's excellent answer, I going to change to accept this one. Hope that's not a faux pas. Why: 1) This provides an iterator instead of a list, which may be more efficient to pass to sorted() than an explicit intermediate list that will be thrown away (though admittedly I asked for a list as output). 2) You put a lot of time and explanation into yours; I learned more. 3) Points matter more to you than to him. :-)
LarsH
@LarsH: Ah, you mean the "it gets a little deeper" comment. Yes, if you convert the lists-within-lists-within-dicts-within-dict to tuples you can add them to a set. Something like `set(tuple(list_) for list_ in flatten(hat))` would be the easiest way. Though the fact that you want to do this indicates that there might be a reason either to keep them as lists, or to use tuples from the beginning. Then again, if you do mean to create a "snapshot" of this particular point in their mutable existence, then it would make sense to do that.
intuited
@intuited: I'm not actually going to use a set... I will occasionally have distinct elements that are equal in value, and I want them distinct, but there are not enough of them (nor is their equality significant) to make it worthwhile to keep a count of each element. The elements needed to be mutable while I was building them, but they don't need to be anymore past this point.
LarsH
@intuited: In case you're interested, I posted the code I ended up using. Thanks again for your help.
LarsH