views:

127

answers:

5

I have a dict data structure with various "depths". By "depths" I mean for example: When depth is 1, dict will be like:

{'str_key1':int_value1, 'str_key2:int_value2}

When depth is 2, dict will be like:

{'str_key1':
    {'str_key1_1':int_value1_1,
     'str_key1_2':int_value1_2},
 'str_key2':
    {'str_key2_1':int_value2_1, 
     'str_key2_2':int_value2_2} }

so on and so forth.

When I need to process the data, now I'm doing this:

def process(keys,value):
  #do sth with keys and value
  pass

def iterate(depth,dict_data):
  if depth == 1:
    for k,v in dict_data:
      process([k],v)
  if depth == 2:
    for k,v in dict_data:
      for kk,vv, in v:
        process([k,kk],v)
  if depth == 3:
    .........

So I need n for loops when depth is n. As depth can go up to 10, I'm wondering if there is a more dynamic way to do the iteration without having to write out all the if and for clauses.

Thanks.

+2  A: 

The obvious answer is to use recursion. But, you can do something slick with Python here to flatten the dictionary. This is still fundamentally recursive --- we are just implementing our own stack.

def flatten(di):
     stack = [di]
     while stack:
         e = stack[-1]
         for k, v in e.items():
             if isinstance(v, dict):
                 stack.append(v)
             else:
                 yield k, v
         stack.remove(e)

Then, you can do something like:

for k, v in flatten(mycomplexdict):
     process(k, v)
carl
+1. That's a neat trick
aaronasterling
I can't figure out where you are adding keys to the list of keys. It looks like all the keys but the last are being dropped.
deinst
@deinst, this is dropping all keys that are not leaf nodes. But, you can easily produce those by yielding after you append as well. It will take some modification to suit a particular application. (I use this pattern when recursing over a file system).
carl
-1 for `isinstance`. The question clearly states that depths aren't jagged; there's no need for typechecking.
Aaron Gallagher
@Aaron Gallegher presumably a non-dict value will be encountered somewhere, or is it dictionaries all the way down?
aaronasterling
@Aaron Gallagher, in my opinion type-checking is necessary here as it determines what is returned in the list. There may a value that acts like a dictionary, but really should be in the list. At its best, this is a semantic issue.
carl
@carl, you still aren't answering the question as asked. If you're going to go off on a tangent, at least provide useful code, like a class that has separate `children` and `value` attributes. Typechecking is baaasically always wrong.
Aaron Gallagher
This doesn't preserve the path, as the OP's solution does.
Nick Johnson
+2  A: 

Recursion is your friend:

def process(keys,value):
  #do sth with keys and value
  pass

def iterate(depth, dict_data):
  iterate_r(depth, dict_data, [])

def iterate_r(depth, data, keys):
  if depth == 0:
    process(keys, data)
  else:
    for k,v in dict_data.items():
      iterate_r(depth-1, v, keys+[k])
Ned Batchelder
A: 

Recusive, just keep in mind python can only recurse 1000 times:

def process(key, value):
    print key, value

def process_dict(dict, callback):
    for k, v in dict.items():
        if hasattr(v, 'items'):
            process_dict(v, callback)
        else:
            callback(k, v)

d = {'a': 1, 'b':{'b1':1, 'b2':2, 'b3':{'bb1':1}}}
process_dict(d, process)

Prints:

a 1
b1 1
b2 2
bb1 1
Matt Williamson
-1 for using isinstance, shadowing `dict`, *and* using `type`.
Aaron Gallagher
Since you're so helpful as to point out someone's errors, maybe you'd like to describe the correct way.
Matt Williamson
Would you say my edit is a little more pythonic?
Matt Williamson
@Matt Williamson, I think your previous version was better. What if I have an object that has a 'items' method, but it's not a dictionary?
carl
@carl, what if I have an object that behaves just like a dict but doesn't subclass `dict`?
Aaron Gallagher
This also doesn't preserve the path, unlike the OP's solution.
Nick Johnson
@carl, that's ok, because it quacks like a duck. the only method we are using on it is the items method, so it should be dictionary-like. more flexible that way.
Matt Williamson
+3  A: 

I'm not sure why everybody's thinking in terms of recursion (or recursion elimination) -- I'd just do depth steps, each of which rebuilds a list by expanding it one further level down.

E.g.:

def itr(depth, d):
  cp = [([], d)]
  for _ in range(depth):
    cp = [(lk+[k], v) for lk, d in cp for k, v in d.items()]
  for lk, ad in cp:
    process(lk, ad)

easy to "expand" with longer identifiers and lower code density if it need to be made more readable for instructional purposes, but I think the logic is simple enough that it may not need such treatment (and, verbosity for its own sake has its drawbacks, too;-).

For example:

d = {'str_key1':
      {'str_key1_1':'int_value1_1',
       'str_key1_2':'int_value1_2'},
     'str_key2':
      {'str_key2_1':'int_value2_1', 
       'str_key2_2':'int_value2_2'} }

def process(lok, v):
  print lok, v

itr(2, d)

prints

['str_key2', 'str_key2_2'] int_value2_2
['str_key2', 'str_key2_1'] int_value2_1
['str_key1', 'str_key1_1'] int_value1_1
['str_key1', 'str_key1_2'] int_value1_2

(if some specific order is desired, appropriate sorting can of course be performed on cp).

Alex Martelli
This is going to spend a lot of time constructing throwaway intermediate lists. Recursion seems like a better option, to me.
Nick Johnson
@Nick, tried measuring? Remember function calls have a cost. If profiling shows that the construction of intermediate lists this is easier to optimize than, e.g., @Ned's intermediate lists, which get passed as the third argument in the recursive call. If tuples are acceptable in lieu of lists (despite the OP's example showing lists as `process`'s first arg), they can be used just as well here as in your answer, of course.
Alex Martelli
I realise function calls have a cost, but I expect that manually unrolling the dicts several times will well outweigh that - not that I have figures to back that up. :)
Nick Johnson
+1  A: 

Assuming you want a fixed depth (most other answers seem to assume you want to recurse to max depth), and you need to preserve the path as in your original question, here's the most straightforward solution:

def process_dict(d, depth, callback, path=()):
  for k, v in d.iteritems():
    if depth == 1:
      callback(path + (k,), v)
    else:
      process_dict(v, depth - 1, callback, path + (k,))

Here's an example of it in action:

>>> a_dict = {
...     'dog': {
...         'red': 5,
...         'blue': 6,
...     },
...     'cat': {
...         'green': 7,
...     },
... }
>>> def my_callback(k, v):
...   print (k, v)
...
>>> process_dict(a_dict, 1, my_callback)
(('dog',), {'blue': 6, 'red': 5})
(('cat',), {'green': 7})
>>> process_dict(a_dict, 2, my_callback)
(('dog', 'blue'), 6)
(('dog', 'red'), 5)
(('cat', 'green'), 7)
Nick Johnson
Since tuples are immutable, no need to do `path=None` and the extra `if not path:` -- just use `path=()` and it will simplify your code at absolutely no cost.
Alex Martelli
Good point. I switched from lists to tuples halfway through. Fixed.
Nick Johnson