def flattenList(toFlatten):
    final = []
    for el in toFlatten:
        if isinstance(el, list):
            # Recurse into nested sublists and splice their items in.
            final.extend(flattenList(el))
        else:
            final.append(el)
    return final

When I don't know how deeply the lists will nest, this is the only way I can think of to do this.

+1  A: 

This blog post covers a variety of flattening strategies. There's a little more discussion over here.

Callahad
That second link is full of sorely-outdated, originally-awful code.
Mike Graham
+7  A: 
  1. You should avoid typechecking in Python. In this case, this means avoiding arbitrarily-nested structures where you distinguish by type. You can build your own node type which you can traverse by methods other than typechecking, like looking at a specific attribute.

  2. For flattening one level or exactly n levels, look at itertools.chain.from_iterable. (A short sketch follows this list.)

  3. I don't know what you mean by "functional". This code is pretty functional: it uses recursion (not to its credit!) and it doesn't mutate its argument. (Strictly speaking, it does use mutable state for building a list, but that is just how you do it in Python.)

  4. I suppose one more functional attribute would be lazy evaluation. You could implement this thusly:

    def flatten(toFlatten):
        for item in toFlatten:
            if isinstance(item, list): # Ewww, typechecking
                for subitem in flatten(item): # they are considering adding
                    yield subitem             # "yield from" to the language
                                              # to give this pattern syntax
            else:
                yield item
  5. Recursion is very limited in Python (at least, in all its major implementations) and should generally be avoided for arbitrary depth. It is quite possible to rewrite this (and all recursive code) to use iteration, which will make it more scalable (and less functional, which is a good thing in Python, a language not especially suited to FP). An iterative sketch follows below.
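
As a quick illustration of point 2, a sketch (not from the original answer) of one-level flattening with `itertools.chain.from_iterable`:

    from itertools import chain

    one_level = [[1, 2], [3, [4]], [5]]
    # chain.from_iterable flattens exactly one level, lazily:
    # deeper nesting survives, so [4] comes through as a list.
    print(list(chain.from_iterable(one_level)))  # [1, 2, 3, [4], 5]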
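
And a minimal sketch of the iterative rewrite suggested in point 5, using an explicit stack in place of the call stack (the `isinstance` check is kept only to match the original code; `flatten_iterative` is an illustrative name):

    def flatten_iterative(toFlatten):
        # An explicit stack of iterators replaces recursion, so deep
        # nesting cannot hit Python's recursion limit.
        stack = [iter(toFlatten)]
        while stack:
            for item in stack[-1]:
                if isinstance(item, list):
                    stack.append(iter(item))
                    break
                else:
                    yield item
            else:
                # Top iterator exhausted; drop back to its parent.
                stack.pop()

    print(list(flatten_iterative([1, [2, [3, [4]]], 5])))  # [1, 2, 3, 4, 5]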

Mike Graham
@Mike Graham: I want to be able to pass in lists containing lists containing lists, etc., and flatten them all the way down to one single result. I want [1,2,[3,4,5,6], 7,8,[9,10,[11,12,[13,[14,15],16],17],20]] to come out as [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20]. If I knew in advance how deeply the lists nested, this would suffice: `def mergeLists(seq): return reduce(lambda x,y: x+y, seq)`
Ishpeck
1) Stop wanting that. Define your structure differently. 2) Your `reduce` strategy is multiplicative in big-O order; the linear algorithms are obvious.
Mike Graham
A more general answer: http://stackoverflow.com/questions/2158395/flatten-an-irregular-list-of-lists-in-python/2158532#2158532
J.F. Sebastian
I like that answer less than this bad answer. Typechecking sucks, but if you're using type to indicate data, it should be a *very specific type*. For example, I should be free to make my *own* string-ish type that iterates over length-1 sequences of itself and doesn't subclass basestring. If I *were* going to use type to indicate this information, not only would I limit it to exactly `list`, I'd probably subclass `list` so that I could typecheck for *exactly* what I want.
Mike Graham
It's worth noting nested lists like [1,2,[3,4,5,6], 7,8,[9,10,[11,12,[13,[14,15],16],17],20]] are really trees: Node([Leaf(1),Leaf(2),Node([Leaf(3),Leaf(4)... etc.
sdcvvc
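
A rough sketch of that tree view, and of the attribute-based traversal the answer's point 1 alludes to (`Leaf`, `Node`, and the `flatten` method here are illustrative names, not code from this thread):

    class Leaf(object):
        def __init__(self, value):
            self.value = value

        def flatten(self):
            yield self.value

    class Node(object):
        def __init__(self, children):
            self.children = children

        def flatten(self):
            # Dispatch on a method instead of isinstance checks.
            for child in self.children:
                for value in child.flatten():
                    yield value

    tree = Node([Leaf(1), Leaf(2), Node([Leaf(3), Leaf(4)])])
    print(list(tree.flatten()))  # [1, 2, 3, 4]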
+1  A: 

Here's another option (though there may be something cleaner than type-checking, like testing if something is iterable and hence not an "atom"):

def flatten(lst):
    # An atom (non-list) flattens to a one-element list.
    if not isinstance(lst, list):
        return [lst]
    else:
        # Flatten each child, then concatenate the pieces.
        return reduce(lambda x, y: x + y, [flatten(x) for x in lst], [])

It's based on something scheme-like.
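
A quick usage example (this assumes Python 2, where `reduce` is a builtin):

    print(flatten([1, [2, [3, 4], 5]]))  # [1, 2, 3, 4, 5]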

mitmatt
`try: return reduce(...); except TypeError: return [lst]`
Debilski
That's perfect. Thanks.
Ishpeck
@mitmatt, 1) The function `lambda x, y: x + y` is called `operator.add`. 2) Your algorithm is so unnecessarily O(n * m)ish, when the linear algorithm is more obvious.
Mike Graham
@Debilski, Normally exception handling would be better form, but recursive flattening is tricky. Try thinking about what will happen if you have a string in there!
Mike Graham
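
Concretely, the duck-typed variant Debilski suggested would look like the sketch below, and it never terminates on strings, because iterating a one-character string yields that same string again:

    def flatten_duck(lst):
        # Duck typing: treat anything iterable as a nested sequence.
        try:
            return reduce(lambda x, y: x + y,
                          [flatten_duck(x) for x in lst], [])
        except TypeError:
            return [lst]

    # flatten_duck(['ab'])  # blows the recursion limit: iterating 'ab'
    #                       # yields 'a', and iterating 'a' yields 'a' again.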
@Mike Graham: True. That wouldn’t be nice.
Debilski
@Mike, I'm not sure what your m and n refer to, but the code I provided is linear in the number of atoms in the deep-list argument: each atom is listified exactly once, and the reduce will only call + once for each of its recursively-flattened args (meaning only one append per atom). Calling operator.add means another import statement, but it's certainly a valid alternative!
mitmatt
@mitmatt, I posted an answer that hopefully explains the quadratic nature of your algorithm.
Mike Graham
Easier yet: replace `reduce(lambda x,y:x+y,` with `sum(`.
Debilski
@Mike, your post seems to have a mistake: concatenating linked lists doesn't require copying the elements, but rather just a single pointer operation. Each of those + operations is constant time.
mitmatt
@Debilski, This is still quadratic, which is unavoidable as long as the result is built by repeated concatenation. Also, `sum` defaults to starting the sum with 0, which isn't appropriate in this case; you'd have to pass a `start` param.
Mike Graham
@mitmatt, Python lists **aren't** linked lists. They are mutable array-like objects that copy when you use `+`.
Mike Graham
@Mike Graham: The start param would be there, if following the replacement rule I gave. But I did not mean that it would generally be faster, it was just another option to calling `operator.add`.
Debilski
Though `sum` is faster than `reduce` + `lambda`.
Debilski
@Mike, that's actually implementation-dependent and not specified by the Python language, but I think you'll find that the CPython implementation provides (amortized) constant-time appends: http://wiki.python.org/moin/TimeComplexity
mitmatt
@Debilski, Using `reduce` or `sum` is slow either way, because both are O(n^2) and the right algorithm to use is O(n).
Mike Graham
My mistake: the appropriate function in that reference is probably "extend", which does indeed require more operations. From a data structures perspective, that's not entirely necessary (e.g. [1]), but it does seem to be the case with CPython. [1] http://en.wikipedia.org/wiki/Deque#Implementations
mitmatt
@mitmatt, `collections.deque` is implemented using a doubly-linked list. `list` is implemented using a resizing array, and supports the appropriate operations (like O(1) indexing). In any event, even if list *were* a doubly linked list (which it isn't!), it isn't clear whether the right thing to do with concatenation is copying or not, if it is mutable.
Mike Graham
@mitmatt, any implementation that claims to be Python but doesn't have algorithmic complexities identical to CPython's `list` should have its developers shot.
Mike Graham
+2  A: 

Under the doc for itertools, there's a flatten() function

ghostdog74
Not on that page there isn't.
Andrew Aylett
@Andrew Aylett, It is a recipe, not in the module itself. It's on that page.
Mike Graham
@Mike: Admit it, you edited the documentation after I posted my comment. Not sure how I missed that -- it didn't come up when I searched the page at work :P.
Andrew Aylett
+2  A: 

This answer explains why you do not want to use reduce for this in Python.

Consider the snippet

reduce(operator.add, [[1], [2], [3], [4], [5]])

What work does this have to do?

[1] + [2] => [1, 2]
[1, 2] + [3] => This makes a new list, copying over 1, then 2, then appending 3: [1, 2, 3]
[1, 2, 3] + [4] => This has to copy the 1, 2, and 3 and then put 4 in the new list
[1, 2, 3, 4] + [5] => The length of stuff I have to copy gets bigger each time!

This quadratic behavior is completely avoidable: the original solution (and any number of other solutions) performs none of these intermediate copying steps.
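
A sketch of the linear alternative for this one-level case (extending one result list, so each element is copied exactly once; this assumes Python 2, where `reduce` is a builtin):

    import operator

    lists = [[1], [2], [3], [4], [5]]

    # Quadratic: every `+` copies the whole accumulated list so far.
    quadratic = reduce(operator.add, lists)

    # Linear: one result list; each element is appended exactly once.
    linear = []
    for sub in lists:
        linear.extend(sub)

    assert quadratic == linear  # same result, very different cost at scale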

Mike Graham