views: 7717
answers: 9

Possible Duplicates:
Flattening a shallow list in Python
Comprehension for flattening a sequence of sequences?

I wonder whether there is a shortcut to make a simple list out of a list of lists in Python.

I can do that in a for loop, but maybe there is some cool "one-liner"? I tried it with reduce, but I get an error.

Code

l = [[1,2,3],[4,5,6], [7], [8,9]]
reduce(lambda x,y: x.extend(y),l)

Error message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'extend'

+3  A: 
>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(lambda x,y: x+y,l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

The extend() method in your example modifies x in place and returns None, whereas reduce() expects each call to return the accumulated value.
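
To make that concrete, a quick interpreter check (purely illustrative) shows extend returning None, which then becomes x on the next reduce step:

>>> x = [1, 2, 3]
>>> print x.extend([4, 5, 6])   # extend mutates x and returns None
None
>>> x
[1, 2, 3, 4, 5, 6]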

Greg Hewgill
Posted the same, but you were faster! :P
Andrea Ambu
+13  A: 
>>> sum(l, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Note that this only works on lists of lists. For lists of lists of lists, you'll need another solution.
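
As a rough sketch of what is going on (and of where it stops working), sum(l, []) is essentially repeated + starting from [], so deeper nesting is left untouched:

>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> [] + l[0] + l[1] + l[2] + l[3]    # roughly what sum(l, []) does
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> sum([[1, [2]], [[3], 4]], [])     # one level deeper: inner lists stay nested
[1, [2], [3], 4]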

Triptych
What a cleverly implicit use of the overloaded (+) operator!
Nick Retallack
Thanks! This solution looks cool, and it's even shorter than with *reduce*.
Emma
See Nadia's reply below, too. This appears to be the fastest solution.
Emma
Errm, or maybe not...
Emma
that's pretty neat and clever but I wouldn't use it because it's confusing to read.
superjoe30
+37  A: 

[item for sublist in l for item in sublist] is faster than the shortcuts posted so far.

For evidence, as always, you can use the timeit module in the standard library:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So (for simplicity and without actual loss of generality) say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; the total number of copies is I times the sum of x for x from 1 to L-1, i.e., roughly I * (L**2) / 2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
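
Put differently, the comprehension is roughly equivalent to this explicit loop (an illustrative expansion), which touches each item exactly once:

result = []
for sublist in l:            # outer loop of the comprehension
    for item in sublist:     # inner loop
        result.append(item)  # each item is appended exactly once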

Alex Martelli
Evidence? Explanation?
Triptych
Not sure if it's the fastest, but it's the most readable solution to me, since it doesn't rely on anything but list comprehensions.
Kai
Editing the answer to show the evidence in a nicely formatted way and add the explanation.
Alex Martelli
+1 this is the winner :)
Nadia Alramli
I tried a test with the same data, using `itertools.chain.from_iterable`: `$ python -mtimeit -s'from itertools import chain; l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'list(chain.from_iterable(l))'`. It runs a bit more than twice as fast as the nested list comprehension, which is the fastest of the alternatives shown here.
intuited
+1  A: 

Why do you use extend?

reduce(lambda x, y: x+y, l)

This should work fine.
edit: Greg was faster :P

Andrea Ambu
+3  A: 

There's an in-depth discussion of this here: http://rightfootin.blogspot.com/2006/09/more-on-python-flatten.html, covering several methods of flattening arbitrarily nested lists of lists.

An interesting read!
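
For fully arbitrary nesting, one common recursive approach looks roughly like this (a sketch along the lines of the methods discussed there, not copied verbatim from the post):

def flatten(items):
    # Recursively flatten an arbitrarily nested list of lists (illustrative sketch).
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))   # recurse into nested lists
        else:
            flat.append(item)
    return flat

# flatten([[1, [2, [3]]], [4], 5]) -> [1, 2, 3, 4, 5]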

RichieHindle
+1  A: 

Using timeit with number=1000000:

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
2.4598159790039062
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.5289170742034912
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.0598428249359131

And for longer lists:

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
20.126545906066895
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
22.242258071899414
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
16.449732065200806

I take my statement back: sum is not the winner. Although it is faster when the list is small, its performance degrades significantly with larger lists.

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10000'
    ).timeit(100)
2.0440959930419922

The sum version has been running for more than a minute and still hasn't finished!

Nadia Alramli
For a truly minuscule list, e.g. one with 3 sublists, maybe -- but since sum's performance goes with O(N**2) while the list comprehension's goes with O(N), just growing the input list a little will reverse things -- indeed the LC will be "infinitely faster" than sum at the limit as N grows. I was responsible for designing sum and doing its first implementation in the Python runtime, and I still wish I had found a way to effectively restrict it to summing numbers (what it's really good at) and block the "attractive nuisance" it offers to people who want to "sum" lists;-).
Alex Martelli
Alex, I ran the test again with much larger lists and you are right. Thanks for the correction!
Nadia Alramli
+1  A: 

The reason your version didn't work: extend modifies the list in place and returns None rather than the list itself. You can still return x from the lambda, using a small trick:

reduce(lambda x,y: x.extend(y) or x, l)

Note: extend is more efficient than + on lists.
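
For example, applied to the original list (note that this mutates the first sublist, l[0], in place):

>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(lambda x,y: x.extend(y) or x, l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]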

Igor Krivokon
Thanks a lot! I just couldn't get why my example didn't work...
Emma
+3  A: 

@Nadia: You have to use much longer lists; then you see the difference quite strikingly! My results for len(l) = 1600:

A took 14.323 ms
B took 13.437 ms
C took 1.135 ms

where:

A = reduce(lambda x,y: x+y,l)
B = sum(l, [])
C = [item for sublist in l for item in sublist]
jacob
+6  A: 

How about using itertools?

>>> import itertools
>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> merged = list(itertools.chain(*list2d))
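
As intuited notes in a comment above, itertools.chain.from_iterable does the same thing without unpacking the outer list first; a small variant sketch:

>>> merged = list(itertools.chain.from_iterable(list2d))
>>> merged
[1, 2, 3, 4, 5, 6, 7, 8, 9]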
lsc