views: 7717
answers: 9

Possible Duplicates:
Flattening a shallow list in Python
Comprehension for flattening a sequence of sequences?

I wonder whether there is a shortcut to make a simple list out of a list of lists in Python.

I can do that in a for loop, but maybe there is some cool "one-liner"? I tried it with reduce, but I get an error.

Code

l = [[1,2,3],[4,5,6], [7], [8,9]]
reduce(lambda x,y: x.extend(y),l)

Error message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'extend'

+3  A: 
>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(lambda x,y: x+y,l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

The extend() method in your example modifies x in place and returns None, whereas reduce() expects each call to return the accumulated value.
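
To make that concrete, a quick interpreter check (purely illustrative) shows extend returning None, which then becomes x on the next reduce step:

>>> x = [1, 2, 3]
>>> print x.extend([4, 5, 6])   # extend mutates x and returns None
None
>>> x
[1, 2, 3, 4, 5, 6]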

Greg Hewgill
Posted the same, but you were faster! :P
Andrea Ambu
+13  A: 
>>> sum(l, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Note that this only works on lists of lists. For lists of lists of lists, you'll need another solution.
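
As a rough sketch of what is going on (and of where it stops working), sum(l, []) is essentially repeated + starting from [], so deeper nesting is left untouched:

>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> [] + l[0] + l[1] + l[2] + l[3]    # roughly what sum(l, []) does
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> sum([[1, [2]], [[3], 4]], [])     # one level deeper: inner lists stay nested
[1, [2], [3], 4]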

Triptych
What a cleverly implicit use of the overloaded (+) operator!
Nick Retallack
Thanks! This solution looks cool, and it's even shorter than with *reduce*.
Emma
See Nadia's reply below, too. This appears to be the fastest solution.
Emma
Errm, or maybe not...
Emma
that's pretty neat and clever but I wouldn't use it because it's confusing to read.
superjoe30
+37  A: 

[item for sublist in l for item in sublist] is faster than the shortcuts posted so far.

For evidence, as always, you can use the timeit module in the standard library:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So (for simplicity and without actual loss of generality) say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; the total number of copies is I times the sum of x for x from 1 to L-1, i.e., roughly I * (L**2) / 2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
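
Put differently, the comprehension is roughly equivalent to this explicit loop (an illustrative expansion), which touches each item exactly once:

result = []
for sublist in l:            # outer loop of the comprehension
    for item in sublist:     # inner loop
        result.append(item)  # each item is appended exactly once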

Alex Martelli
Evidence? Explanation?
Triptych
Not sure if it's the fastest, but it's the most readable solution to me, since it doesn't rely on anything but list comprehensions.
Kai
Editing the answer to show the evidence in a nicely formatted way and add the explanation.
Alex Martelli
+1 this is the winner :)
Nadia Alramli
I tried a test with the same data, using `itertools.chain.from_iterable`: `$ python -mtimeit -s'from itertools import chain; l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'list(chain.from_iterable(l))'`. It runs a bit more than twice as fast as the nested list comprehension, which is the fastest of the alternatives shown here.
intuited
+1  A: 

Why do you use extend?

reduce(lambda x, y: x+y, l)

This should work fine.
edit: Greg was faster :P

Andrea Ambu
+3  A: 

There's an in-depth discussion of this here: http://rightfootin.blogspot.com/2006/09/more-on-python-flatten.html, covering several methods of flattening arbitrarily nested lists of lists.

An interesting read!
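
For fully arbitrary nesting, one common recursive approach looks roughly like this (a sketch along the lines of the methods discussed there, not copied verbatim from the post):

def flatten(items):
    # Recursively flatten an arbitrarily nested list of lists (illustrative sketch).
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))   # recurse into nested lists
        else:
            flat.append(item)
    return flat

# flatten([[1, [2, [3]]], [4], 5]) -> [1, 2, 3, 4, 5]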

RichieHindle
+1  A: 

Using timeit with number=1000000:

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
2.4598159790039062
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.5289170742034912
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.0598428249359131

And for longer lists:

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
20.126545906066895
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
22.242258071899414
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
16.449732065200806

I take my statement back: sum is not the winner. Although it is faster when the list is small, its performance degrades significantly with larger lists.

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10000'
    ).timeit(100)
2.0440959930419922

The sum version has been running for more than a minute and still hasn't finished!

Nadia Alramli
For a truly minuscule list, e.g. one with 3 sublists, maybe -- but since sum's performance goes with O(N**2) while the list comprehension's goes with O(N), just growing the input list a little will reverse things -- indeed the LC will be "infinitely faster" than sum at the limit as N grows. I was responsible for designing sum and doing its first implementation in the Python runtime, and I still wish I had found a way to effectively restrict it to summing numbers (what it's really good at) and block the "attractive nuisance" it offers to people who want to "sum" lists;-).
Alex Martelli
Alex, I ran the test again with much larger lists and you are right. Thanks for the correction!
Nadia Alramli
+1  A: 

The reason your version didn't work: extend modifies the list in place and returns None rather than the list itself. You can still return x from the lambda, using a small trick:

reduce(lambda x,y: x.extend(y) or x, l)

Note: extend is more efficient than + on lists.
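
For example, applied to the original list (note that this mutates the first sublist, l[0], in place):

>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(lambda x,y: x.extend(y) or x, l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]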

Igor Krivokon
Thanks a lot! I just couldn't get why my example didn't work...
Emma
+3  A: 

@Nadia: You have to use much longer lists; then you see the difference quite strikingly! My results for len(l) = 1600:

A took 14.323 ms
B took 13.437 ms
C took 1.135 ms

where:

A = reduce(lambda x,y: x+y,l)
B = sum(l, [])
C = [item for sublist in l for item in sublist]
jacob
+6  A: 

How about using itertools?

>>> import itertools
>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> merged = list(itertools.chain(*list2d))
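
As intuited notes in a comment above, itertools.chain.from_iterable does the same thing without unpacking the outer list first; a small variant sketch:

>>> merged = list(itertools.chain.from_iterable(list2d))
>>> merged
[1, 2, 3, 4, 5, 6, 7, 8, 9]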
lsc