ansaurus

Question

Python multidimensional list: how to grab one dimension?

Answer 1

+8 A:

zip(*someList)[0]
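`zip(*someList)` passes each sublist to `zip` as a separate argument, and `zip` pairs up their corresponding elements. The result is effectively the transpose, so `[0]` picks out the first column. The indexing only works in Python 2. In Python 3, `zip` returns an iterator, so it has to be materialised with `list()` first. Here is a minimal Python 3 sketch, using an illustrative `rows` list in place of `someList`:

```python
# zip(*rows) unpacks each sublist as a separate argument to zip,
# which yields tuples of corresponding elements: a transpose.
rows = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

columns = list(zip(*rows))  # Python 3: materialise the iterator before indexing
first_column = columns[0]

print(columns)       # [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
print(first_column)  # (1, 4, 7)
```

The column comes back as a tuple, not a list.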

EDIT:

In response to recursive's comment, one might also use

from itertools import izip
izip(*someList).next()

for better performance.
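Python 3 removed `itertools.izip` because the built-in `zip` is already lazy. The `.next()` method also became the built-in `next()`. Below is a sketch of the same trick in Python 3. It uses `rows` as an illustrative stand-in for `someList`, with much smaller sublists than in the timings:

```python
# Python 3: zip is already lazy, so no itertools import is needed.
rows = [list(range(1000)) for _ in range(3)]

# next() pulls only the first tuple: one element from each row,
# without building tuples for the remaining columns.
first_column = next(zip(*rows))
print(first_column)  # (0, 0, 0)
```

If `rows` might be empty, `next(zip(*rows), None)` returns `None` instead of raising `StopIteration`.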

Some timing analysis:

python -m timeit "someList = [range(1000000), range(1000000), range(1000000)]; newlist = zip(*someList)[0]"
10 loops, best of 3: 498 msec per loop
python -m timeit "someList = [range(1000000), range(1000000), range(1000000)]; from itertools import izip; newlist = izip(*someList).next()"
10 loops, best of 3: 111 msec per loop
python -m timeit "someList = [range(1000000), range(1000000), range(1000000)]; newlist = [li[0] for li in someList]"
10 loops, best of 3: 110 msec per loop

So izip and the list comprehension play in the same league.

Of course the list comprehension is more flexible when you need an index other than 0, and is more explicit.
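The sketch below makes that flexibility concrete. It uses Python 3 syntax and two hypothetical helpers, `column` and `column_via_zip`, whose names do not come from the thread. The list comprehension indexes position `k` directly, while the lazy `zip` route has to skip the first `k` tuples with `itertools.islice`:

```python
from itertools import islice


def column(rows, k):
    """Hypothetical helper: the k-th element of every sublist, as a list."""
    return [row[k] for row in rows]


def column_via_zip(rows, k):
    """Hypothetical helper: same values via lazy zip, skipping k tuples first."""
    return next(islice(zip(*rows), k, None))


rows = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

print(column(rows, 2))          # [3, 6, 9]
print(column_via_zip(rows, 2))  # (3, 6, 9)
```

The `zip` version still builds the `k` tuples it skips, and it returns a tuple rather than a list. If a sublist is too short, the comprehension raises `IndexError`, while the `zip` version raises `StopIteration`.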

EDIT2:

Even the numpy solution is not as fast (but I might have chosen a non-representative example):

python -m timeit "import numpy as np; someList = np.array([range(1000000), range(1000000), range(1000000)]); newList = someList[:,0]"
10 loops, best of 3: 551 msec per loop

jellybean 2010-09-09 13:14:46

If someList is large, this is going to do a lot of unnecessary work, by combining all the other columns too.

recursive 2010-09-09 13:17:56
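recursive's point can be checked by counting element reads. The sketch below uses Python 3 and a hypothetical `CountingRow` list subclass, which is not part of the thread. It bumps a shared counter whenever an element is read by iteration or indexing:

```python
class CountingRow(list):
    """Hypothetical list subclass that counts element reads."""
    reads = 0  # one counter shared by all rows

    def __iter__(self):
        # Count each element handed out during iteration (used by zip).
        for item in list.__iter__(self):
            CountingRow.reads += 1
            yield item

    def __getitem__(self, index):
        # Count each direct index access (used by row[0]).
        CountingRow.reads += 1
        return list.__getitem__(self, index)


def reads_needed(extract, rows):
    """Run extract(rows) and return (result, number of element reads)."""
    CountingRow.reads = 0
    result = extract(rows)
    return result, CountingRow.reads


rows = [CountingRow(range(1000)) for _ in range(3)]

full_zip = reads_needed(lambda r: list(zip(*r))[0], rows)
lazy_zip = reads_needed(lambda r: next(zip(*r)), rows)
list_comp = reads_needed(lambda r: [row[0] for row in r], rows)

print(full_zip)   # ((0, 0, 0), 3000)
print(lazy_zip)   # ((0, 0, 0), 3)
print(list_comp)  # ([0, 0, 0], 3)
```

With three rows of 1000 elements, materialising the whole `zip` reads every element. The lazy `next(zip(...))` and the list comprehension read only one element per row.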

@recursive: Yeah, but it's neat, all the same. ;)

jellybean 2010-09-09 13:22:41

Ok, that izip thing is clever. Props.

recursive 2010-09-09 13:46:03

Answer 2

+8 A:

Perfect case for a list comprehension:

[sublist[0] for sublist in someList]

Since efficiency is a major concern, this will be much faster than the zip approach. Depending on what you're doing with the result, you may be able to get even more efficiency by using the generator expression approach:

(sublist[0] for sublist in someList)

Note, though, that this returns a generator instead of a list, so it can't be indexed into.
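Here is a short sketch of the practical difference, in Python 3 syntax with an illustrative `rows` list. The generator works fine when the result is consumed once, for example by `sum()`. However, it can't be indexed, and it is empty after one pass:

```python
rows = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

as_list = [sublist[0] for sublist in rows]  # built eagerly, indexable, reusable
as_gen = (sublist[0] for sublist in rows)   # built lazily, single pass only

print(as_list[1])   # 4
print(sum(as_gen))  # 12 (this consumes the generator)
print(sum(as_gen))  # 0  (the generator is already exhausted)

try:
    as_gen[0]
except TypeError as exc:
    print("cannot index a generator:", exc)
```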

recursive 2010-09-09 13:17:03

Answer 3

+10 A:

EDIT: Here are some actual numbers! The izip, list comprehension, and numpy ways of doing this are all about the same speed.

# zip
>>> timeit.timeit( "newlist = zip(*someList)[0]", setup = "someList = [range(1000000), range(1000000), range(1000000)]", number = 10 )
1.4984046398561759

# izip
>>> timeit.timeit( "newlist = izip(*someList).next()", setup = "someList = [range(1000000), range(1000000), range(1000000)]; from itertools import izip", number = 10 )
2.2186223645803693e-05

# list comprehension
>>> timeit.timeit( "newlist = [li[0] for li in someList]", setup = "someList = [range(1000000), range(1000000), range(1000000)]", number = 10 )
1.4677040212518477e-05

# numpy
>>> timeit.timeit( "newlist = someList[0,:]", setup = "import numpy as np; someList = np.array([range(1000000), range(1000000), range(1000000)])", number = 10 )
6.6217344397045963e-05
>>>

For large data structures like this you should use numpy, which implements an array type in C and hence is significantly more efficient. It also provides all the matrix manipulation you will ever want.

>>> import numpy as np
>>> foo = np.array([[0,1,2],[3,4,5],[6,7,8]])
>>> foo[:,0]
array([0, 3, 6])

You can also transpose...

>>> foo.transpose()
array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

...work with n-dimensional arrays...

>>> foo = np.zeros((3,3,3))
>>> foo
array([[[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]])
>>> foo[0,...]
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

...do efficient linear algebra...

>>> foo = np.ones((3,3))
>>> np.linalg.qr(foo)
(array([[-0.57735027,  0.81649658,  0.        ],
       [-0.57735027, -0.40824829, -0.70710678],
       [-0.57735027, -0.40824829,  0.70710678]]), array([[ -1.73205081e+00,  -1.73205081e+00,  -1.73205081e+00],
       [  0.00000000e+00,  -1.57009246e-16,  -1.57009246e-16],
       [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00]]))

...and basically do anything that Matlab can.

katrielalex 2010-09-09 13:29:38

WOW, thanks so much. This is SUPER FAST compared to other solutions. If I could +100000, I would.

Richard 2010-09-09 13:44:29

@Richard: That depends ... the larger the list is, the better the izip and list comprehension solutions perform against numpy, because they don't consider the vast majority of the matrix entries.

jellybean 2010-09-09 13:57:36

@jellybean: I doubt that... numpy is really quite well optimised. I think it's safe to say that a simple column-wise slice won't read the entire matrix. (Though I may of course be wrong; I don't know the C implementation details.) In fact, I think that `numpy` should perform *better* with a larger list, as the overhead of Python's native `list` type starts to add up. But of course, there's only one way to find out!

katrielalex 2010-09-09 14:01:23

@katrielalex: Basically, I think you're right. I just tried an example (see my answer) that was extremely fortunate for the izip solution.

jellybean 2010-09-09 14:03:08

@jellybean, sorry, I missed the izip solution. How would one use it to grab just the first element of each list?

Richard 2010-09-09 14:09:05

@Richard: The `next()` call already does this ... as I indicated, it's less understandable than the numpy and list comprehension solutions. For the large list you have, I'd probably go with numpy.

jellybean 2010-09-09 14:13:57

@jellybean: You're including all the setup (importing modules and defining the list) in the timer, which is giving bad results. In particular, importing `numpy` takes a second or two (it's __big__!) which you've included in the timings. See above.

katrielalex 2010-09-09 14:14:00
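katrielalex's point maps onto the `setup` argument of `timeit`. Code passed as `setup` runs once per `timeit` call and is excluded from the measurement. Here is a minimal Python 3 sketch. The list sizes are arbitrary, and it implies no particular timings, since those depend on the machine:

```python
import timeit

# Built once per timeit() call and excluded from the measured time.
setup = "rows = [list(range(100_000)) for _ in range(3)]"

candidates = {
    "full zip": "list(zip(*rows))[0]",
    "lazy zip": "next(zip(*rows))",
    "list comprehension": "[row[0] for row in rows]",
}

# Only each statement is timed, 10 runs apiece.
timings = {
    name: timeit.timeit(stmt, setup=setup, number=10)
    for name, stmt in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s for 10 runs")
```

jellybean's counterpoint charges one-off costs such as an import to the operation itself. To measure it that way, move that code from `setup` into the statement.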

@katrielalex: Sure, but isn't that fair? If it's really only about that one operation that gives you the first elements, you'll have to include the import in the timing analysis. But I don't really want to question numpy's superiority here ... the OP probably wants to do some more operations on the list, so the import pays off in the long run.

jellybean 2010-09-09 14:18:38

@jellybean: yeah, I guess. Both ways work =).

katrielalex 2010-09-09 14:19:28

Thanks to you both. Both solutions work well for what I need them for. +1 to you both and to all your comments. It did speed up my script quite a bit.

Richard 2010-09-09 14:23:15
