views:

144

answers:

3

My question is: if I have a list like the following:

someList = [[0,1,2],[3,4,5],[6,7,8]]

how would I get the first entry of each sublist?

I know I could do this:

newList = []
for entry in someList:
    newList.append(entry[0])

where newList would be:

[0, 3, 6]

But is there a way to do something like:

newList = someList[:][0] 

?

EDIT:

Efficiency is of great concern. I am actually going through a list that has over 300000 entries.

+8  A: 
zip(*someList)[0]

EDIT:

In response to recursive's comment: One might also use

from itertools import izip
izip(*someList).next()

for better performance.

Some timing analysis:

python -m timeit "someList = [range(1000000), range(1000000), range(1000000)]; newlist = zip(*someList)[0]"
10 loops, best of 3: 498 msec per loop
python -m timeit "someList = [range(1000000), range(1000000), range(1000000)]; from itertools import izip; newlist = izip(*someList).next()"
10 loops, best of 3: 111 msec per loop
python -m timeit "someList = [range(1000000), range(1000000), range(1000000)]; newlist = [li[0] for li in someList]"
10 loops, best of 3: 110 msec per loop

So izip and the list comprehension play in the same league.

Of course the list comprehension is more flexible when you need an index other than 0, and is more explicit.
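For anyone reading this on Python 3 (an assumption on my part; the snippets above are Python 2), `izip` is gone and the built-in `zip` is already lazy, so the same trick becomes `next(zip(...))`. A minimal sketch, where `some_list` and the helper `column` are just illustrative names:

```python
some_list = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# Python 3: zip() is lazy, so next() builds only the first tuple.
first_column = next(zip(*some_list))  # a tuple, not a list


def column(rows, index):
    """Hypothetical helper: pick any column with a list comprehension."""
    return [row[index] for row in rows]


print(first_column)
print(column(some_list, 2))
```

The zip version is tied to index 0, while the comprehension works for any index.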

EDIT2:

Even the numpy solution is not as fast (but I might have chosen a non-representative example):

python -m timeit "import numpy as np; someList = np.array([range(1000000), range(1000000), range(1000000)]); newList = someList[:,0]"
10 loops, best of 3: 551 msec per loop
jellybean
If someList is large, this is going to do a lot of unnecessary work, by combining all the other columns too.
recursive
@recursive: Yeah, but it's neat, all the same. ;)
jellybean
Ok, that izip thing is clever. Props.
recursive
+8  A: 

Perfect case for a list comprehension:

[sublist[0] for sublist in someList]

Since efficiency is a major concern, this will be much faster than the zip approach. Depending what you're doing with the result, you may be able to get even more efficiency by using the generator expression approach:

(sublist[0] for sublist in someList)

Note that this returns a generator instead of a list though, so can't be indexed into.
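A small sketch of what that means in practice, assuming you only need to walk the values once. `some_list`, `firsts` and the other names are just for illustration; `itertools.islice` is one way to take a few items without building the whole list:

```python
from itertools import islice

some_list = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

firsts = (sublist[0] for sublist in some_list)

# A generator has no __getitem__, so indexing raises TypeError.
try:
    firsts[1]
    indexable = True
except TypeError:
    indexable = False

# Take the first two values lazily; the generator keeps its position.
first_two = list(islice(firsts, 2))
remaining = list(firsts)

# Feeding a generator straight into a consumer avoids the temporary list.
total = sum(sublist[0] for sublist in some_list)

print(indexable, first_two, remaining, total)
```

If you need random access afterwards, wrap it in `list(...)`, at which point you are back to the list comprehension.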

recursive
+10  A: 

EDIT: Here are some actual numbers! The izip, list comprehension, and numpy ways of doing this are all about the same speed.

# zip
>>> timeit.timeit( "newlist = zip(*someList)[0]", setup = "someList = [range(1000000), range(1000000), range(1000000)]", number = 10 )
1.4984046398561759

# izip
>>> timeit.timeit( "newlist = izip(*someList).next()", setup = "someList = [range(1000000), range(1000000), range(1000000)]; from itertools import izip", number = 10 )
2.2186223645803693e-05

# list comprehension
>>> timeit.timeit( "newlist = [li[0] for li in someList]", setup = "someList = [range(1000000), range(1000000), range(1000000)]", number = 10 )
1.4677040212518477e-05

# numpy (note: someList[0,:] is the first row; the first entry of each sublist would be someList[:,0])
>>> timeit.timeit( "newlist = someList[0,:]", setup = "import numpy as np; someList = np.array([range(1000000), range(1000000), range(1000000)])", number = 10 )
6.6217344397045963e-05
>>>
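
The calls above keep list construction and imports in `setup` so that only the extraction itself is timed. Here is a rough sketch of the same pattern for Python 3 (an assumption; the transcript above is Python 2), where `zip` is already lazy. The names `candidates` and `timings` are mine, and the list shape (30000 sublists of 3 items) is only a guess at what the OP's data looks like. I have left out any numbers since they depend entirely on the machine.

```python
import timeit

# Built once per measurement, outside the timed statement.
setup = "some_list = [[i, i + 1, i + 2] for i in range(30000)]"

candidates = {
    "list comprehension": "[row[0] for row in some_list]",
    "zip + next": "next(zip(*some_list))",
    "zip + full list": "list(zip(*some_list))[0]",
}

# Take the best of 3 repeats, each running the statement 10 times.
timings = {
    label: min(timeit.repeat(stmt, setup=setup, number=10, repeat=3))
    for label, stmt in candidates.items()
}

for label, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print("{:20s} {:.6f} s per 10 runs".format(label, seconds))
```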

For large data structures like this you should use numpy, which implements an array type in C and hence is significantly more efficient. It also provides all the matrix manipulation you will ever want.

>>> import numpy as np
>>> foo = np.array([[0,1,2],[3,4,5],[6,7,8]])
>>> foo[:,0]
array([0, 3, 6])
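
A minimal sketch of why the slice itself is cheap, assuming the data has already been converted to an array (the conversion from a list of lists copies every element, so it is a one-off cost). Basic slicing returns a view onto the same memory rather than a copy. The names `some_list`, `arr` and `first_col` are just for illustration:

```python
import numpy as np

some_list = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# One-off conversion: this copies every element into a C array.
arr = np.array(some_list)

# Basic slicing returns a view; no element data is copied here.
first_col = arr[:, 0]

print(first_col.base is arr)             # the view refers back to arr
print(np.shares_memory(first_col, arr))  # same underlying buffer

# Convert back to a plain Python list only if you really need one.
first_col_list = first_col.tolist()
print(first_col_list)
```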

You can also transpose...

>>> foo.transpose()
array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

...work with n-dimensional arrays...

>>> foo = np.zeros((3,3,3))
>>> foo
array([[[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]])
>>> foo[0,...]
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

...do efficient linear algebra...

>>> foo = np.ones((3,3))
>>> np.linalg.qr(foo)
(array([[-0.57735027,  0.81649658,  0.        ],
       [-0.57735027, -0.40824829, -0.70710678],
       [-0.57735027, -0.40824829,  0.70710678]]), array([[ -1.73205081e+00,  -1.73205081e+00,  -1.73205081e+00],
       [  0.00000000e+00,  -1.57009246e-16,  -1.57009246e-16],
       [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00]]))

...and basically do anything that Matlab can.

katrielalex
WOW, thanks so much. This is SUPER FAST compared to other solutions. If I could +100000 I would
Richard
@Richard: That depends ... the larger the list is, the better the izip and list comprehension solutions perform against numpy, because they don't consider the vast majority of the matrix entries.
jellybean
@jellybean: I doubt that... numpy is really quite well optimised. I think it's safe to say that a simple column-wise slice won't read the entire matrix. (Though I may of course be wrong, I don't know the C implementation details.) In fact, I think that `numpy` should perform *better* with a larger list, as the overhead of Python's native `list` type starts to add up. But of course, there's only one way to find out!
katrielalex
@katrielalex: Basically, I think you're right. I just tried an example (see my answer) that was extremely fortunate for the izip solution.
jellybean
@jellybean, sorry, I missed the izip solution. How would one use that to just grab the first element of each list?
Richard
@Richard: The `next()` call already does this ... as I indicated, it's less understandable than the numpy and list comprehension solutions. For the large list you have, I'd probably go with numpy.
jellybean
@jellybean: You're including all the setup (importing modules and defining the list) in the timer, which is giving bad results. In particular, importing `numpy` takes a second or two (it's __big__!) which you've included in the timings. See above.
katrielalex
@katrielalex: Sure, but isn't that fair? If it's really only about that one operation that gives you the first indexes, you'll have to include the import in the timing analysis. But I don't really want to question numpy's superiority here ... the OP probably wants to do some more operations on the list, so the imports pays off in the long run.
jellybean
@jellybean: yeah, I guess. Both ways work =).
katrielalex
Thanks to you both. Both solutions work well for what I need them for. +1 to you both and all your comments. It did speed up my script quite a bit.
Richard