ansaurus

Question

Split a list into parts based on a set of indexes in Python

Answer 1

+2 A:

I would be interested in seeing a more Pythonic way of doing this also. But this is a crappy solution. You would need to add checking for an empry index list.

Something along the lines of:

indexes = [5, 12, 17]
list = range(20)

output = []
prev = 0

for index in indexes:
    output.append(list[prev:index])
    prev = index

output.append(list[indexes[-1]:])

print output

produces

[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16], [17, 18, 19]]

kjfletch 2009-07-29 07:28:04

Not very happy with 2 down votes for throwing ideas in the pot. Especially with no comments. I stated the rough nature of the solution.

kjfletch 2009-07-29 08:13:24

This site shouldn't allow downvoting without an explanation. It's not useful at all.

Glenn Maynard 2009-07-29 08:20:52

This answer is perfect, and what I or anyone would write if faced with this problem in the real world. The question was asking specifically for a "Pythonic" way, which I think some of the other answers adress better than this one.

anthony 2009-07-29 08:28:47

That's why I hate the buzzword "Pythonic". It's as if everything written in Python should be written in some special Python-specific way, preferably forcibly crushed onto a single line for some reason.

Glenn Maynard 2009-07-29 09:25:06

In my opinion, "pythonic" just means good, idiomatic style. It does not mean hypercondensed one-line solutions that show off every python feature. This looks perfectly pythonic to me. It uses slices appropriately, uses range when range is more appropriate than xrange, and iterates directly over the list rather than looping through the indices. Pythonic? Check. Comprehensible? Check. Accurate? Check. +1

jcdyer 2009-07-29 20:54:13

Oh, and in python 2, you can use prev after the for loop exits, so you can replace `output.append(list[indexes[-1]:])` with `output.append(list[prev:])`.

jcdyer 2009-07-29 20:56:55

Answer 2

A:

This is all that I could think of

def partition(list_, indexes):
    if indexes[0] != 0:
     indexes = [0] + indexes
    if indexes[-1] != len(list_):
     indexes = indexes + [len(list_)]
    return [ list_[a:b] for (a,b) in zip(indexes[:-1], indexes[1:])]

Il-Bhima 2009-07-29 07:32:07

Clever solution, very concise =) I'll give my +1 to kjfletch's answer though, because it reuses existing values, while this method does a lot of list creation/modification.

Blixt 2009-07-29 07:40:17

It'd be more consistent to drop the conditionals--if the first index is 0, the first item should be empty. Just use `indexes = [0] + indexes + [None]`.

Glenn Maynard 2009-07-29 07:41:07

I like this solution. Looks clever.

kjfletch 2009-07-29 07:43:06

Also, it's probably better off with itertools.izip instead of zip, and itertools.islice instead of direct slicing.

Glenn Maynard 2009-07-29 07:44:37

@Glenn Hmm, I was actually trying to avoid having those empty lists at the start and end. Not sure if that was want the original poster wanted

Il-Bhima 2009-07-29 07:48:19

don't overwrite a built-in variable (list)

dalloliogm 2009-07-29 08:09:12

Answer 3

+5 A:

My solution is similar to Il-Bhima's.

>>> def parts(list_, indices):
...     indices = [0]+indices+[len(list_)]
...     return [list_[v:indices[k+1]] for k, v in enumerate(indices[:-1])]

Alternative approach

If you're willing to slightly change the way you input indices, from absolute indices to relative (that is, from [5, 12, 17] to [5, 7, 5], the below will also give you the desired output, while it doesn't create intermediary lists.

>>> from itertools import islice
>>> def parts(list_, indices):
...     i = iter(list_)
...     return [list(islice(i, n)) for n in chain(indices, [None])]

Cide 2009-07-29 07:35:18

`indicies` -> `indices`

Blixt 2009-07-29 07:37:37

+1 for shortness and for reusing values (`enumerate` iterates the existing array, as opposed to `zip`)

Blixt 2009-07-29 07:43:43

As Blixt pointed out I think you meant indices instead of indicies. Then you have the minor issue when passing indices like [0,5,12,17], in which case your result will contain the empty list list_[0:0]

Il-Bhima 2009-07-29 07:44:50

@Il-Bhima: Which could be argued is correct, since the first part is then of length 0, which is consistent with the OP's example.

Cide 2009-07-29 07:47:39

@Il-Bhima, I think the intended behavior of passing in "split at index 0" would be to get an empty array as the first split value. Personally, I hate "magic" behavior that changes based on the parameters.

Blixt 2009-07-29 07:48:01

@Cide, Blixt. Fair enough.

Il-Bhima 2009-07-29 07:49:49

I updated this version to not create new lists (other than the slices necessary) while improving simplicity.

Cide 2009-07-29 08:30:53

This got downvoted why? Please leave comments if you think this is wrong or otherwise inappropriate, thanks :). I tested it pretty thoroughly so I think it covers all the edge cases as well.

Cide 2009-07-29 09:12:09

This is unintuitive code, and it doesn't work (compare the output to the other answers). Also, don't replace your answer with a completely different one after people have already voted and commented on the original.

Glenn Maynard 2009-07-29 09:15:08

You might want to give people a chance to *write* a comment before you complain about not receiving one.

Glenn Maynard 2009-07-29 09:15:53

@Glenn Maynard: Thank you; you're perfectly right. I'll keep that in mind from now on.

Cide 2009-07-29 09:48:48

@Cide: Why did you decide against this if you don't mind me asking? http://stackoverflow.com/revisions/0a518729-d726-47b5-b62b-c674e37849ef/view-source

drjeep 2009-07-29 10:47:42

It's there now, but its functionality is slightly different from whta the OP asked for. It's detailed in the "Alternative approach" in the post now.

Cide 2009-07-29 20:18:12

Answer 4

A:

The plural of index is indices. Going for simplicity/readability.

indices = [5, 12, 17]
input = range(20)
output = []

for i in reversed(indices):
    output.append(input[i:])
    input[i:] = []
output.append(input)

while len(output):
    print output.pop()

anthony 2009-07-29 07:43:52

"Indexes" and "indices" are both correct. "Indexes" is a pluralization of "index" and is more common in the US, while "indices" is derived from Latin and is more common in the UK.

Blixt 2009-07-29 07:45:23

No, "indexes" is also a perfectly correct word.

Glenn Maynard 2009-07-29 07:46:53

Did anyone want to discuss the viability of working right to left? or is this GrammarOverflow now?

anthony 2009-07-29 07:52:56

Working right to left would imply you either insert at other locations than the end or reverse the list before returning it. Either case is less than ideal.

Cide 2009-07-29 07:58:20

-1 It mutates the input ... aarrgghh!!

John Machin 2009-07-29 13:16:28

Answer 5

A:

Cide's makes three copies of the array: [0]+indices copies, ([0]+indices)+[] copies again, and indices[:-1] will copy a third time. Il-Bhima makes five copies. (I'm not counting the return value, of course.)

Those could be reduced (izip, islice), but here's a zero-copy version:

def iterate_pairs(lst, indexes):
    prev = 0
    for i in indexes:
        yield prev, i
        prev = i
    yield prev, len(lst)

def partition(lst, indexes):
    for first, last in iterate_pairs(lst, indexes):
        yield lst[first:last]

indexes = [5, 12, 17]
lst = range(20)

print [l for l in partition(lst, indexes)]

Of course, array copies are fairly cheap (native code) compared to interpreted Python, but this has another advantage: it's easy to reuse, to mutate the data directly:

for first, last in iterate_pairs(lst, indexes):
    for i in range(first, last):
        lst[i] = first
print lst
# [0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 12, 12, 12, 12, 12, 17, 17, 17]

(That's why I passed indexes to iterate_pairs. If you don't care about that, you can remove that parameter and just have the final line be "yield prev, None", which is all partition() needs.)

Glenn Maynard 2009-07-29 08:20:00

Answer 6

A:

indices = [5, 12, 17]
input = range(20)
output = []

reduce(lambda x, y: output.append(input[x:y]) or y, indices + [len(input)], 0)
print output

anthony 2009-07-29 08:24:49

Answer 7

+6 A:

This is the simplest and most pythonic solution I can think of:

def partition(alist, indices):
    return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]

if the inputs are very large, then the iterators solution should be more convenient:

from itertools import izip, chain
def partition(alist, indices):
    pairs = izip(chain([0], indices), chain(indices, [None]))
    return (alist[i:j] for i, j in pairs)

and of course, the very, very lazy guy solution (if you don't mind to get arrays instead of lists, but anyway you can always revert them to lists):

import numpy
partition = numpy.split

fortran 2009-07-29 08:55:27

That -1 will cut off the last item. You can use `None` instead to be treated the same as an empty slice part (also for the start, though there it doesn't matter)

Brian 2009-07-29 09:21:43

yes, you're right... My initial version had len(alist) and it worked ok, but I replaced for -1 as it seemed less verbose :-s

fortran 2009-07-29 09:23:39

changed it to None :-)

fortran 2009-07-29 09:25:15

Answer 8

A:

Here's yet another answer.

def partition(l, indexes):
    result, indexes = [], indexes+[len(l)]
    reduce(lambda x, y: result.append(l[x:y]) or y, indexes, 0)
    return result

It supports negative indexes and such.

>>> partition([1,2,3,4,5], [1, -1])
[[1], [2, 3, 4], [5]]
>>>

Steve Losh 2009-07-29 16:31:57

Answer 9

+1 A:

>>> def burst_seq(seq, indices):
...    startpos = 0
...    for index in indices:
...       yield seq[startpos:index]
...       startpos = index
...    yield seq[startpos:]
...
>>> list(burst_seq(range(20), [5, 12, 17]))
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16], [17, 18, 19]]
>>> list(burst_seq(range(20), []))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]
>>> list(burst_seq(range(0), [5, 12, 17]))
[[], [], [], []]
>>>

Maxima mea culpa: it uses a for statement, and it's not using whizzbang stuff like itertools, zip(), None as a sentinel, list comprehensions, ...

;-)

John Machin 2009-07-30 03:32:12

ansaurus

tags:

views:

answers:

Split a list into parts based on a set of indexes in Python

Alternative approach

related questions