tags:

views:

645

answers:

9

What is the best way to split a list into parts based on an arbitrary number of indexes? E.g. given the code below

indexes = [5, 12, 17]
list = range(20)

return something like this

part1 = list[:5]
part2 = list[5:12]
part3 = list[12:17]
part4 = list[17:]

If there are no indexes it should return the entire list.

+2  A: 

I would be interested in seeing a more Pythonic way of doing this also. But this is a crappy solution. You would need to add checking for an empry index list.

Something along the lines of:

indexes = [5, 12, 17]
list = range(20)

output = []
prev = 0

for index in indexes:
    output.append(list[prev:index])
    prev = index

output.append(list[indexes[-1]:])

print output

produces

[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16], [17, 18, 19]]
kjfletch
Not very happy with 2 down votes for throwing ideas in the pot. Especially with no comments. I stated the rough nature of the solution.
kjfletch
This site shouldn't allow downvoting without an explanation. It's not useful at all.
Glenn Maynard
This answer is perfect, and what I or anyone would write if faced with this problem in the real world. The question was asking specifically for a "Pythonic" way, which I think some of the other answers adress better than this one.
anthony
That's why I hate the buzzword "Pythonic". It's as if everything written in Python should be written in some special Python-specific way, preferably forcibly crushed onto a single line for some reason.
Glenn Maynard
In my opinion, "pythonic" just means good, idiomatic style. It does not mean hypercondensed one-line solutions that show off every python feature. This looks perfectly pythonic to me. It uses slices appropriately, uses range when range is more appropriate than xrange, and iterates directly over the list rather than looping through the indices. Pythonic? Check. Comprehensible? Check. Accurate? Check. +1
jcdyer
Oh, and in python 2, you can use prev after the for loop exits, so you can replace `output.append(list[indexes[-1]:])` with `output.append(list[prev:])`.
jcdyer
A: 

This is all that I could think of

def partition(list_, indexes):
    if indexes[0] != 0:
     indexes = [0] + indexes
    if indexes[-1] != len(list_):
     indexes = indexes + [len(list_)]
    return [ list_[a:b] for (a,b) in zip(indexes[:-1], indexes[1:])]
Il-Bhima
Clever solution, very concise =) I'll give my +1 to kjfletch's answer though, because it reuses existing values, while this method does a lot of list creation/modification.
Blixt
It'd be more consistent to drop the conditionals--if the first index is 0, the first item should be empty. Just use `indexes = [0] + indexes + [None]`.
Glenn Maynard
I like this solution. Looks clever.
kjfletch
Also, it's probably better off with itertools.izip instead of zip, and itertools.islice instead of direct slicing.
Glenn Maynard
@Glenn Hmm, I was actually trying to avoid having those empty lists at the start and end. Not sure if that was want the original poster wanted
Il-Bhima
don't overwrite a built-in variable (list)
dalloliogm
+5  A: 

My solution is similar to Il-Bhima's.

>>> def parts(list_, indices):
...     indices = [0]+indices+[len(list_)]
...     return [list_[v:indices[k+1]] for k, v in enumerate(indices[:-1])]

Alternative approach

If you're willing to slightly change the way you input indices, from absolute indices to relative (that is, from [5, 12, 17] to [5, 7, 5], the below will also give you the desired output, while it doesn't create intermediary lists.

>>> from itertools import islice
>>> def parts(list_, indices):
...     i = iter(list_)
...     return [list(islice(i, n)) for n in chain(indices, [None])]
Cide
`indicies` -> `indices`
Blixt
+1 for shortness and for reusing values (`enumerate` iterates the existing array, as opposed to `zip`)
Blixt
As Blixt pointed out I think you meant indices instead of indicies. Then you have the minor issue when passing indices like [0,5,12,17], in which case your result will contain the empty list list_[0:0]
Il-Bhima
@Il-Bhima: Which could be argued is correct, since the first part is then of length 0, which is consistent with the OP's example.
Cide
@Il-Bhima, I think the intended behavior of passing in "split at index 0" would be to get an empty array as the first split value. Personally, I hate "magic" behavior that changes based on the parameters.
Blixt
@Cide, Blixt. Fair enough.
Il-Bhima
I updated this version to not create new lists (other than the slices necessary) while improving simplicity.
Cide
This got downvoted why? Please leave comments if you think this is wrong or otherwise inappropriate, thanks :). I tested it pretty thoroughly so I think it covers all the edge cases as well.
Cide
This is unintuitive code, and it doesn't work (compare the output to the other answers). Also, don't replace your answer with a completely different one after people have already voted and commented on the original.
Glenn Maynard
You might want to give people a chance to *write* a comment before you complain about not receiving one.
Glenn Maynard
@Glenn Maynard: Thank you; you're perfectly right. I'll keep that in mind from now on.
Cide
@Cide: Why did you decide against this if you don't mind me asking? http://stackoverflow.com/revisions/0a518729-d726-47b5-b62b-c674e37849ef/view-source
drjeep
It's there now, but its functionality is slightly different from whta the OP asked for. It's detailed in the "Alternative approach" in the post now.
Cide
A: 

The plural of index is indices. Going for simplicity/readability.

indices = [5, 12, 17]
input = range(20)
output = []

for i in reversed(indices):
    output.append(input[i:])
    input[i:] = []
output.append(input)

while len(output):
    print output.pop()
anthony
"Indexes" and "indices" are both correct. "Indexes" is a pluralization of "index" and is more common in the US, while "indices" is derived from Latin and is more common in the UK.
Blixt
No, "indexes" is also a perfectly correct word.
Glenn Maynard
Did anyone want to discuss the viability of working right to left? or is this GrammarOverflow now?
anthony
Working right to left would imply you either insert at other locations than the end or reverse the list before returning it. Either case is less than ideal.
Cide
-1 It mutates the input ... aarrgghh!!
John Machin
A: 

Cide's makes three copies of the array: [0]+indices copies, ([0]+indices)+[] copies again, and indices[:-1] will copy a third time. Il-Bhima makes five copies. (I'm not counting the return value, of course.)

Those could be reduced (izip, islice), but here's a zero-copy version:

def iterate_pairs(lst, indexes):
    prev = 0
    for i in indexes:
        yield prev, i
        prev = i
    yield prev, len(lst)

def partition(lst, indexes):
    for first, last in iterate_pairs(lst, indexes):
        yield lst[first:last]

indexes = [5, 12, 17]
lst = range(20)

print [l for l in partition(lst, indexes)]

Of course, array copies are fairly cheap (native code) compared to interpreted Python, but this has another advantage: it's easy to reuse, to mutate the data directly:

for first, last in iterate_pairs(lst, indexes):
    for i in range(first, last):
        lst[i] = first
print lst
# [0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 12, 12, 12, 12, 12, 17, 17, 17]

(That's why I passed indexes to iterate_pairs. If you don't care about that, you can remove that parameter and just have the final line be "yield prev, None", which is all partition() needs.)

Glenn Maynard
A: 
indices = [5, 12, 17]
input = range(20)
output = []

reduce(lambda x, y: output.append(input[x:y]) or y, indices + [len(input)], 0)
print output
anthony
+6  A: 

This is the simplest and most pythonic solution I can think of:

def partition(alist, indices):
    return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]

if the inputs are very large, then the iterators solution should be more convenient:

from itertools import izip, chain
def partition(alist, indices):
    pairs = izip(chain([0], indices), chain(indices, [None]))
    return (alist[i:j] for i, j in pairs)

and of course, the very, very lazy guy solution (if you don't mind to get arrays instead of lists, but anyway you can always revert them to lists):

import numpy
partition = numpy.split
fortran
That -1 will cut off the last item. You can use `None` instead to be treated the same as an empty slice part (also for the start, though there it doesn't matter)
Brian
yes, you're right... My initial version had len(alist) and it worked ok, but I replaced for -1 as it seemed less verbose :-s
fortran
changed it to None :-)
fortran
A: 

Here's yet another answer.

def partition(l, indexes):
    result, indexes = [], indexes+[len(l)]
    reduce(lambda x, y: result.append(l[x:y]) or y, indexes, 0)
    return result

It supports negative indexes and such.

>>> partition([1,2,3,4,5], [1, -1])
[[1], [2, 3, 4], [5]]
>>>
Steve Losh
+1  A: 
>>> def burst_seq(seq, indices):
...    startpos = 0
...    for index in indices:
...       yield seq[startpos:index]
...       startpos = index
...    yield seq[startpos:]
...
>>> list(burst_seq(range(20), [5, 12, 17]))
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16], [17, 18, 19]]
>>> list(burst_seq(range(20), []))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]
>>> list(burst_seq(range(0), [5, 12, 17]))
[[], [], [], []]
>>>

Maxima mea culpa: it uses a for statement, and it's not using whizzbang stuff like itertools, zip(), None as a sentinel, list comprehensions, ...

;-)

John Machin