views:

242

answers:

6

I'd like to identify groups of continuous numbers in a list, so that:

myfunc([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])

returns:

[(2,5), (12,17), 20]

And was wondering what the best way to do this was (particularly if there's something inbuilt into Python).

Edit: Note I originally forgot to mention that individual numbers should be returned as individual numbers, not ranges.

+2  A: 

This doesn't use a standard function - it just iiterates over the input, but it should work:

def myfunc(l):
    r = []
    p = q = None
    for x in l + [-1]:
        if x - 1 == q:
            q += 1
        else:
            if p:
               if q > p:
                   r.append('%s-%s' % (p, q))
               else:
                   r.append(str(p))
            p = q = x
    return '(%s)' % ', '.join(r)

Note that it requires that the input contains only positive numbers in ascending order. You should validate the input, but this code is omitted for clarity.

Mark Byers
+3  A: 

Assuming your list is sorted:

>>> from itertools import groupby
>>> def ranges(lst):
    pos = (j - i for i, j in enumerate(lst))
    t = 0
    for i, els in groupby(pos):
        l = len(list(els))
        el = lst[t]
        t += l
        yield range(el, el+l)


>>> lst = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
>>> list(ranges(lst))
[range(2, 6), range(12, 18)]
SilentGhost
@SilentGhost: really beautiful!
telliott99
`[j - i for i, j in enumerate(lst)]` is clever :-)
THC4k
+13  A: 

EDIT 2: To answer the OP new requirement

ranges = []
for key, group in groupby(enumerate(data), lambda (index, item): index - item):
    group = map(itemgetter(1), group)
    if len(group) > 1:
        ranges.append(xrange(group[0], group[-1]))
    else:
        ranges.append(group[0])

Output:

[xrange(2, 5), xrange(12, 17), 20]

You can replace xrange with range or any other custom class.


Python docs have a very neat recipe for this:

from operator import itemgetter
from itertools import groupby
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
    print map(itemgetter(1), g)

Output:

[2, 3, 4, 5]
[12, 13, 14, 15, 16, 17]

If you want to get the exact same output, you can do this:

ranges = []
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
    group = map(itemgetter(1), g)
    ranges.append((group[0], group[-1]))

output:

[(2, 5), (12, 17)]

EDIT: The example is already explained in the documentation but maybe I should explain it more:

The key to the solution is differencing with a range so that consecutive numbers all appear in same group.

If the data was: [2, 3, 4, 5, 12, 13, 14, 15, 16, 17] Then groupby(enumerate(data), lambda (i,x):i-x) is equivalent of the following:

groupby(
    [(0, 2), (1, 3), (2, 4), (3, 5), (4, 12),
    (5, 13), (6, 14), (7, 15), (8, 16), (9, 17)],
    lambda (i,x):i-x
)

The lambda function subtracts the element index from the element value. So when you apply the lambda on each item. You'll get the following keys for groupby:

[-2, -2, -2, -2, -8, -8, -8, -8, -8, -8]

groupby groups elements by equal key value, so the first 4 elements will be grouped together and so forth.

I hope this makes it more readable.

Nadia Alramli
+1 for link to docs.
Mark Byers
almost works in py3k, except it requires `lambda x:x[0]-x[1]`.
SilentGhost
+1 Really very clever. But I guess I'd never understand that if I didn't already know what it's supposed to do. :)
jellybean
Could you use please use multi-character variable names? For someone not familiar with map() or groupby(), the meanings of k g, i and x are not clear.
nailer
This was copied from the Python documentations with the same variable names. I changed the names now.
Nadia Alramli
Thanks for the improved variable names and handling non-ranged numbers. This is readable, your explanations are great and I've marked this as the preferred answer.
nailer
I'm glad I was able to help :)
Nadia Alramli
+3  A: 

The "naive" solution which I find somewhat readable atleast.

x = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 22, 25, 26, 28, 51, 52, 57]

def group(L):
    first = last = L[0]
    for n in L[1:]:
        if n - 1 == last: # Part of the group, bump the end
            last = n
        else: # Not part of the group, yield current group and start a new
            yield first, last
            first = last = n
    yield first, last # Yield the last group


>>>print list(group(x))
[(2, 5), (12, 17), (22, 22), (25, 26), (28, 28), (51, 52), (57, 57)]
truppo
I like this answer a lot because it's terse yet readable. However numbers that are outside of ranges should be printed as single digits, not tuples (as I will format the output and have different formatting requirements for individual numbers versus ranges of numbers.
nailer
+1  A: 

Here's the answer I came up with. I'm writing the code for other people to understand, so I'm fairly verbose with variable names and comments.

First a quick helper function:

def getpreviousitem(mylist,myitem):
    '''Given a list and an item, return previous item in list'''
    for position, item in enumerate(mylist):
        if item == myitem:
            # First item has no previous item
            if position == 0:
                return None
            # Return previous item    
            return mylist[position-1] 

And then the actual code:

def getranges(cpulist):
    '''Given a sorted list of numbers, return a list of ranges'''
    rangelist = []
    inrange = False
    for item in cpulist:
        previousitem = getpreviousitem(cpulist,item)
        if previousitem == item - 1:
            # We're in a range
            if inrange == True:
                # It's an existing range - change the end to the current item
                newrange[1] = item
            else:    
                # We've found a new range.
                newrange = [item-1,item]
            # Update to show we are now in a range    
            inrange = True    
        else:   
            # We were in a range but now it just ended
            if inrange == True:
                # Save the old range
                rangelist.append(newrange)
            # Update to show we're no longer in a range    
            inrange = False 
    # Add the final range found to our list
    if inrange == True:
        rangelist.append(newrange)
    return rangelist

Example run:

getranges([2, 3, 4, 5, 12, 13, 14, 15, 16, 17])

returns:

[[2, 5], [12, 17]]
nailer
`>>> getranges([2, 12, 13])` Outputs: `[[12, 13]]`. Was that intentional?
SilentGhost
so, it wasn't. then your code doesn't work.
SilentGhost
Yep, I need to fix for individual numbers (per most of the answers on the page). Working on it now.
nailer
Actually I prefer Nadia's answer, groupby() seems like the standard function I wanted.
nailer
+1  A: 

Here it is something that should work, without any import needed:

def myfunc(lst):
    ret = []
    a = b = lst[0]                           # a and b are range's bounds

    for el in lst[1:]:
        if el == b+1: b = el                 # range grows
        else:                                # range ended
            ret.append(a if a==b else (a,b)) # is a single or a range?
            a = b = el                       # let's start again with a single
    ret.append(a if a==b else (a,b))         # corner case for last single/range
    return ret
Andrea Ambu