views:

220

answers:

7

I have a sumranges() function, which sums all the ranges of consecutive numbers found in a tuple of tuples. To illustrate:

def sumranges(nums):
    return sum([sum([1 for j in range(len(nums[i])) if
                     nums[i][j] == 0 or
                     nums[i][j - 1] + 1 != nums[i][j]]) for
                i in range(len(nums))])

>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> print sumranges(nums)
7

As you can see, it returns the number of ranges of consecutive digits within the tuple, that is: len((1, 2, 3, 4), (1), (5, 6), (19, 20), (24), (29), (400)) = 7. The tuples are always ordered.

My problem is that my sumranges() is terrible. I hate looking at it. I'm currently just iterating through the tuple and each subtuple, assigning a 1 if the number is not (1 + previous number), and summing the total. I feel like I am missing a much easier way to accomplish my stated objective. Does anyone know a more pythonic way to do this?

Edit: I have benchmarked all the answers given thus far. Thanks to all of you for your answers.

The benchmarking code is as follows, using a sample size of 100K:

from time import time
from random import randrange
nums = [sorted(list(set(randrange(1, 10) for i in range(10)))) for
        j in range(100000)]

for func in sumranges, alex, matt, redglyph, ephemient, ferdinand:
    start = time()
    result = func(nums)
    end = time()
    print ', '.join([func.__name__, str(result), str(end - start) + ' s'])

Results are as follows. Actual answer shown to verify that all functions return the correct answer:

sumranges, 250281, 0.54171204567 s
alex, 250281, 0.531121015549 s
matt, 250281, 0.843333005905 s
redglyph, 250281, 0.366822004318 s
ephemient, 250281, 0.805964946747 s
ferdinand, 250281, 0.405596971512 s

RedGlyph does edge out in terms of speed, but the simplest answer is probably Ferdinand's, and probably wins for most pythonic.

+8  A: 

Consider:

>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> flat = [[(x - i) for i, x in enumerate(tu)] for tu in nums]
>>> print flat
[[1, 1, 1, 1], [1, 4, 4], [19, 19, 22, 26, 396]]
>>> import itertools
>>> print sum(1 for tu in flat for _ in itertools.groupby(tu))
7
>>>

we "flatten" the "increasing ramps" of interest by subtracting the index from the value, turning them into consecutive "runs" of identical values; then we identify and could the "runs" with the precious itertools.groupby. This seems to be a pretty elegant (and speedy) solution to your problem.

Alex Martelli
You have just blown my mind.
Brent Newey
I wonder if `sum(len(list(itertools.groupby(tu))) for tu in flat)` would be faster or slower. than `sum(1 for ...)`.
ephemient
@ephemient, why wonder? Measure! `python -mtimeit` is your friend.
Alex Martelli
Alex, every time you say something I learn something.
Brent Newey
@Brent, tx, always happy to help!-)
Alex Martelli
A: 

This could probably be put together in a more compact form, but I think clarity would suffer:

def pairs(seq):
    for i in range(1,len(seq)):
        yield (seq[i-1], seq[i])

def isadjacent(pair):
    return pair[0]+1 == pair[1]

def sumrange(seq):
    return 1 + sum([1 for pair in pairs(seq) if not isadjacent(pair)])

def sumranges(nums):
    return sum([sumrange(seq) for seq in nums])


nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print sumranges(nums)   # prints 7
Matt Anderson
It would be nice if you benchmark the various proposed solutions and let us know how they perform. That's how people settled on using ''.join(sequence) as a Python idiom, by benchmarking all the different ways that it could be coded.
Michael Dillon
I will sum up the benchmarks in my question. Also, thanks for the answer Matt.
Brent Newey
A: 

You could probably do this better if you had an IntervalSet class because then you would scan through your ranges to build your IntervalSet, then just use the count of set members.

Some tasks don't always lend themselves to neat code, particularly if you need to write the code for performance.

Michael Dillon
+5  A: 

Just to show something closer to your original code:

def sumranges(nums):
    return sum( (1 for i in nums
                   for j, v in enumerate(i)
                   if j == 0 or v != i[j-1] + 1) )

The idea here was to:

  • avoid building intermediate lists but use a generator instead, it will save some resources
  • avoid using indices when you already have selected a subelement (i and v above).

The remaining sum() is still necessary with my example though.

RedGlyph
+13  A: 

My 2 cents:

>>> sum(len(set(x - i for i, x in enumerate(t))) for t in nums)
7

It's basically the same idea as descriped in Alex' post, but using a set instead of itertools.groupby, resulting in a shorter expression. Since sets are implemented in C and len() of a set runs in constant time, this should also be pretty fast.

Ferdinand Beyer
That's really slick.
Brent Newey
Ah, I had missed the "tuples are always ordered" bit in the question -- that bit _does_ make this answer preferable (mine would work for arbitrary tuples, not just ordered ones, but, there is a small price to pay for that generality).
Alex Martelli
A: 

There is a formula for this, the sum of the first n numbers, 1+ 2+ ... + n = n(n+1) / 2 . Then if you want to have the sum of i-j then it is (j(j+1)/2) - (i(i+1)/2) this I am sure simplifies but you can work that out. It might not be pythonic but it is what I would use.

Vincent
Can you explain this a little bit? Keep in mind I am not looking for the sum of the numbers but a count of the consecutive sequences in my tuples.
Brent Newey
Is see, I am answering the wrong question, I miss understood.Why not find the differance of all then keep only 1s and add then or Len() the list. I would show but typing on my phone sucks
Vincent
+1  A: 

Here's my attempt:

def ranges(ls):
    for l in ls:
        consec = False
        for (a,b) in zip(l, l[1:]+(None,)):
            if b == a+1:
                consec = True
            if b is not None and b != a+1:
                consec = False
            if consec:
                yield 1

'''
>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> print sum(ranges(nums))
7
'''

It looks at the numbers pairwise, checking if they are a consecutive pair (unless it's at the last element of the list). Each time there's a consecutive pair of numbers it yields 1.

lost-theory
I had to alter this a little to work with my benchmark (most notably by making a list out of the generator expression). It times in at 0.64s. I think using a generator expression is an interesting solution, however.
Brent Newey