ansaurus

Question

Is there a more succinct / pythonic way to do this? (counting longest seq of heads, tails in coin flips)

Answer 1

+1 A:

It is probably an axiom that any code can be made more succinct. Yours looks perfectly pythonic, though.

Actually, on reflection perhaps there is no succinctness axiom like that. If succinct means "marked by compact precise expression without wasted words," and if by "words" we mean words of code and not of memory, then a single word program cannot be made more succinct (unless, perhaps, it is the "exit" program).

If pythonic means "of extraordinary size and power", then it seems antagonistic to succinctness unless we restrict our definition to power only. I'm not convinced your program resembles a prophetic oracle at all, although you might implement it as an ascii portrait of a particular prophetic oracle. It doesn't look like a snake, so there's room for improvement there too.

import random

def toss(n):
    '''
     ___     ____________
<<<((__O\   (__<>___<>__ \   ____
       \ \_(__<>___<>__)\O\_/O___>-<  hiss
        \O__<>___<>___<>)\___/

    '''
    count = [0,0]
    longest = [0,0]
    for i in xrange(n):
        coinface = random.randrange(2)
        count[coinface] += 1
        count[not coinface] = 0

        if count[coinface] > longest[coinface]:
            longest[coinface] = count[coinface]
        #print coinface, count, longest

    print "longest sequence heads %d, tails %d" %tuple(longest)

if __name__ == '__main__':
    toss(200)

Nifty, huh?

Ewan Todd 2009-11-06 17:32:48

okay, thank you

Steve D 2009-11-06 17:39:50

pythonic means: http://docs.python.org/glossary#term-pythonicas well as `import this`.

SilentGhost 2009-11-06 18:07:42

Hissterical I thought

Steve D 2009-11-06 18:11:30

Answer 2

+2 A:

>>> def toss(count):
        result = []
        for i in range(count):
            result.append("HT"[random.randrange(0, 2)])
        return ''.join(result)

>>> s = toss(200)
>>> h_max = max(len(x) for x in s.split("T"))
>>> t_max = max(len(x) for x in s.split("H"))
>>> print h_max, t_max
4 6

Robert Rossney 2009-11-06 17:34:25

interesting, I like it. thanks.

Steve D 2009-11-06 17:41:17

Seriously, why is this getting voted up? I wish I could vote it down twice.

Jed Smith 2009-11-06 17:49:29

"Simple is better than complex."

Robert Rossney 2009-11-06 17:56:36

Answer 3

+2 A:

This isn't really pythonic so much as tortured, but here's a short version (with meaningless 1-character variable names, no less!)

import random
x = ''.join([chr(random.randrange(2)) for i in range(200)])
print max([len(s) for s in x.split(chr(0)) + x.split(chr(1))])

Grumdrig 2009-11-06 17:38:39

ok, I think that's giving me the max of _either_ heads or tails' seq count right? thanks.

Steve D 2009-11-06 17:46:06

Yep. Sorry. Slightly misread the question, but easily corrected.

Grumdrig 2009-11-07 21:28:34

Answer 4

+7 A:

You can use itertools, which is a much more Pythonic way to do this:

def toss(n):
    rolls = [random.randrange(2) for i in xrange(n)]
    maximums = [0, 0]
    for which, grp in itertools.groupby(rolls):
        maximums[which] = max(len(list(grp)), maximums[which])

    print "Longest sequence of heads %d, tails %d" % tuple(maximums)

Jed Smith 2009-11-06 17:39:51

Thanks Jed, this is looking better, I need to look into itertools a bit more to follow this. Many thanks

Steve D 2009-11-06 17:51:55

Answer 5

+11 A:

def coins(num):
    lst = [random.randrange(2) for i in range(num)]
    lst = [(i, len(list(j))) for i, j in itertools.groupby(lst)]
    tails = max(j for i, j in lst if i)
    heads = max(j for i, j in lst if not i)
    return {1: tails, 0: heads}

SilentGhost 2009-11-06 17:42:00

Thank God someone else used `itertools` instead of voting up the string-splitting and joining horror that currently leads.

Jed Smith 2009-11-06 17:44:24

For really large values of num, might be better to use a generator instead of actually creating the whole list.

Emil 2009-11-06 17:57:16

@Emil: while I appreciate your suggestion, I'd point out that 1) it's a list of `1`s and `0`s; 2) an intermediate lists are going to take far more memory

SilentGhost 2009-11-06 18:02:12

Mine doesnt use a list na na. Anyway I'm going for this one as it's the kind of thing I was looking for. I'm amazed at how many different solutions cropped up despite 'There should be one-- and preferably only one --obvious way to do it.'Thanks all, I need to get an openid so I can vote etc next time.CheersSteve

Steve D 2009-11-06 18:38:04

+1: Very legible! I also like Alex's response, which is very memory efficient, but which requires more effort to understand.

EOL 2009-11-06 20:45:10

Answer 6

+3 A:

Another inefficient solution :-)

import random, re
s = ''.join(str(random.randrange(2)) for c in range(10))
print s
print max(re.findall(r'0+', s))
print max(re.findall(r'1+', s))

>>> 
0011100100
00
111
>>>

Nick D 2009-11-06 17:45:53

If you're only looking for one set of runs, that's not inefficient at all.

Robert Rossney 2009-11-06 17:59:13

like it though, thank you

Steve D 2009-11-06 18:06:53

Answer 7

+1 A:

import random, itertools

def toss(n):
    faces = (random.randrange(2) for i in range(n))
    longest = [0, 0]
    for face, seq in itertools.groupby(faces):
        longest[face] = max(longest[face], len(list(seq)))
    print "longest sequence heads %d, tails %d" % tuple(longest)

Denis Otkidach 2009-11-06 17:46:21

looks about the same as Jed's above which I also like, thanks

Steve D 2009-11-06 18:04:12

@Steve: There is a difference. I've kept the function memory efficient as original (in question) was, so it can be used for very large `n`, while Jed's and SilentGhost's solutions create large intermediate arrays.

Denis Otkidach 2009-11-09 08:35:41

Answer 8

+11 A:

import collections, itertools, random

def makesequence(choices=2, length=200):
  return [random.randrange(choices) for _ in itertools.repeat(None, length)]

def runlengths(sequence):
  runlength_by_item = collections.defaultdict(set)
  for key, group in itertools.groupby(sequence):
    runlength_by_item[key].add(sum(1 for _ in group))
  return dict((k, max(v)) for k, v in runlength_by_item.items())

As you'll notice, this is much more "decoupled" -- runlengths is a completely general way to determine the maximal run-lengths of different hashable items in any iterable (highly reusable if you need such run-lengths in a variety of different contexts), just as makesequence is a completely general way to make a list of random numbers given list length and number of choices for each random number. Putting these two together may not offer an optimal point-solution to a given, highly specific problem, but it will come close, and building up your little library of reusable "building blocks" will have much higher longer-term returns than just solving each specific problem by entirely dedicated code.

Alex Martelli 2009-11-06 17:47:11

It was actually your answers to questions that involve `itertools` that introduced me to the concept. I'm glad you bang that drum repeatedly.

Jed Smith 2009-11-06 17:52:34

Can you use len(group) isntead of sum(1 for _ in group)?

Emil 2009-11-06 17:58:13

I'm learning stuff today, especially how much I don't know! Thanks for this

Steve D 2009-11-06 18:00:59

@Emil: `group` is a `itertools._grouper` object, that doesn't have `len`, alternative would be to convert it to `list` first.

SilentGhost 2009-11-06 18:04:09

Dude, this is so YAGNI - way too general, and made understanding this more difficult than it should have.

abyx 2009-11-06 18:44:50

I agree about the YAGNI, nonetheless I'm upvoting this for the breathtaking use of `itertools.repeat`. Though if you're going to go that far, why not return a generator instead of a list?

Robert Rossney 2009-11-06 19:38:41

@Robert Rossney - breathtaking use? What about useless? a range(length) would have had the same effect, without people having to lookup whether repeat() actually does something more.

abyx 2009-11-06 20:01:20

It's a generator-based approach to iterating over a range. That's good to know about. Even better in this case, though, would be `itertools.imap(random.randrange, itertools.repeat(choices, length))`.

Robert Rossney 2009-11-06 20:20:23

Wow! very memory efficient!

EOL 2009-11-06 20:41:30

@abyx, learning itertools and using it wisely is one of the key secrets to great scalability in Python: unless you're sure you're writing code that nobody will ever particularly care for, making it scalable's a good precaution -- and not at all complicated; those "people" who previously ignored itertools and are "forced" to look it up actually are the great winners here, because they're getting exposed to an awesome, very practical tool; see @Jed's comment. @Robert, yes, pushing "streamingness" up one notch to use imap is quite possible here, too.

Alex Martelli 2009-11-06 23:24:16

Answer 9

+1 A:

String scanning algorithm

If you are looking for a fast algorithm, then you can use the algorithm I developed recently for an interview question that asked for the longest string of consecutive letters in a string. See blog entry here.

def search_longest_substring(s):
    """
    >>> search_longest_substring('AABBBBCBBBBACCDDDDDDAAABBBBCBBBBACCDDDDDDDAAABBBBCBBBBACCDDDDDDA')
    (7, 'D')
    """
    def find_left(s, midc, mid, left):
        for j in range(mid-1, left-1, -1):
            if s[j] != midc:
                return j + 1
        return left
    def find_right(s, midc, mid, right):
        for k in range(mid+1, right):
            if s[k] != midc:
                return k
        return right
    i, longest = 0, (0, '')
    while i < len(s):
        c = s[i]
        j = find_left(s, c, i, i-longest[0])
        k = find_right(s, c, i, len(s))
        if k-j > longest[0]:
            longest = (k-j, c)
        i = k + longest[0]
    return longest

if __name__ == '__main__':
    import random
    heads_or_tails = "".join(["HT"[random.randrange(0, 2)] for _ in range(20)])
    print search_longest_substring(heads_or_tails)
    print heads_or_tails

This algorithm is O(n) in worst case (all coin flips are identical) or O(n/m) in average case (where m is the length of the longest match). Feel free to correct me on this.

The code is not especially pythonic (i.e. it does not use list comprehensions or itertools or other stuff). It's in python and it's a good algorithm.

Micro-optimizations

For the micro-optimization crowd, here are changes that make this really scream in python 2.6 on a Windows Vista laptop:

def find_left(s, midc, mid, left):
    j = mid - 1
    while j >= 0:
        if s[j] != midc:
            return j + 1
        j -=  1
    return left
def find_right(s, midc, mid, right):
    k = mid+1
    while k < right:
        if s[k] != midc:
            return k
        k += 1
    return right

Timing results for 1000 iterations with timeit:

range: 2.670
xrange: 0.3268
while-loop: 0.255

Adding psyco import to the file:

try:
    import psyco
    psyco.full()
except ImportError:
    pass

0.011 on 1000 iterations with psyco and while-loop. So with judicious micros-optimizations and importing psyco, the code runs 250-ish times faster.

hughdbrown 2009-11-06 18:06:26

thanks for a reusable-type option

Steve D 2009-11-06 18:22:20

BTW, you might want to change those two range calls to xrange calls (Python 2, of course). As the length of the trial sequences increases, all those extra list creations start to add up. With a length of 2000, xrange version ran ~10 times faster than range version (Python 2.6).

Ned Deily 2009-11-06 18:48:15

@Ned: Thanks. If I were going to do that, I'd explore using a while loop. Maybe I'll look into that after my first up-vote.

hughdbrown 2009-11-06 19:02:42

The code you posted here is different than the one on your blog post. For one thing, it doesn't give the right answer for the test set you used in your post. This returns (6, 'D') and the blog code returns (7, 'D'), which is correct.

Andrew Dalke 2009-11-07 02:08:10

@dalke: Yes, that's the risk you run when you try to optimize in public...

hughdbrown 2009-11-07 02:44:36

@dalke: fixed that. Added doctest.

hughdbrown 2009-11-07 03:00:49

ansaurus

tags:

views:

answers:

Is there a more succinct / pythonic way to do this? (counting longest seq of heads, tails in coin flips)

String scanning algorithm

Micro-optimizations

related questions