views:

357

answers:

9

Count the longest sequence of heads and tails in 200 coin flips.
I did this - is there a niftier way to do it in python? (without being too obfuscated)

import random

def toss(n):
    count = [0,0]
    longest = [0,0]
    for i in xrange(n):
        coinface = random.randrange(2)
        count[coinface] += 1
        count[not coinface] = 0

        if count[coinface] > longest[coinface]:
            longest[coinface] = count[coinface]      
        #print coinface, count, longest  

    print "longest sequence heads %d, tails %d" %tuple(longest)

if __name__ == '__main__':
    toss(200)

see this for what prompted my playing

+1  A: 

It is probably an axiom that any code can be made more succinct. Yours looks perfectly pythonic, though.

Actually, on reflection perhaps there is no succinctness axiom like that. If succinct means "marked by compact precise expression without wasted words," and if by "words" we mean words of code and not of memory, then a single word program cannot be made more succinct (unless, perhaps, it is the "exit" program).

If pythonic means "of extraordinary size and power", then it seems antagonistic to succinctness unless we restrict our definition to power only. I'm not convinced your program resembles a prophetic oracle at all, although you might implement it as an ascii portrait of a particular prophetic oracle. It doesn't look like a snake, so there's room for improvement there too.

import random

def toss(n):
    '''
     ___     ____________
<<<((__O\   (__<>___<>__ \   ____
       \ \_(__<>___<>__)\O\_/O___>-<  hiss
        \O__<>___<>___<>)\___/

    '''
    count = [0,0]
    longest = [0,0]
    for i in xrange(n):
        coinface = random.randrange(2)
        count[coinface] += 1
        count[not coinface] = 0

        if count[coinface] > longest[coinface]:
            longest[coinface] = count[coinface]
        #print coinface, count, longest

    print "longest sequence heads %d, tails %d" %tuple(longest)

if __name__ == '__main__':
    toss(200)

Nifty, huh?

Ewan Todd
okay, thank you
Steve D
pythonic means: http://docs.python.org/glossary#term-pythonicas well as `import this`.
SilentGhost
Hissterical I thought
Steve D
+2  A: 
>>> def toss(count):
        result = []
        for i in range(count):
            result.append("HT"[random.randrange(0, 2)])
        return ''.join(result)

>>> s = toss(200)
>>> h_max = max(len(x) for x in s.split("T"))
>>> t_max = max(len(x) for x in s.split("H"))
>>> print h_max, t_max
4 6
Robert Rossney
interesting, I like it. thanks.
Steve D
Seriously, why is this getting voted up? I wish I could vote it down twice.
Jed Smith
"Simple is better than complex."
Robert Rossney
+2  A: 

This isn't really pythonic so much as tortured, but here's a short version (with meaningless 1-character variable names, no less!)

import random
x = ''.join([chr(random.randrange(2)) for i in range(200)])
print max([len(s) for s in x.split(chr(0)) + x.split(chr(1))])
Grumdrig
ok, I think that's giving me the max of _either_ heads or tails' seq count right? thanks.
Steve D
Yep. Sorry. Slightly misread the question, but easily corrected.
Grumdrig
+7  A: 

You can use itertools, which is a much more Pythonic way to do this:

def toss(n):
    rolls = [random.randrange(2) for i in xrange(n)]
    maximums = [0, 0]
    for which, grp in itertools.groupby(rolls):
        maximums[which] = max(len(list(grp)), maximums[which])

    print "Longest sequence of heads %d, tails %d" % tuple(maximums)
Jed Smith
Thanks Jed, this is looking better, I need to look into itertools a bit more to follow this. Many thanks
Steve D
+11  A: 
def coins(num):
    lst = [random.randrange(2) for i in range(num)]
    lst = [(i, len(list(j))) for i, j in itertools.groupby(lst)]
    tails = max(j for i, j in lst if i)
    heads = max(j for i, j in lst if not i)
    return {1: tails, 0: heads}
SilentGhost
Thank God someone else used `itertools` instead of voting up the string-splitting and joining horror that currently leads.
Jed Smith
For really large values of num, might be better to use a generator instead of actually creating the whole list.
Emil
@Emil: while I appreciate your suggestion, I'd point out that 1) it's a list of `1`s and `0`s; 2) an intermediate lists are going to take far more memory
SilentGhost
Mine doesnt use a list na na. Anyway I'm going for this one as it's the kind of thing I was looking for. I'm amazed at how many different solutions cropped up despite 'There should be one-- and preferably only one --obvious way to do it.'Thanks all, I need to get an openid so I can vote etc next time.CheersSteve
Steve D
+1: Very legible! I also like Alex's response, which is very memory efficient, but which requires more effort to understand.
EOL
+3  A: 

Another inefficient solution :-)

import random, re
s = ''.join(str(random.randrange(2)) for c in range(10))
print s
print max(re.findall(r'0+', s))
print max(re.findall(r'1+', s))

>>> 
0011100100
00
111
>>>
Nick D
If you're only looking for one set of runs, that's not inefficient at all.
Robert Rossney
like it though, thank you
Steve D
+1  A: 
import random, itertools

def toss(n):
    faces = (random.randrange(2) for i in range(n))
    longest = [0, 0]
    for face, seq in itertools.groupby(faces):
        longest[face] = max(longest[face], len(list(seq)))
    print "longest sequence heads %d, tails %d" % tuple(longest)
Denis Otkidach
looks about the same as Jed's above which I also like, thanks
Steve D
@Steve: There is a difference. I've kept the function memory efficient as original (in question) was, so it can be used for very large `n`, while Jed's and SilentGhost's solutions create large intermediate arrays.
Denis Otkidach
+11  A: 
import collections, itertools, random

def makesequence(choices=2, length=200):
  return [random.randrange(choices) for _ in itertools.repeat(None, length)]

def runlengths(sequence):
  runlength_by_item = collections.defaultdict(set)
  for key, group in itertools.groupby(sequence):
    runlength_by_item[key].add(sum(1 for _ in group))
  return dict((k, max(v)) for k, v in runlength_by_item.items())

As you'll notice, this is much more "decoupled" -- runlengths is a completely general way to determine the maximal run-lengths of different hashable items in any iterable (highly reusable if you need such run-lengths in a variety of different contexts), just as makesequence is a completely general way to make a list of random numbers given list length and number of choices for each random number. Putting these two together may not offer an optimal point-solution to a given, highly specific problem, but it will come close, and building up your little library of reusable "building blocks" will have much higher longer-term returns than just solving each specific problem by entirely dedicated code.

Alex Martelli
It was actually your answers to questions that involve `itertools` that introduced me to the concept. I'm glad you bang that drum repeatedly.
Jed Smith
Can you use len(group) isntead of sum(1 for _ in group)?
Emil
I'm learning stuff today, especially how much I don't know! Thanks for this
Steve D
@Emil: `group` is a `itertools._grouper` object, that doesn't have `len`, alternative would be to convert it to `list` first.
SilentGhost
Dude, this is so YAGNI - way too general, and made understanding this more difficult than it should have.
abyx
I agree about the YAGNI, nonetheless I'm upvoting this for the breathtaking use of `itertools.repeat`. Though if you're going to go that far, why not return a generator instead of a list?
Robert Rossney
@Robert Rossney - breathtaking use? What about useless? a range(length) would have had the same effect, without people having to lookup whether repeat() actually does something more.
abyx
It's a generator-based approach to iterating over a range. That's good to know about. Even better in this case, though, would be `itertools.imap(random.randrange, itertools.repeat(choices, length))`.
Robert Rossney
Wow! very memory efficient!
EOL
@abyx, learning itertools and using it wisely is one of the key secrets to great scalability in Python: unless you're sure you're writing code that nobody will ever particularly care for, making it scalable's a good precaution -- and not at all complicated; those "people" who previously ignored itertools and are "forced" to look it up actually are the great winners here, because they're getting exposed to an awesome, very practical tool; see @Jed's comment. @Robert, yes, pushing "streamingness" up one notch to use imap is quite possible here, too.
Alex Martelli
+1  A: 

String scanning algorithm

If you are looking for a fast algorithm, then you can use the algorithm I developed recently for an interview question that asked for the longest string of consecutive letters in a string. See blog entry here.

def search_longest_substring(s):
    """
    >>> search_longest_substring('AABBBBCBBBBACCDDDDDDAAABBBBCBBBBACCDDDDDDDAAABBBBCBBBBACCDDDDDDA')
    (7, 'D')
    """
    def find_left(s, midc, mid, left):
        for j in range(mid-1, left-1, -1):
            if s[j] != midc:
                return j + 1
        return left
    def find_right(s, midc, mid, right):
        for k in range(mid+1, right):
            if s[k] != midc:
                return k
        return right
    i, longest = 0, (0, '')
    while i < len(s):
        c = s[i]
        j = find_left(s, c, i, i-longest[0])
        k = find_right(s, c, i, len(s))
        if k-j > longest[0]:
            longest = (k-j, c)
        i = k + longest[0]
    return longest

if __name__ == '__main__':
    import random
    heads_or_tails = "".join(["HT"[random.randrange(0, 2)] for _ in range(20)])
    print search_longest_substring(heads_or_tails)
    print heads_or_tails

This algorithm is O(n) in worst case (all coin flips are identical) or O(n/m) in average case (where m is the length of the longest match). Feel free to correct me on this.

The code is not especially pythonic (i.e. it does not use list comprehensions or itertools or other stuff). It's in python and it's a good algorithm.

Micro-optimizations

For the micro-optimization crowd, here are changes that make this really scream in python 2.6 on a Windows Vista laptop:

def find_left(s, midc, mid, left):
    j = mid - 1
    while j >= 0:
        if s[j] != midc:
            return j + 1
        j -=  1
    return left
def find_right(s, midc, mid, right):
    k = mid+1
    while k < right:
        if s[k] != midc:
            return k
        k += 1
    return right

Timing results for 1000 iterations with timeit:

range: 2.670
xrange: 0.3268
while-loop: 0.255

Adding psyco import to the file:

try:
    import psyco
    psyco.full()
except ImportError:
    pass

0.011 on 1000 iterations with psyco and while-loop. So with judicious micros-optimizations and importing psyco, the code runs 250-ish times faster.

hughdbrown
thanks for a reusable-type option
Steve D
BTW, you might want to change those two range calls to xrange calls (Python 2, of course). As the length of the trial sequences increases, all those extra list creations start to add up. With a length of 2000, xrange version ran ~10 times faster than range version (Python 2.6).
Ned Deily
@Ned: Thanks. If I were going to do that, I'd explore using a while loop. Maybe I'll look into that after my first up-vote.
hughdbrown
The code you posted here is different than the one on your blog post. For one thing, it doesn't give the right answer for the test set you used in your post. This returns (6, 'D') and the blog code returns (7, 'D'), which is correct.
Andrew Dalke
@dalke: Yes, that's the risk you run when you try to optimize in public...
hughdbrown
@dalke: fixed that. Added doctest.
hughdbrown