views:

148

answers:

4

What's the easiest way to count the longest consecutive repeat of a certain character in a string? For example, the longest consecutive repeat of "b" in the following string:

my_str = "abcdefgfaabbbffbbbbbbfgbb"

would be 6, since other consecutive repeats are shorter (3 and 2, respectively.) How can I do this in Python?

thanks.

+3  A: 

Here's my really boring, inefficient, straightforward counting method (interjay's is much better). Note, I wrote this in this little text field, which doesn't have an interpreter, so I haven't tested it, and I may have made a really dumb mistake that a proof-read didn't catch.

my_str = "abcdefgfaabbbffbbbbbbfgbb"
last_char = ""
current_seq_len = 0
max_seq_len = 0

for c in mystr:
    if c == last_char:
        current_seq_len += 1
        if current_seq_len > max_seq_len:
            max_seq_len = current_seq_len
    else:
        current_seq_len = 1
        last_char = c

print(max_seq_len)
Josh Wright
You may need to update `last_char` somewhere in the loop; other than that, +1 for providing the really *easiest* way: it's the approach that less concepts/skills requires from the programmer. BTW, it's not "innefficient": any solution will need to look at all characters on the string to provide the correct result, so it's cost will be at least O(n): your approach has a time cost of O(n), so it's decently efficient. A slight efficiency improvement would be to update `max_seq_len` on the `else:` block, so it's updated once per sequence rather than once per character.
herenvardo
Ok, ignore my point about updating `last_char`, Ignacio just fixed it ;)
herenvardo
Thanks Ignacio ;) (I only meant inefficient in the terms of how much typing you have to do)
Josh Wright
I would probably use max() instead of the if, and it doesn't count the longest sequence of "b"'s but of any character (maybe that broke in the ignacio fix though?). Anyway this is much better than the oneliners, IMNSHO.
James Antill
+2  A: 

Here is a one-liner:

max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=='b')

Explanation:

itertools.groupby will return groups of consecutive identical characters, along with an iterator for all items in that group. For each such iterator, len(list(y)) will give the number of items in the group. Taking the maximum of that (for the given character) will give the required result.

interjay
+2  A: 

How about a regex example:

import re
my_str = "abcdefgfaabbbffbbbbbbfgbb"
len(max(re.compile("(b+b)").findall(my_str)))    
# max([len(i) for i in re.compile("(b+b)").findall(my_str)]) also works

Edit, Mine vs. interjays

x=timeit.Timer(stmt='import itertools;my_str = "abcdefgfaabbbffbbbbbbfgbb";max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=="b")')
x.timeit()
22.759046077728271

x=timeit.Timer(stmt='import re;my_str = "abcdefgfaabbbffbbbbbbfgbb";len(max(re.compile("(b+b)").findall(my_str)))')
x.timeit()
8.4770550727844238
Mark
+1 for helping to restore partially rehabilitate the value of regexp on this Site--very brave.
doug
+1  A: 

Using run-length encoding:

import numpy as NP

signal = NP.array([4,5,6,7,3,4,3,5,5,5,5,3,4,2,8,9,0,1,2,8,8,8,0,9,1,3])

px, = NP.where(NP.ediff1d(signal) != 0)
px = NP.r_[(0, px+1, [len(signal)])]
# collect the run-lengths for each unique item in the signal
rx = [ (m, n, signal[m]) for (m, n) in zip(px[:-1], px[1:]) if (n - m) > 1 ]

# get longest:
rx2 = [ (b-a, c) for (a, b, c) in rx ]
rx2.sort(reverse=True)

# returns: [(4, 5), (3, 8)], ie, '5' occurs 4 times consecutively, '8' occurs 3 times consecutively 
doug