ansaurus

Question

Counting longest occurence of repeated sequence in Python

Answer 1

+3 A:

Here's my really boring, inefficient, straightforward counting method (interjay's is much better). Note, I wrote this in this little text field, which doesn't have an interpreter, so I haven't tested it, and I may have made a really dumb mistake that a proof-read didn't catch.

my_str = "abcdefgfaabbbffbbbbbbfgbb"
last_char = ""
current_seq_len = 0
max_seq_len = 0

for c in mystr:
    if c == last_char:
        current_seq_len += 1
        if current_seq_len > max_seq_len:
            max_seq_len = current_seq_len
    else:
        current_seq_len = 1
        last_char = c

print(max_seq_len)

Josh Wright 2010-04-18 22:10:42

You may need to update `last_char` somewhere in the loop; other than that, +1 for providing the really *easiest* way: it's the approach that less concepts/skills requires from the programmer. BTW, it's not "innefficient": any solution will need to look at all characters on the string to provide the correct result, so it's cost will be at least O(n): your approach has a time cost of O(n), so it's decently efficient. A slight efficiency improvement would be to update `max_seq_len` on the `else:` block, so it's updated once per sequence rather than once per character.

herenvardo 2010-04-18 22:16:21

Ok, ignore my point about updating `last_char`, Ignacio just fixed it ;)

herenvardo 2010-04-18 22:18:50

Thanks Ignacio ;) (I only meant inefficient in the terms of how much typing you have to do)

Josh Wright 2010-04-18 22:35:54

I would probably use max() instead of the if, and it doesn't count the longest sequence of "b"'s but of any character (maybe that broke in the ignacio fix though?). Anyway this is much better than the oneliners, IMNSHO.

James Antill 2010-04-20 16:59:16

Answer 2

+2 A:

Here is a one-liner:

max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=='b')

Explanation:

itertools.groupby will return groups of consecutive identical characters, along with an iterator for all items in that group. For each such iterator, len(list(y)) will give the number of items in the group. Taking the maximum of that (for the given character) will give the required result.

interjay 2010-04-18 22:10:51

Answer 3

+2 A:

How about a regex example:

import re
my_str = "abcdefgfaabbbffbbbbbbfgbb"
len(max(re.compile("(b+b)").findall(my_str)))    
# max([len(i) for i in re.compile("(b+b)").findall(my_str)]) also works

Edit, Mine vs. interjays

x=timeit.Timer(stmt='import itertools;my_str = "abcdefgfaabbbffbbbbbbfgbb";max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=="b")')
x.timeit()
22.759046077728271

x=timeit.Timer(stmt='import re;my_str = "abcdefgfaabbbffbbbbbbfgbb";len(max(re.compile("(b+b)").findall(my_str)))')
x.timeit()
8.4770550727844238

Mark 2010-04-18 22:15:25

+1 for helping to restore partially rehabilitate the value of regexp on this Site--very brave.

doug 2010-04-19 04:29:35

Answer 4

+1 A:

Using run-length encoding:

import numpy as NP

signal = NP.array([4,5,6,7,3,4,3,5,5,5,5,3,4,2,8,9,0,1,2,8,8,8,0,9,1,3])

px, = NP.where(NP.ediff1d(signal) != 0)
px = NP.r_[(0, px+1, [len(signal)])]
# collect the run-lengths for each unique item in the signal
rx = [ (m, n, signal[m]) for (m, n) in zip(px[:-1], px[1:]) if (n - m) > 1 ]

# get longest:
rx2 = [ (b-a, c) for (a, b, c) in rx ]
rx2.sort(reverse=True)

# returns: [(4, 5), (3, 8)], ie, '5' occurs 4 times consecutively, '8' occurs 3 times consecutively

doug 2010-04-18 22:15:31

ansaurus

tags:

views:

answers:

Counting longest occurence of repeated sequence in Python

related questions