String scanning algorithm
If you are looking for a fast algorithm, then you can use the algorithm I developed recently for an interview question that asked for the longest string of consecutive letters in a string. See blog entry here.
def search_longest_substring(s):
"""
>>> search_longest_substring('AABBBBCBBBBACCDDDDDDAAABBBBCBBBBACCDDDDDDDAAABBBBCBBBBACCDDDDDDA')
(7, 'D')
"""
def find_left(s, midc, mid, left):
for j in range(mid-1, left-1, -1):
if s[j] != midc:
return j + 1
return left
def find_right(s, midc, mid, right):
for k in range(mid+1, right):
if s[k] != midc:
return k
return right
i, longest = 0, (0, '')
while i < len(s):
c = s[i]
j = find_left(s, c, i, i-longest[0])
k = find_right(s, c, i, len(s))
if k-j > longest[0]:
longest = (k-j, c)
i = k + longest[0]
return longest
if __name__ == '__main__':
import random
heads_or_tails = "".join(["HT"[random.randrange(0, 2)] for _ in range(20)])
print search_longest_substring(heads_or_tails)
print heads_or_tails
This algorithm is O(n) in worst case (all coin flips are identical) or O(n/m) in average case (where m is the length of the longest match). Feel free to correct me on this.
The code is not especially pythonic (i.e. it does not use list comprehensions or itertools
or other stuff). It's in python and it's a good algorithm.
Micro-optimizations
For the micro-optimization crowd, here are changes that make this really scream in python 2.6 on a Windows Vista laptop:
def find_left(s, midc, mid, left):
j = mid - 1
while j >= 0:
if s[j] != midc:
return j + 1
j -= 1
return left
def find_right(s, midc, mid, right):
k = mid+1
while k < right:
if s[k] != midc:
return k
k += 1
return right
Timing results for 1000 iterations with timeit
:
range: 2.670
xrange: 0.3268
while-loop: 0.255
Adding psyco
import to the file:
try:
import psyco
psyco.full()
except ImportError:
pass
0.011 on 1000 iterations with psyco
and while-loop. So with judicious micros-optimizations and importing psyco
, the code runs 250-ish times faster.