For the Boyer-Moore algorithm to be worst-case linear, the computation of the mis-match table must be O(m). However, a naive implementation would loop through all suffixs O(m) and all positions in that that suffix could go and check for equality... which is O(m3)!
Below is the naive implementation of table building algorithm. So this question becomes: How can I improve this algorithm's runtime to O(m)?
def find(s, sub, no):
n = len(s)
m = len(sub)
for i in range(n, 0, -1):
if s[max(i-m, 0): i] == sub[max(0, m-i):] and \
(i-m < 1 or s[i-m-1] != no):
return n-i
return n
def table(s):
m = len(s)
b = [0]*m
for i in range(m):
b[i] = find(s, s[m-i:], s[m-i-1])
return b
print(table('anpanman'))
To put minds at rest, this isn't homework. I'll add revisions when anyone posts ideas for improvements.