views:

321

answers:

1

For the Boyer-Moore algorithm to be worst-case linear, the computation of the mis-match table must be O(m). However, a naive implementation would loop through all suffixs O(m) and all positions in that that suffix could go and check for equality... which is O(m3)!

Below is the naive implementation of table building algorithm. So this question becomes: How can I improve this algorithm's runtime to O(m)?

def find(s, sub, no):
    n = len(s)
    m = len(sub)

    for i in range(n, 0, -1):
     if s[max(i-m, 0): i] == sub[max(0, m-i):] and \
      (i-m < 1 or s[i-m-1] != no):
      return n-i

    return n

def table(s):
    m = len(s)
    b = [0]*m

    for i in range(m):
     b[i] = find(s, s[m-i:], s[m-i-1])

    return b

print(table('anpanman'))

To put minds at rest, this isn't homework. I'll add revisions when anyone posts ideas for improvements.

+1  A: 

The code under "Preprocessing for the good-suffix heuristics" on this page builds the good-suffix table in O(n) time. It also explains how the code works.

marcog
Thanks. Mission accomplished.