ansaurus

Question

Answer 1

A:

Not an algorithm improvement, but in goodSuffixShift, you have an extraneous call to keys().

in bcs  # replaces in bcs.keys()

Gregg Lind 2009-07-09 20:15:46

Answer 2

A:

I suggest you profile your code, locate performance bottlenecks and fix them.
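Here is a minimal example with the standard-library profiler. It assumes the search is split into a table-building step and a scanning step. `build_table`, `scan` and `search` are hypothetical stand-ins (a Horspool-style sketch, not the question's code). `cProfile` reports time per function, not per line, so it shows whether rebuilding the table or scanning dominates.

```python
import cProfile
import pstats


def build_table(key):
    """Horspool-style bad-character table: char -> distance from the key's end."""
    m = len(key)
    # Later duplicates overwrite earlier ones, so each char keeps its smallest shift.
    return {ch: m - 1 - i for i, ch in enumerate(key[:-1])}


def scan(text, key, table):
    """Slide the key along text using the table; return first match index or -1."""
    m, n = len(key), len(text)
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == key:
            return pos
        pos += table.get(text[pos + m - 1], m)
    return -1


def search(text, key):
    return scan(text, key, build_table(key))


text = "XXXSCIENCEYYY" * 2000
profiler = cProfile.Profile()
profiler.enable()
for _ in range(20):
    search(text, "SCIENCZ")  # key never occurs, so every call scans the whole text
profiler.disable()

# Show the five most expensive entries, sorted by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

The `ncalls` and `cumtime` columns show where the time goes. Because cProfile also records built-in calls, the `dict.get` calls made inside `scan` appear as their own entry.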

Yuval F 2009-07-09 20:49:16

Answer 3

+3 A:

Using "in bcs.keys()" builds a new list and then does an O(N) search of it; just use "in bcs", which is a single hash lookup.
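Here is a quick way to see the difference, assuming a dict shaped roughly like a shift table (the `bcs` contents below are made up). In Python 2, `bcs.keys()` returned a fresh list, so the example uses `list(bcs.keys())` to reproduce that cost. In Python 3, `keys()` returns a view whose membership test is already a hash lookup, but `in bcs` is still the idiomatic spelling.

```python
import timeit

# Made-up stand-in for a bad-character shift table: 95 printable ASCII keys.
bcs = {chr(c): c for c in range(32, 127)}

# '~' (chr(126)) is the last key inserted, so the list scan has to walk the whole list.
as_list = timeit.timeit("'~' in list(bcs.keys())", globals=globals(), number=100000)
as_dict = timeit.timeit("'~' in bcs", globals=globals(), number=100000)
print("in list(bcs.keys()): %.4f s" % as_list)
print("in bcs:              %.4f s" % as_dict)
```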

Do the goodSuffixShift(key) thing inside the search function. Two benefits: the caller has only one API to use, and you avoid having bcs as a global (horrid ** 2).
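Here is a minimal sketch of that shape, assuming a Horspool-style bad-character table. `horspool_search` and `shift` are hypothetical names, not taken from the question's code. The table lives in a local variable, so each call is self-contained and there is no global to keep in sync. On a miss it returns -1, the same contract as `str.find`.

```python
def horspool_search(text, key):
    """Return the first index of key in text, or -1 (same contract as str.find)."""
    m, n = len(key), len(text)
    if m == 0:
        return 0
    # Bad-character table built per call, in a local: no module-level global.
    # Each character of key[:-1] maps to its distance from the key's end;
    # later occurrences overwrite earlier ones, keeping the smallest shift.
    shift = {}
    for i in range(m - 1):
        shift[key[i]] = m - 1 - i
    get_shift = shift.get  # bound once, outside the loop
    pos = 0
    while pos <= n - m:
        # Compare right to left against the current window.
        i = m - 1
        while i >= 0 and text[pos + i] == key[i]:
            i -= 1
        if i < 0:
            return pos
        # Shift by the entry for the text character under the key's last position;
        # characters not in the table allow a full key-length shift.
        pos += get_shift(text[pos + m - 1], m)
    return -1


print(horspool_search("XXXSCIENCEYYY", "SCIENCE"))  # → 3
print(horspool_search("R" * 20, "SCIENCE"))  # → -1
```

The trade-off is that the table is rebuilt on every call. If you search many texts for the same key, a small class or closure that builds the table once would avoid that without bringing the global back.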

Your indentation is incorrect in several places.

Update

This is not the Boyer-Moore algorithm (which uses TWO lookup tables). It looks more like the Boyer-Moore-Horspool algorithm, which uses only the first BM table.
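To make the distinction concrete: the Horspool table maps each character of the key, excluding the last position, to its distance from the end of the key. Any character not in the table allows a shift of the full key length. Full Boyer-Moore adds a second (good-suffix) table on top of this one. The sketch below uses SCIENCE as the example key; `horspool_table` is a hypothetical name.

```python
def horspool_table(key):
    """Map each char of key[:-1] to its distance from the key's last position."""
    m = len(key)
    table = {}
    for i, ch in enumerate(key[:-1]):
        # A later occurrence overwrites an earlier one, so each char keeps
        # its rightmost (smallest) shift.
        table[ch] = m - 1 - i
    return table


print(horspool_table("SCIENCE"))  # → {'S': 6, 'C': 1, 'I': 4, 'E': 3, 'N': 2}
```

'C' ends up with 1 rather than 5 because its second occurrence is closer to the end. The final 'E' is excluded, but the earlier 'E' at index 3 still gives an entry of 3.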

A probable speedup: add the line 'bcsget = bcs.get' after setting up the bcs dict. Then replace:

if text[j] != key[i]:
    if text[j] not in bcs.keys():
        j += len_key
        i = index
    else:
        j += bcs[text[j]]
        i = index

with:

if text[j] != key[i]:
    j += bcsget(text[j], len_key)
    i = index
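This is a rough way to check whether the change pays off on your Python. Timings vary by version and machine, so no numbers are claimed here. Both functions add up the shift they would take for every character of a made-up text. `branchy` mirrors the original if/else, and `with_get` uses a hoisted `bcs.get` with `len_key` as the default. The names and data are hypothetical.

```python
import timeit

# Made-up data: the SCIENCE shift table and a text mixing table hits and misses.
bcs = {'S': 6, 'C': 1, 'I': 4, 'E': 3, 'N': 2}
len_key = 7
text = "XXXSCIENCEYYY" * 1000


def branchy():
    """Original shape: membership test, then a second lookup on a hit."""
    total = 0
    for ch in text:
        if ch not in bcs:
            total += len_key
        else:
            total += bcs[ch]
    return total


def with_get():
    """Suggested shape: one call to a pre-bound bcs.get, defaulting to len_key."""
    bcsget = bcs.get  # attribute lookup done once, outside the loop
    total = 0
    for ch in text:
        total += bcsget(ch, len_key)
    return total


assert branchy() == with_get()  # same shifts, computed two ways
for func in (branchy, with_get):
    print("%-9s %.4f s" % (func.__name__, timeit.timeit(func, number=50)))
```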

Update 2 -- back to basics, like getting the code correct before you optimise

Version 1 has some bugs which you have carried forward into versions 2 and 3. Some suggestions:

Change the not-found response from "not found" to -1. This makes it compatible with text.find(key), which you can use to check your results.

Get some more text values, e.g. "R" * 20, "X" * 20 and "XXXSCIENCEYYY", to use with your existing key values.

Lash up a test harness, something like this:

func_list = [searchv1, searchv2, searchv3]

def test():
    # text_list and key_list are your existing test data
    for text in text_list:
        print('==== text is %r' % (text,))
        for func in func_list:
            for key in key_list:
                # Compute the reference answer first, so the exception
                # report below can use it.
                expected = text.find(key)
                try:
                    result = func(text, key)
                except Exception as e:
                    print("EXCEPTION: %r expected:%d func:%s key:%r" % (e, expected, func.__name__, key))
                    continue
                if result != expected:
                    # %r rather than %d: an old version may still return "not found"
                    print("ERROR actual:%r expected:%d func:%s key:%r" % (result, expected, func.__name__, key))
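The harness above needs your searchv1 to searchv3 plus text_list and key_list. Below is a self-contained variant that runs on its own. It assumes a brute-force `naive_search` as the known-good function, and a hypothetical `legacy_search` that still returns "not found" to show the harness catching the old convention. It collects failures in a list instead of printing them, which makes it easy to assert on.

```python
def naive_search(text, key):
    """Known-good brute force: first index of key in text, or -1."""
    for pos in range(len(text) - len(key) + 1):
        if text[pos:pos + len(key)] == key:
            return pos
    return -1


def legacy_search(text, key):
    """Hypothetical old version that still signals a miss with "not found"."""
    result = naive_search(text, key)
    return "not found" if result == -1 else result


def run_tests(func_list, text_list, key_list):
    """Return a list of (func name, text, key, problem) tuples; empty means all OK."""
    failures = []
    for text in text_list:
        for func in func_list:
            for key in key_list:
                expected = text.find(key)  # reference answer, computed up front
                try:
                    result = func(text, key)
                except Exception as e:
                    failures.append((func.__name__, text, key, "exception %r" % (e,)))
                    continue
                if result != expected:
                    failures.append((func.__name__, text, key,
                                     "got %r, expected %d" % (result, expected)))
    return failures


text_list = ["R" * 20, "X" * 20, "XXXSCIENCEYYY"]
key_list = ["SCIENCE", "XXX", "R", ""]  # hypothetical; substitute your own keys
for failure in run_tests([naive_search, legacy_search], text_list, key_list):
    print(failure)
```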

Run that, fix the errors in v1, carry those fixes forward, run the tests again until they're all OK. Then you can tidy up your timing harness along the same lines, and see what the performance is. Then you can report back here, and I'll give you my idea of what a searchv4 function should look like ;-)
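For the timing side, here is a sketch along the same lines that loops over the same (func, text, key) combinations. `time_all`, `builtin_find` and `naive_search` are hypothetical names. `builtin_find` just wraps `str.find` as a baseline, and `naive_search` stands in for one of your versions.

```python
import functools
import timeit


def naive_search(text, key):
    """Brute-force stand-in for one of your search versions."""
    for pos in range(len(text) - len(key) + 1):
        if text[pos:pos + len(key)] == key:
            return pos
    return -1


def builtin_find(text, key):
    """Baseline: the built-in str.find."""
    return text.find(key)


def time_all(func_list, text_list, key_list, number=1000):
    """Return {function name: total seconds over every (text, key) pair}."""
    timings = {}
    for func in func_list:
        total = 0.0
        for text in text_list:
            for key in key_list:
                call = functools.partial(func, text, key)
                total += timeit.timeit(call, number=number)
        timings[func.__name__] = total
    return timings


text_list = ["R" * 20, "X" * 20, "XXXSCIENCEYYY" * 50]
key_list = ["SCIENCE", "XXX", "YYYZ"]
results = time_all([builtin_find, naive_search], text_list, key_list, number=200)
for name, seconds in sorted(results.items()):
    print("%-13s %.4f s" % (name, seconds))
```

Every function pays the same `functools.partial` call overhead, so the comparison stays fair, but the absolute numbers include it.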

John Machin 2009-07-10 01:37:30

Answer 4

+1 A:

Have you compared it with the algorithm Fredrik Lundh developed for Python 2.5?
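As far as I know, that algorithm is what CPython uses for `str.find` and for the `in` operator on strings, so `text.find(key)` is the baseline to beat. My understanding of the idea has two parts. First, a Horspool-like skip is computed only for the key's last character. Second, a 64-bit mask acts as a tiny bloom filter: if the character just past the window is definitely not in the key, the window can jump over it. The sketch below illustrates that idea in pure Python under those assumptions. It is not a transcription of CPython's C code, and `bloom_search` is a hypothetical name.

```python
def bloom_search(text, key):
    """Return the first index of key in text, or -1 (same contract as str.find).

    Sketch of the idea: a skip for the key's last character plus a
    64-slot bitmask used as a tiny bloom filter.
    """
    n, m = len(text), len(key)
    if m == 0:
        return 0
    if m > n:
        return -1
    mlast = m - 1
    last = key[mlast]
    head = key[:mlast]

    mask = 0            # bit (ord(ch) & 63) is set for every ch in key
    skip = mlast - 1    # used if `last` does not occur earlier in key
    for i in range(mlast):
        mask |= 1 << (ord(key[i]) & 63)
        if key[i] == last:
            skip = mlast - i - 1  # rightmost earlier copy of `last` wins
    mask |= 1 << (ord(last) & 63)

    i = 0
    while i <= n - m:
        # A clear bit proves text[i + m] is absent from key, so no window
        # covering that position can match; a set bit may be a false positive.
        next_absent = i + m < n and not ((mask >> (ord(text[i + m]) & 63)) & 1)
        if text[i + mlast] == last:
            if text[i:i + mlast] == head:
                return i
            i += m + 1 if next_absent else skip + 1
        else:
            i += m + 1 if next_absent else 1
    return -1


print(bloom_search("XXXSCIENCEYYY", "SCIENCE"))  # → 3
```

On CPython the built-in `text.find(key)` runs this kind of loop in C, so a pure-Python version will almost certainly be slower. The sketch is meant to show the skip logic, not to compete on speed.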

Filip Salomonsson 2009-07-10 13:09:53

improving Boyer-Moore string search
