views:

89

answers:

2

Hi all,

This is the exercise in "Introduction to The Design and Analysis of Algorithms". It's a string matching issue. Say I have string ABCD, and have a pattern XY. And want to see if the string contains the pattern.

We just assume to use brute-force here, so the left-to-right comparison is comparing A with X, next is comparing B with X, etc. While right-to-left comparison is comparing B with Y, next is comparing C with B. The hint says right-to-left comparison does have the advantage, but I don't see why.

Any hint is appreciate!

Thanks Tianzhou

+9  A: 

Yes.

See also


As an extreme example, consider if we need to find the pattern ABCD in the text 12345678.

The earliest possible match of course starts at the beginning of the text. We try to match the pattern backward, so we see if we can match D with the 4th character of the text.

   ?    
12345678
ABCD

This is not a match, so we know there's no point in trying to match ABC with the first 3 characters. We also know (after linear time pre-processing) that the character we find, 4, doesn't appear in the pattern at all, so the earliest possible match we can find must start at the next position, i.e. 5th character.

Again we try to match backward, so we see if we can match D with the 8th character.

       ? 
12345678
    ABCD

We find 8; this is not a match. Therefore the pattern doesn't appear in the text. We only needed to see 2 characters from the text.

This is one important characteristics of the Boyer-Moore algorithm: for a text of length N and a fixed pattern of length M, the average-case performance is N/M comparison. That is, perhaps somewhat counterintuitively at first, the longer the pattern we are looking for, the faster we can usually find it.

polygenelubricants
Hey, that's not brute force! :D
Dimitris Andreou
using @polygenelubricants answer, think of the speed increase when searching/trying to match a domain, email address, phone number, etc
Don
But KMP has the same efficiency as Boyer–Moore's. And KMP starts from left-to-right. So I think here the discussion is constrained under Brute force as both Dimitris and I mentioned before.
Tianzhou Chen
+1 for an exceptionally clear description of the principles behind Boyer-Moore.
Nick Johnson
@Dimitris: but the algorithm is simple and well-documented (simple != obvious ;) ) so I think it qualifies :) Note that you could perfectly do it the other way around. Start from the right and process the pattern left-to-right... it just doesn't matter...
Matthieu M.
@Matthieu, ok, answer honestly, how many "brute-force" algorithms have you seen that happen to do preprocessing and build advanced data structures such as jump tables? :) Btw, as polygenelubricants will probably know, this algorithm is actually used in java.util.regex for string matching. I'm too lazy to check, but it must be interesting to see the implementation, since it is dealing with a huge (unicode) alphabet, so there must be something smart going on.
Dimitris Andreou
I never said it was brute-force. I said it was simple enough that it would not hurt to try and implement it. I would not trust a programmer unable to create the jump-table, it was one of those basic examples we learned in my first trimester of Computer Science and it's documented everywhere.
Matthieu M.
A: 

When you find that Y doesn't match B, what are the next two characters you would compare? If you kept repeating these steps, how many comparisons would you make before you covered the whole string? How many comparisons would you make with the "brute-force" approach?

erickson