ansaurus

Question

Would there be any advantage in comparing pattern and text characters right-to-left instead of left-to-right?

Answer 1

+9 A:

Yes.

This is not a match, so we know there's no point in trying to match ABC with the first 3 characters. We also know (after linear time pre-processing) that the character we find, 4, doesn't appear in the pattern at all, so the earliest possible match we can find must start at the next position, i.e. 5th character.

Again we try to match backward, so we see if we can match D with the 8th character.

       ? 
12345678
    ABCD

We find 8; this is not a match. Therefore the pattern doesn't appear in the text. We only needed to see 2 characters from the text.

This is one important characteristics of the Boyer-Moore algorithm: for a text of length N and a fixed pattern of length M, the average-case performance is N/M comparison. That is, perhaps somewhat counterintuitively at first, the longer the pattern we are looking for, the faster we can usually find it.

polygenelubricants 2010-06-24 16:40:35

Hey, that's not brute force! :D

Dimitris Andreou 2010-06-24 16:43:56

using @polygenelubricants answer, think of the speed increase when searching/trying to match a domain, email address, phone number, etc

Don 2010-06-24 17:37:40

But KMP has the same efficiency as Boyer–Moore's. And KMP starts from left-to-right. So I think here the discussion is constrained under Brute force as both Dimitris and I mentioned before.

Tianzhou Chen 2010-06-25 09:51:49

+1 for an exceptionally clear description of the principles behind Boyer-Moore.

Nick Johnson 2010-06-25 10:26:11

@Dimitris: but the algorithm is simple and well-documented (simple != obvious ;) ) so I think it qualifies :) Note that you could perfectly do it the other way around. Start from the right and process the pattern left-to-right... it just doesn't matter...

Matthieu M. 2010-06-25 15:15:10

@Matthieu, ok, answer honestly, how many "brute-force" algorithms have you seen that happen to do preprocessing and build advanced data structures such as jump tables? :) Btw, as polygenelubricants will probably know, this algorithm is actually used in java.util.regex for string matching. I'm too lazy to check, but it must be interesting to see the implementation, since it is dealing with a huge (unicode) alphabet, so there must be something smart going on.

Dimitris Andreou 2010-06-25 17:10:42

I never said it was brute-force. I said it was simple enough that it would not hurt to try and implement it. I would not trust a programmer unable to create the jump-table, it was one of those basic examples we learned in my first trimester of Computer Science and it's documented everywhere.

Matthieu M. 2010-06-25 17:25:47

Answer 2

A:

When you find that Y doesn't match B, what are the next two characters you would compare? If you kept repeating these steps, how many comparisons would you make before you covered the whole string? How many comparisons would you make with the "brute-force" approach?

erickson 2010-06-24 17:13:15

ansaurus

tags:

views:

answers:

Would there be any advantage in comparing pattern and text characters right-to-left instead of left-to-right?

See also

related questions