views:

75

answers:

1

I've got a question on failure function description from "Compilers: Principles, Techniques, and Tools" aka DragonBook

Firstly, the quote:

In order to process text strings rapidly and search those strings for a keyword, it is useful to define, for keyword b1b2...bn, and position s in that keyword , a failure function, f (s) ... The objective is that b1b2.. - bf(s) is the longest proper prefix of b1...bs, that is also a suffix of b1...bs. The reason f (s) is important is that if we are trying to match a text string for blb2..bn, and we have matched the first s positions, but we then fail (i.e., the next position of the text string does not hold bs+l), then f (s) is the longest prefix of b1..bn that could possibly match the text string up to the point we are at. Of course, the next character of the text string must be bf(s)+1 or else we still have problems and must consider a yet shorter prefix, which will be bf(f(s)).

So, the questions:
1. If we've matched s positions with the text, why f (s) is the longest prefix of b1..bn that matches the string? I think s - is the longest prefix.
2. Next character of the text string must be bf(s)+1, why? We have a mismatch at this position, does it matter at all what the char is?

+2  A: 

f(s) is the longest prefix at that position that might match the entire keyword. The idea is not to try to match the keyword with the text from the start, but to find a position where the keyword appears.

Consider a search for the word 'aaaba' in the text 'aaaabaa'. The match fails after the three first a's, but it's not necessary to retry from the second 'a', since we know that if the next letter is a 'b' (which it is), we may have a match there.

TrayMan