views:

157

answers:

2

How to find the longest palindrome prefix of a string in O(n)?

+1  A: 

Solution for a more general problem, not prefix but sub-string, in O(n) :

http://www.akalin.cx/2007/11/28/finding-the-longest-palindromic-substring-in-linear-time/

Second result on google for "longest palindrome prefix"....

Or solution using suffix-trees :

http://www.allisons.org/ll/AlgDS/Tree/Suffix/

Loïc Février
Has the function `fastLongestPalindromes(..)` on the link to akalin.cx actually `O(n)` worst case complexity ? I see two nested loops..
Andre Holzner
Haven't read everything but 2 nested loops does not mean quadratic complexity. The continue at the beginning of the function should ensure that the inner loop in only executed in a few cases and that the overall complexity of each execution of that loop is O(n).
Loïc Février
+4  A: 

Use a rolling hash. If a is your string, let ha[x] be the hash of the first x chars in a computed from left to right and let hr[x] be the hash of the first x characters in s computed from right to left. You're interested in the last position i for which hf[i] = hb[i].

Code in C (use two hashes for each direction to avoid false positives):

int match = n - 1;

int ha1 = 0, ha2 = 0, hr1 = 0, hr2 = 0;
int m1 = 1, m2 = 1;
for ( int i = 0; a[i]; ++i )
{
    ha1 = (ha1 + m1*a[i]) % mod1;
    ha2 = (ha2 + m2*a[i]) % mod2;

    hr1 = (a[i] + base1*hr1) % mod1;
    hr2 = (a[i] + base2*hr2) % mod2;

    m1 *= base1, m1 %= mod1;
    m2 *= base2, m2 %= mod2;

    if ( ha1 == hr1 && ha2 == hr2 )
        match = i;
}
IVlad
I think the 'from right to left' part isn't there in the code example... (`i` goes from left to right and you're only accessing `a[i]`.
Andre Holzner
@Andre Holzner - `i` goes only from left to right. At each step `i`, you will have the length of the current longest palindromic prefix stored in `match`. It's only the rolling hashes that go both ways.
IVlad
+1, I didn't know it was called 'rolling hash'. But I even two hashes doesn't guarantee that every positive is 'true' (not to mention, with such unusual coefficients :)). Simply because for some length _n_ there're more n-character strings than different pairs of int numbers _(ha1, ha2)_. And if you do the verify each positive, you get O(n^2) on uniform string ('aaaaaaaa...').
Nikita Rybak
What you can do is store all hashes on the first pass into array and then start from the longest candidate (whole string), reducing it char by char. In this case, if you get verified positive, program is finished: you've found longest palindrome-prefix. And since your two hashes have rare enough false positives, program is still 'kinda' O(n) for reasonable values of _n_.
Nikita Rybak
Ivlad, I don't see how any of `ha1`, `ha2`, `hr1` or `hr2` are calculated from right to left. In each loop iteration, only the next element to the right (`a[i]`) is processed.
Andre Holzner
@Andre Holzner - try building `ha1` and `hr1` step by step. At `i = 0`, we have `ha1 = a[0]` and `hr1 = a[0]`. If the two are equal (obviously they will be), we have a 1 character long palindromic prefix. At `i = 1`, we have `ha1 = a[0] + base1*a[1]` and `hr1 = a[1] + a[0]*base1`. Do you see where this is going and what I meant by `left to right` and `right to left` now? Try building `ha1` and `hr1` for `i = 2` as well. @Nikita Rybak - true, no guarantees.
IVlad
I see... thanks for the explanation !
Andre Holzner