views:

386

answers:

3

Hey people!

I was wondering if anyone can share a source for Rabin-Karp algorithm?

Thanks

+2  A: 

http://en.wikipedia.org/wiki/Rabin-Karp%5Fstring%5Fsearch%5Falgorithm

http://www.eecs.harvard.edu/~ellard/Q-97/HTML/root/node43.html

Here's a couple sources.

BobbyShaftoe
@Gabe: yep, wikipedia already has a psuedocode source. and if you want to use php as your tag suggests, it is powerful enough to pretty much translate that algorithm line by line.
Charles Ma
A: 

Try this out. You will have to strip the punctuation from the $needle and $haystack before sending it to match_rabinKarp(), but this basically follows the algorithm given on the wikipedia page.

// this hash function is friendly, according to the wikipedia page
function hash($string) {
 $base = ord('a');
 if (strlen($string) == 1) {
  return ord($string);
 } else {
  $result = 0;
  // sum each of the character*(base^i)
  for ($i=strlen($string)-1; $i>=0; $i++) {
   $result += $string[$i]*pow($base,$i);
  }
  return $result;
 }
}
// perform the actual match
function match_rabinKarp($needle, $haystack) {
 $needle = substr($needle);      // remove capitals
 $haystack = substr($haystack);  // remove capitals
 $m = strlen($needle);           // length of $needle
 $n = strlen($haystack);         // length of $haystack
 $h_haystack = hash($haystack);  // hash of $haystack
 $h_needle = hash($needle);      // hash of $needle
 // whittle away at the $haystack until we find a match
 for ($i=0;$i<$n-$m+1;$i++) {
  if ($h_needle == $h_haystack) {
   if (substr($haystack,$i,$i+$m-1) == $needle) {
    return $i;
   }
  }
 }
 return false;
}
Tony G.
A: 

This is a port of this C implementation of the Karp-Rabin algorithm:

function KR($haystack, $needle) {
    $n = strlen($haystack);
    $m = strlen($needle);
    if ($m > $n) {
        return -1;
    }
    /* Preprocessing */
    $d = 1 << ($m - 1);
    for ($hh = $hn = $i = 0; $i < $m; ++$i) {
        $hh = (($hh<<1) + ord($haystack[$i]));
        $hn = (($hn<<1) + ord($needle[$i]));
    }
    /* Searching */
    $j = 0;
    while ($j <= $n-$m) {
        if ($hh == $hn && substr($haystack, $j, $m) === $needle) {
            return $j;
        }
        if ($j === $n-$m) {
            return false;
        }
        /* Rehashing */
        $hh = (($hh - ord($haystack[$j]) * $d) << 1) + ord($haystack[$j + $m]);
        ++$j;
    }
    return false;
}
Gumbo