tags:

views:

106

answers:

2

I tried to use several implementations, but all of them had bugs.
Search at SO gave me http://www-igm.univ-mlv.fr/~lecroq/string/node14.html - looks nice, but this implementation gave me wrong results - sometimes it doesn't found a string.
I spent couple hours to find the bug.

Following line looks fine:

j += MAX(bmGs[i], bmBc[y[i + j]] - m + 1 + i);

but y is char * and char is signed! It means that y[i + j] could be negative (what happens in one of my tests).

My question is: Where to find correct implementation of Boyer Moore algorithm?

+2  A: 

Have you tried Wikipedia?

Or the PDF written by the inventors of the algorithm?

In silico
+3  A: 

char isn't definitively signed or unsigned - it's unspecified, and left up to the implementation to define.

If the algorithm depends on char being unsigned, then it should explicitly cast the input pointers to unsigned char (which is how the C standard library string handling functions are defined to work - all comparisons are done treating the characters in the string as unsigned char).

caf