Hi all. I have some legacy C code (as a macro) that I am not allowed to change in any way, or replace.

This code (eventually) outputs a digest (C) string based on the source string, updating the hash value once for each character in the string.

#define DO_HASH(src, dest) { \
    unsigned long hash = 1111; /* Seed. You must NOT change this. */ \
    char c, *srcPtr; \
    int i; \
    unsigned char hashedChar; \
    \
    srcPtr = src; \
    c = *srcPtr++; \
    while ( c) { \
            hash = ((hash << 5) + hash) + c; \
            c = *srcPtr++; \
    } \
    ... // etc.

}

Some years back, I had to implement it in PHP, as a function returning a digest string. The PHP function has to reproduce the C results identically.

function php_DO_HASH($srcStr)
{
    $hash = 1111;       // Seed. You must NOT change this.
    $index = 0;
    $c = $srcStr[$index];

    while ($c) {
        $hash = (($hash << 5) + $hash) + ord($c);
        $index++;
        $c = $srcStr[$index];
    }

    ... // etc.
}

This has worked successfully for some years. However, in the last few days my server host upgraded to a new version of CentOS, but says they did not change the version of PHP. Since then, the two implementations have generated different output.

Could anyone please advise as to what I'm doing wrong in the PHP version? Thanks.

A: 

I don't know much about PHP, but I seem to recall you can choose whether array indices start at 0 or 1. It might be worthwhile to check this, and whether this default has changed for your implementation.

I believe there's a variable to set to force this to what you want, though.


Also, the `while ($c)` loop looks like a very literal translation from C. Are you sure there's still a null character at the end of the string to terminate the loop?

Carl Smotricz
I'm not sure about the array feature you refer to -- maybe you're thinking of this?: http://www.php.net/manual/en/function.array.php#26226
Frank Farmer
About the null character: no, there isn't, but the undefined value off the end of the array will still be false, so even though the code is *wrong*, it works anyway by accident. The problem is elsewhere. :)
hobbs
+2  A: 

Perhaps they changed to a 64-bit system? If so, PHP integers are now 64 bits wide and the hash no longer wraps at 32 bits. You should try bitwise-ANDing the hash value with 0xffffffff after each round.
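
To illustrate the masking idea, here is a minimal C sketch. It is not the original macro: the function names are made up for this example, and it assumes the legacy results came from a build where unsigned long is 32 bits wide and that the input is plain ASCII. It runs the same loop with a 32-bit accumulator, a 64-bit accumulator, and a 64-bit accumulator masked with 0xffffffff after each round.

```c
#include <stdint.h>

/* Hash as computed with a 32-bit accumulator: wraps modulo 2^32. */
uint32_t hash_32bit(const char *src)
{
    uint32_t hash = 1111; /* seed taken from the question */
    while (*src) {
        hash = ((hash << 5) + hash) + (unsigned char)*src++;
    }
    return hash;
}

/* Same loop with a 64-bit accumulator and no masking: high bits pile up. */
uint64_t hash_64bit_unmasked(const char *src)
{
    uint64_t hash = 1111;
    while (*src) {
        hash = ((hash << 5) + hash) + (unsigned char)*src++;
    }
    return hash;
}

/* 64-bit accumulator, truncated to 32 bits after every round. */
uint64_t hash_64bit_masked(const char *src)
{
    uint64_t hash = 1111;
    while (*src) {
        hash = (((hash << 5) + hash) + (unsigned char)*src++) & 0xffffffffu;
    }
    return hash;
}
```

With these definitions, hash_64bit_masked() agrees with hash_32bit() for any input, while hash_64bit_unmasked() grows past 32 bits once the input reaches five characters. On a 64-bit PHP build the masked round would presumably be written as `$hash = (($hash << 5) + $hash + ord($c)) & 0xffffffff;`.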

hobbs
Thanks to all for your replies. I've tried the code rearrangement as described, but with no joy. Still different results. The weird part is that in the PHP script, I'm getting a negative number when I print out $hash.
SirRatty
What if you run the result through `sprintf("%u", ...)` in the PHP version to force interpretation as unsigned? Or, what are the actual results for a given file? Given a look at the numbers we might be able to pin down the problem better.
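
For what it's worth, the negative number is most likely the same 32 bits read as a signed value. A small C sketch of that effect, with hypothetical helper names and assuming the hash fits in 32 bits:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats the 32 bits as a signed decimal, as a signed 32-bit integer would print. */
void format_signed(uint32_t bits, char *out, size_t out_size)
{
    int32_t as_signed;
    memcpy(&as_signed, &bits, sizeof as_signed); /* reinterpret the bits, no value conversion */
    snprintf(out, out_size, "%" PRId32, as_signed);
}

/* Formats the same 32 bits as unsigned, comparable to sprintf("%u", ...) in PHP. */
void format_unsigned(uint32_t bits, char *out, size_t out_size)
{
    snprintf(out, out_size, "%" PRIu32, bits);
}
```

Given the bit pattern 0xfffffffe, format_signed() produces "-2" and format_unsigned() produces "4294967294". The bits are identical; only the interpretation changes, which is why forcing unsigned output fixes the sign but cannot fix a hash that has already diverged.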
hobbs
Thanks again. I've run the hash value through sprintf as you suggest, and I'm now getting a positive number (yay). But the numbers are still different. I'll put together some example inputs/outputs and report back. Again, a big thanks.
SirRatty
+1  A: 

The while-conditions of your C and PHP version differ.
The C version stops at a "\0" character (ord("\0") === 0, the end of a zero-terminated string), while the PHP version does not. Conversely, the PHP version stops at a '0' character (ord('0') === 48), while the C version does not.
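
The difference can be sketched in C. This is only an emulation under a stated assumption: PHP treats the strings "" and "0" as false, so `while ($c)` ends at the digit '0' as well as at the end of the string. The function names are hypothetical and each function just counts how many characters its loop would feed into the hash.

```c
#include <stddef.h>

/* Characters consumed by the C macro's loop: it stops only at the NUL terminator. */
size_t chars_hashed_c(const char *src)
{
    size_t n = 0;
    while (src[n] != '\0') {
        n++;
    }
    return n;
}

/* Emulation of PHP's `while ($c)`: the loop also ends at the digit '0',
   because the one-character string "0" is falsy in PHP. */
size_t chars_hashed_php_style(const char *src)
{
    size_t n = 0;
    while (src[n] != '\0' && src[n] != '0') {
        n++;
    }
    return n;
}
```

For "abc" both functions return 3, but for "a0b" the C loop consumes 3 characters and the PHP-style loop only 1, so the two hashes differ for any input containing a '0' digit. Looping up to strlen($srcStr) in PHP would avoid relying on truthiness.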

Edit: There might also be an issue with value ranges and type conversion. There is no unsigned long type in PHP, and PHP converts an integer to a float when the result of an addition is bigger than PHP_INT_MAX. For example,

var_dump(PHP_INT_MAX);
var_dump(PHP_INT_MAX + 1);

prints (on my 32-bit machine)

int(2147483647)
float(2147483648)

I think the next << "fixes" that problem (since PHP converts the float back to an int in a way that "works" with your algorithm). But depending on what you're doing with $hash after the loop, this could be a problem.

VolkerK
A: 

You are running into the same PHP overflow problem (where the behaviour varies between versions) as this question. The accepted answer there has all the gory details, including this truncate-to-32-bits function which apparently works on all versions of PHP:

function thirtyTwoBitIntval($value)
{
    if ($value < -2147483648)
    {
        return -(-($value) & 0xffffffff);
    }
    elseif ($value > 2147483647)
    {
        return ($value & 0xffffffff);
    }
    return $value;
}

If you pass your hash value through that thirtyTwoBitIntval() function every time it is recalculated, i.e.:

$hash = thirtyTwoBitIntval(($hash << 5) + $hash + ord($c));

then it should fix the problem.

caf
This did the trick! Thank you VERY much, and a big thanks also to everyone who helped with this problem. It was driving me batty. It's great to have such a wonderful community here.
SirRatty