tags:

views:

107

answers:

3

I am using the following hashing function provided in the K&R book.

#define HASHSIZE 101
unsigned hash(char *s)
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}

In my project, I have more warnings turned on (warnings are treated as errors too) and the above code will fail to compile.

error: conversion to ‘unsigned int’ from ‘char’ may change the sign of the result

If I make the hashval signed, I am getting negative hash values. I am wondering how this can be fixed.

Any help?

+2  A: 

Change s to be unsigned char * in the function signature, or simply cast when you use it (i.e. (unsigned char *)s).

Oli Charlesworth
That's changing the interface to fix the implementation.
larsmans
True; updating answer accordingly...
Oli Charlesworth
@larsmans: I would agree with you if char were unambiguously signed or unsigned. It's not.
Eric Towers
@Eric Towers, it would be an interface change because an unspecified property of the input would be specified.
larsmans
@larsmans: It's only an interface change if char != unsigned char. Is it? You can't know. (C has three char types. "char" is definitely one of the other two, but which is implementation defined.)
Eric Towers
@Eric Towers: it is an interface change, as `char`, `signed char` and `unsigned char` are three different types. http://stackoverflow.com/questions/436513/charsigned-char-charunsigned-char/436561#436561
larsmans
@larsmans: Actually, no data must be changed. Since hashval is unsigned int, the only arithmetic expression will force promotion of \*s to unsigned int as well. Thus, the output is indistinguishable. The nature of pointers in standard C is such that there is no discernible consequence (not even a warning is required) to calling with *any* kind of char pointer. In short, nothing has changed. If this were a discussion about an OO interface, I'd be agreeing all the way down the line, but this is a discussion about built-in types and they have many subversive and horrific properties.
Eric Towers
I believe we're running around in circles. Let's ask the experts on their opinion: http://stackoverflow.com/questions/4004563/is-the-signedness-of-char-an-interface-issue
larsmans
+1  A: 

I think you can safely typecast your char to unsigned: (unsigned char)*s

mojuba
No, you cannot. This is a lossy operation on non-twos-complement implementations. You must cast the pointer: `*(unsigned char *)s`.
R..
@R: in this particular case, since the value is used for hashing it doesn't matter. Also, aren't all non-twos-complement machines, as well as non-8-bit ones, stuck in the 1970's if not earlier? ;)
mojuba
+4  A: 

What your compiler is picking up on and warning you about is that you are implicitly changing your interpretation of the bytes stored in the area pointed to by s. The function prototype specifies s as being a pointer to a char and by default on your setup, chars seem to be signed. However, to get the has arithmetic correct, you need to use just unsigned values. So the question is this: what should the compiler do with values pointed to through s which actually have negative values?

Let's take a quick diversion to make sure we understand what values we might be considering. The possible values for a signed char are CHAR_MIN to CHAR_MAX inclusive. (These values can be found in limits.h.) The possible values for an unsigned char are 0 to UCHAR_MAX inclusive. So the question becomes this: how do we represent the possible range of values from CHAR_MIN to CHAR_MAX within the range 0 to UCHAR_MAX?

One simple approach is simply to let the compiler carry out this conversion for you: it simply uses wrap-around arithmetic to ensure that the value is within limits: it automatically adds UCHAR_MAX + 1 enough times to get a value which is within the range 0 to UCHAR_MAX. However, the actual value of this will be potentially dependent on the compiler which you are using. It is this possibility of non-portability which lies behind your compiler warning.

OK, so where does that get us? Well, if you are prepared to take responsibility for the hypothetical portability problems which this approach will produce, you can tell the compiler that you are happy for it to make the conversion using the standard rules. You do this by using a cast:

hashval = ((unsigned char) *s) + 31 * hashval;

This approach will suppress the warning and ensure that your arithmetic is all done as unsigned, which is what you want for this sort of has function. However, you need to be aware that the same code on other systems may give different hash results.

An alternative approach is to use the fact that the ANSI C standard specifies that pointers can validly be cast to type unsigned char * to access the underlying byte structure of the data being pointed to. (I don't have my copy of the standard to hand at the moment, or I'd give you a reference.) This would allow you to generalise this approach to producing a function which gives you a hash of a value of any data type. (However, to do this you must think about how you know the size of the data being passed in.) This might look something like:

unsigned hash(void *s, size_t n) {
  unsigned char *t = (unsigned char *) s;

  while (n--)
    hashval = (*(t++) + 31 * hashval) % HASHSIZE;

  return hashval;
}

I hope this gives you a bit of insight into what's going on.

Tim
The second solution in this answer is the only correct one. Casting the `char` to `unsigned char` after reading it will collapse the byte values corresponding to 0 and -0 together on non-twos-complement implementations.
R..
Interesting. Are you saying that you believe there to be standard-conforming implementations where both `-0` and `0` are distinct values for a `char` variable. I'm a little sceptical.
Tim
Awesome explanation. Thank you very much.
Appu