ansaurus

Question

Answer 1

+2 A:

Change s to be unsigned char * in the function signature, or simply cast when you use it (i.e. (unsigned char *)s).

Oli Charlesworth 2010-10-23 09:42:03

That's changing the interface to fix the implementation.

larsmans 2010-10-23 09:50:46

True; updating answer accordingly...

Oli Charlesworth 2010-10-23 09:51:34

@larsmans: I would agree with you if char were unambiguously signed or unsigned. It's not.

Eric Towers 2010-10-23 10:42:38

@Eric Towers, it would be an interface change because an unspecified property of the input would be specified.

larsmans 2010-10-23 11:27:21

@larsmans: It's only an interface change if char != unsigned char. Is it? You can't know. (C has three char types. "char" is definitely one of the other two, but which is implementation defined.)

Eric Towers 2010-10-23 12:09:53

@Eric Towers: it is an interface change, as `char`, `signed char` and `unsigned char` are three different types. http://stackoverflow.com/questions/436513/charsigned-char-charunsigned-char/436561#436561

larsmans 2010-10-23 12:49:12

@larsmans: Actually, no data must be changed. Since hashval is unsigned int, the only arithmetic expression will force promotion of \*s to unsigned int as well. Thus, the output is indistinguishable. The nature of pointers in standard C is such that there is no discernible consequence (not even a warning is required) to calling with *any* kind of char pointer. In short, nothing has changed. If this were a discussion about an OO interface, I'd be agreeing all the way down the line, but this is a discussion about built-in types and they have many subversive and horrific properties.

Eric Towers 2010-10-23 12:59:36

I believe we're running around in circles. Let's ask the experts on their opinion: http://stackoverflow.com/questions/4004563/is-the-signedness-of-char-an-interface-issue

larsmans 2010-10-23 14:54:29

Answer 2

+1 A:

I think you can safely typecast your char to unsigned: (unsigned char)*s

mojuba 2010-10-23 09:43:22

No, you cannot. This is a lossy operation on non-twos-complement implementations. You must cast the pointer: `*(unsigned char *)s`.

R.. 2010-10-23 19:41:34

@R: in this particular case, since the value is used for hashing it doesn't matter. Also, aren't all non-twos-complement machines, as well as non-8-bit ones, stuck in the 1970's if not earlier? ;)

mojuba 2010-10-23 19:48:02

Answer 3

+4 A:

What your compiler is picking up on and warning you about is that you are implicitly changing your interpretation of the bytes stored in the area pointed to by s. The function prototype specifies s as being a pointer to a char and by default on your setup, chars seem to be signed. However, to get the has arithmetic correct, you need to use just unsigned values. So the question is this: what should the compiler do with values pointed to through s which actually have negative values?

Let's take a quick diversion to make sure we understand what values we might be considering. The possible values for a signed char are CHAR_MIN to CHAR_MAX inclusive. (These values can be found in limits.h.) The possible values for an unsigned char are 0 to UCHAR_MAX inclusive. So the question becomes this: how do we represent the possible range of values from CHAR_MIN to CHAR_MAX within the range 0 to UCHAR_MAX?

One simple approach is simply to let the compiler carry out this conversion for you: it simply uses wrap-around arithmetic to ensure that the value is within limits: it automatically adds UCHAR_MAX + 1 enough times to get a value which is within the range 0 to UCHAR_MAX. However, the actual value of this will be potentially dependent on the compiler which you are using. It is this possibility of non-portability which lies behind your compiler warning.

OK, so where does that get us? Well, if you are prepared to take responsibility for the hypothetical portability problems which this approach will produce, you can tell the compiler that you are happy for it to make the conversion using the standard rules. You do this by using a cast:

hashval = ((unsigned char) *s) + 31 * hashval;

This approach will suppress the warning and ensure that your arithmetic is all done as unsigned, which is what you want for this sort of has function. However, you need to be aware that the same code on other systems may give different hash results.

An alternative approach is to use the fact that the ANSI C standard specifies that pointers can validly be cast to type unsigned char * to access the underlying byte structure of the data being pointed to. (I don't have my copy of the standard to hand at the moment, or I'd give you a reference.) This would allow you to generalise this approach to producing a function which gives you a hash of a value of any data type. (However, to do this you must think about how you know the size of the data being passed in.) This might look something like:

unsigned hash(void *s, size_t n) {
  unsigned char *t = (unsigned char *) s;

  while (n--)
    hashval = (*(t++) + 31 * hashval) % HASHSIZE;

  return hashval;
}

I hope this gives you a bit of insight into what's going on.

Tim 2010-10-23 10:25:50

The second solution in this answer is the only correct one. Casting the `char` to `unsigned char` after reading it will collapse the byte values corresponding to 0 and -0 together on non-twos-complement implementations.

R.. 2010-10-23 19:43:21

Interesting. Are you saying that you believe there to be standard-conforming implementations where both `-0` and `0` are distinct values for a `char` variable. I'm a little sceptical.

Tim 2010-10-23 20:01:52

Awesome explanation. Thank you very much.

Appu 2010-10-24 07:31:16

ansaurus

tags:

views:

answers:

Problem with hashing function - C

related questions