I recently read that the difference between char, unsigned char and signed char is platform specific.
I can't quite get my head round this. Does it mean that the bit sequence can vary from one platform to the next, i.e. on platform 1 the sign is the first bit, while on platform 2 the sign could be at the end? How would you code against this?

Basically my question comes from seeing this line:

typedef unsigned char byte;

I don't understand the relevance of the signedness.

+3  A: 

It's more correct to say that it's compiler-specific and you should not count on char being signed or unsigned when using char without a signed or unsigned qualifier.

Otherwise you would face the following problem: you write and debug the program assuming that char is signed by default and then it is recompiled with a compiler assuming otherwise and the program behaviour changes drastically. If you rely on this assumption only once in a while in your code you risk facing unintended behaviour in some cases which are only triggered in your program under specific conditions and are very hard to detect and debug.
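
A minimal sketch of the kind of silent behaviour change described above, assuming 8-bit chars. The function names are made up for illustration; the first version only gives the intended answer when plain char happens to be unsigned.

```cpp
// Hypothetical helpers, not from any library.

// Fragile: compares a plain char against 0x80 directly.
// If char is signed (range -128..127 with 8-bit chars) this is never true.
bool high_bit_set_fragile(char c) {
    return c >= 0x80;
}

// Portable: convert to unsigned char first, so the value is always 0..255
// (again assuming 8-bit chars) regardless of the compiler's default.
bool high_bit_set_portable(char c) {
    return static_cast<unsigned char>(c) >= 0x80;
}
```

Calling both with a byte such as '\xE9' gives the same result only on implementations where plain char is unsigned, which is exactly the sort of difference that shows up after switching compilers.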

sharptooth
Here's an example of surprising behaviour: http://stackoverflow.com/questions/1097130/in-c-left-shift-char-0xff-by-8-and-cast-it-to-int/
sharptooth
A classic problem occurs with y-umlaut (ÿ, Unicode U+00FF), which has character code 255 in ISO 8859-1. If char is signed, it can be confused with EOF, which is normally -1.
Jonathan Leffler
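
The EOF confusion from the comment above can be sketched roughly as follows, assuming 8-bit chars and the usual EOF value of -1. The two function names are hypothetical.

```cpp
#include <cstdio>

// Buggy pattern: the byte was stored in a plain char before the comparison.
// With a signed char, the byte 0xFF becomes -1 and matches EOF.
bool is_eof_buggy(char stored) {
    return stored == EOF;
}

// Correct pattern: keep the int that getchar()/fgetc() returned.
// Real bytes arrive as 0..255, so 0xFF stays distinct from EOF.
bool is_eof_correct(int from_getchar) {
    return from_getchar == EOF;
}
```

This is why the usual advice is to store the result of getchar() in an int and only convert to char after the EOF check.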
A: 

Having a signed char is more of a fluke of how all base variable types are handled in C; generally it is not actually useful to have negative characters.

ewanm89
Many people would say 'generally it is not useful to have unsigned chars'. That is why the signedness of char differs between implementations.
William Pursell
This is what I don't understand. Surely unsigned chars are more useful than signed?
Adam Naylor
And have you actually ever assigned a negative value to a character? Wide character support and the like is more important now than characters with negative values.
ewanm89
Updated my question.
Adam Naylor
@Adam, it doesn't matter when one doesn't have enough characters to fill every bit of a byte in ANSI C/ISO C++ (ASCII character set), hence the sign bit is more there for good measure.
ewanm89
@William Pursell: I've never felt that 'char' being signed was useful, whereas having them unsigned makes a lot of character (text) processing simpler.
Jonathan Leffler
+1  A: 

Perhaps you are referring to the fact that the signedness of char is compiler / platform specific. Here is a blog entry that sheds some light on it:

Character types in C and C++

Karl Voigtland
I think this is what I read, actually! I've added to my question.
Adam Naylor
A: 

A signed char is always 8 bits and always has the sign bit as the last bit.

An unsigned char is always 8 bits and doesn't have a sign bit.

A char is, as far as I know, always unsigned. Any compiler defaulting to a signed char will face a lot of incompatible programs.

Toad
A char is not always 8 bits. Historically, it was often 9. Currently, it is often 16 or 32. The number of bits in a char is CHAR_BIT, which is implementation dependent.
William Pursell
I don't believe this is correct... http://www.parashift.com/c++-faq-lite/intrinsic-types.html#faq-26.4 clearly states that char == 1 byte and 1 byte == AT LEAST 8 bits?
Adam Naylor
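
The two comments above are not actually in conflict: a char is one byte by definition, and a byte is CHAR_BIT bits, which must be at least 8 but may be more. A small sketch of how to check this on a given implementation (the helper name bits_in is made up here):

```cpp
#include <climits>

// sizeof(char) is 1 by definition; the size of a byte in bits is CHAR_BIT.
static_assert(sizeof(char) == 1, "char is exactly one byte");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: number of bits occupied by an object of type T.
template <typename T>
constexpr int bits_in() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}
```

On mainstream desktop compilers bits_in<char>() evaluates to 8, but code that must be portable to unusual targets should use CHAR_BIT rather than a literal 8.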
gcc and msvc by default treat char as signed char.
Yossarian
Yossarian, for GCC, the default depends on what platform it's running on.
Rob Kennedy
+13  A: 

You misunderstood something. signed char is always signed. unsigned char is always unsigned. But whether plain char is signed or unsigned is implementation specific, which means it depends on your compiler. This differs from the int types, which are all signed by default (int is the same as signed int, short is the same as signed short). More interesting is that char, signed char and unsigned char are treated as three distinct types for function overloading. This means you can have three function overloads in the same compilation unit:

void overload(char);
void overload(signed char);
void overload(unsigned char);

For int types it is the opposite: you can't have

void overload(int);
void overload(signed int);

because int and signed int are the same type.
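
A compilable sketch of the point above. The enum CharKind and the function which_overload are illustrative names, not part of any library; the static_asserts check the type relationships at compile time.

```cpp
#include <type_traits>

// Hypothetical tag type so each overload can report which one was chosen.
enum class CharKind { Plain, Signed, Unsigned };

CharKind which_overload(char)          { return CharKind::Plain; }
CharKind which_overload(signed char)   { return CharKind::Signed; }
CharKind which_overload(unsigned char) { return CharKind::Unsigned; }

// Plain char is a distinct type, whatever its signedness happens to be.
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");

// int and signed int, by contrast, name the same type.
static_assert(std::is_same<int, signed int>::value, "same type");
```

A character literal such as 'a' has type char, so which_overload('a') selects the plain char overload on every compiler, regardless of whether char is signed there.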

Tadeusz Kopec
I think this clarifies things greatly, but I'd like some more feedback before I accept the answer.
Adam Naylor
Re 'int is the same as signed int' etc.: Unless you use it as the type of a bitfield!
Richard Corden
+1, very good answer, and I learnt a lot from it. Won't char take one of signed char or unsigned char on any one platform? In which case, how can the overload work?
MeThinks
Should have been "how does the overload work?". But I just read tkopec's answer again and it clearly mentions that they are treated as distinct types. My bad.
MeThinks
+4  A: 

Let's assume that your platform has eight-bit bytes, and suppose we have the bit pattern 10101010. To a signed char, that value is -86. For unsigned char, though, that same bit pattern represents 170. We haven't moved any bits around; it's the same bits, interpreted two different ways.
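
The same bits read two ways can be checked with a short sketch. It assumes 8-bit bytes and two's complement representation (universal in practice, and required since C++20); the function names are made up for illustration.

```cpp
#include <cstring>

// Copy the byte unchanged into a signed char and report its value.
int as_signed(unsigned char bits) {
    signed char s;
    std::memcpy(&s, &bits, 1);  // same bits, no conversion
    return s;
}

// Report the value of the byte read as unsigned char.
int as_unsigned(unsigned char bits) {
    return bits;
}
```

With the pattern 10101010 (0xAA), as_unsigned gives 170 and as_signed gives -86, since 170 - 256 = -86.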

Now for char. The standard doesn't say which of those two interpretations should be correct. A char holding the bit pattern 10101010 could be either -86 or 170. It's going to be one of those two values, but you have to know the compiler and the platform before you can predict which it will be. Some compilers offer a command-line switch to control which one it will be. Some compilers have different defaults depending on what OS they're running on, so they can match the OS convention.
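
If you do need to know which interpretation your compiler picked, it can be queried rather than guessed. A sketch, assuming 8-bit two's complement chars; the function names are hypothetical, while std::numeric_limits and CHAR_MIN are standard.

```cpp
#include <climits>
#include <limits>

// Two standard ways of asking the same question; they must agree.
static_assert(std::numeric_limits<char>::is_signed == (CHAR_MIN < 0),
              "both checks agree");

// Hypothetical helper: true when plain char is signed on this implementation.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Hypothetical helper: value of a plain char holding the given bit pattern.
int plain_char_value(unsigned char bits) {
    return static_cast<char>(bits);
}
```

For the pattern 10101010, plain_char_value returns -86 where char is signed and 170 where it is unsigned. GCC's -fsigned-char and -funsigned-char options and MSVC's /J option are examples of the command-line switches mentioned above.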

In most code, it really shouldn't matter. They are treated as three distinct types for the purposes of overloading. Pointers to one of those types aren't compatible with pointers to another type. Try calling strlen with a signed char* or an unsigned char*; it won't work.
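
A sketch of the strlen case in C++, where passing an unsigned char* directly is a compile error. The wrapper name length_of is made up; an explicit cast is one way to bridge the types when the data really is a null-terminated string.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical wrapper. std::strlen(bytes) would not compile here, because
// const unsigned char* does not implicitly convert to const char*.
std::size_t length_of(const unsigned char* bytes) {
    return std::strlen(reinterpret_cast<const char*>(bytes));
}
```

Needing the cast is a useful hint that the buffer was probably declared with the wrong type for its purpose.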

Use signed char when you want a one-byte signed numeric type, and use unsigned char when you want a one-byte unsigned numeric type. Use plain old char when you want to hold characters. That's what the programmer was thinking when writing the typedef you're asking about. The name "byte" doesn't have the connotation of holding character data, whereas the name "unsigned char" has the word "char" in its name, and that causes some people to think it's a good type for holding characters, or that it's a good idea to compare it with variables of type char.
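
As an illustration of the typedef used for raw data rather than text, here is a minimal sketch. The checksum function is a made-up example and assumes 8-bit bytes.

```cpp
#include <cstddef>

typedef unsigned char byte;  // the typedef from the question

// Hypothetical example: sum of all bytes, reduced modulo 256.
// Because byte is unsigned, each element contributes 0..255 and
// never a negative value.
byte checksum(const byte* data, std::size_t n) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return static_cast<byte>(sum & 0xFF);
}
```

Had the buffer been plain char on a compiler where char is signed, bytes above 0x7F would have been added as negative numbers, so the intermediate sum would differ.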

Since you're unlikely to do general arithmetic on characters, it won't matter whether char is signed or unsigned on any of the platforms and compilers you use.

Rob Kennedy
That's the explanation I was looking for! +1 and a big thank you!
Adam Naylor