ansaurus

Question

Can someone explain how the signedness of char is platform specific?

Answer 1

+3 A:

It's more accurate to say that it's compiler-specific, and you should not count on char being signed or unsigned when you use char without a signed or unsigned qualifier.

Otherwise you could face the following problem: you write and debug the program assuming that char is signed by default, then it is recompiled with a compiler that assumes otherwise, and the program's behaviour changes drastically. If you rely on this assumption only occasionally in your code, you risk unintended behaviour in cases that are triggered only under specific conditions and are very hard to detect and debug.

sharptooth 2009-07-31 11:17:04

Here's an example of surprising behaviour: http://stackoverflow.com/questions/1097130/in-c-left-shift-char-0xff-by-8-and-cast-it-to-int/

sharptooth 2009-07-31 11:31:03

A classic problem occurs with y-umlaut (ÿ, Unicode U+00FF), which is character code 255 in ISO 8859-1. If char is signed, it can be confused with EOF, which is normally -1.

Jonathan Leffler 2009-07-31 12:06:51

Answer 2

A:

Having a signed char is more of a fluke of how all the base variable types are handled in C; generally it is not actually useful to have negative characters.

ewanm89 2009-07-31 11:19:14

Many people would say 'generally it is not useful to have unsigned chars'. That is why the signedness of char differs between implementations.

William Pursell 2009-07-31 11:20:50

This is what I don't understand: surely unsigned chars are more useful than signed ones?

Adam Naylor 2009-07-31 11:23:26

Have you actually ever assigned a negative value to a character? Wide character support and the like is more important now than characters with negative values.

ewanm89 2009-07-31 11:25:11

updated my question

Adam Naylor 2009-07-31 11:26:27

@Adam, it doesn't matter when you don't have enough characters to use every bit of a byte, as with the ASCII character set used in ANSI C/ISO C++, so the sign bit is there more for good measure.

ewanm89 2009-07-31 11:26:34

@William Pursell: I've never found 'char' being signed useful, whereas having it unsigned makes a lot of character (text) processing simpler.

Jonathan Leffler 2009-07-31 12:10:05

Answer 3

+1 A:

Perhaps you are referring to the fact that the signedness of char is compiler / platform specific. Here is a blog entry that sheds some light on it:

Character types in C and C++

Karl Voigtland 2009-07-31 11:23:34

I think this is actually what I read! I've added it to my question.

Adam Naylor 2009-07-31 11:25:22

Answer 4

A:

A signed char is always 8 bits and always has the sign bit as the last bit.

An unsigned char is always 8 bits and doesn't have a sign bit.

A char is, as far as I know, always unsigned. Any compiler defaulting to a signed char will face a lot of incompatible programs.

Toad 2009-07-31 11:24:00

A char is not always 8 bits. Historically, it was sometimes 9. On some current platforms (notably some DSPs), it is 16 or 32. The number of bits in a char is CHAR_BIT, which is implementation-dependent.

William Pursell 2009-07-31 11:26:22

I don't believe this is correct... http://www.parashift.com/c++-faq-lite/intrinsic-types.html#faq-26.4 clearly states that char == 1 byte and 1 byte == AT LEAST 8 bits?

Adam Naylor 2009-07-31 11:28:18

By default, gcc and MSVC treat char as signed char.

Yossarian 2009-07-31 11:41:47

Yossarian, for GCC, the default depends on what platform it's running on.

Rob Kennedy 2009-07-31 17:30:46

Answer 5

+13 A:

You misunderstood something. signed char is always signed, and unsigned char is always unsigned. But whether plain char is signed or unsigned is implementation-specific, which means it depends on your compiler. This differs from the int types, where the plain name always means signed (int is the same as signed int, short is the same as signed short). A more interesting thing is that char, signed char and unsigned char are treated as three distinct types for function overloading. This means you can have three overloads in the same compilation unit:

void overload(char);
void overload(signed char);
void overload(unsigned char);

For the int types it's the opposite; you can't have

void overload(int);
void overload(signed int);

because int and signed int are the same type.

Tadeusz Kopec 2009-07-31 11:30:33

I think this clarifies things greatly, but I'd like some more feedback before I accept the answer.

Adam Naylor 2009-07-31 11:32:51

Re 'int is the same as signed int' etc.: Unless you use it as the type of a bitfield!

Richard Corden 2009-07-31 12:05:09

+1, very good answer, and I learnt a lot from it. Won't char take the form of either signed char or unsigned char on any one platform? In that case, how can the overload work?

MeThinks 2009-07-31 12:24:25

It should have been "how does the overload work?". But I just read tkopec's answer again, and it clearly says they are treated as distinct types. My bad.

MeThinks 2009-07-31 12:28:47

Answer 6

+4 A:

Let's assume that your platform has eight-bit bytes, and suppose we have the bit pattern 10101010. To a signed char, that value is -86. For unsigned char, though, that same bit pattern represents 170. We haven't moved any bits around; it's the same bits, interpreted two different ways.

Now for char. The standard doesn't say which of those two interpretations should be correct. A char holding the bit pattern 10101010 could be either -86 or 170. It's going to be one of those two values, but you have to know the compiler and the platform before you can predict which it will be. Some compilers offer a command-line switch to control which one it will be. Some compilers have different defaults depending on what OS they're running on, so they can match the OS convention.

In most code, it really shouldn't matter. They are treated as three distinct types for the purposes of overloading. Pointers to one of those types aren't compatible with pointers to another type. Try calling strlen with a signed char* or an unsigned char*; it won't work.

Use signed char when you want a one-byte signed numeric type, and use unsigned char when you want a one-byte unsigned numeric type. Use plain old char when you want to hold characters. That's what the programmer was thinking when writing the typedef you're asking about. The name "byte" doesn't have the connotation of holding character data, whereas the name "unsigned char" has the word "char" in it, and that causes some people to think it's a good type for holding characters, or that it's a good idea to compare it with variables of type char.

Since you're unlikely to do general arithmetic on characters, it won't matter whether char is signed or unsigned on any of the platforms and compilers you use.

Rob Kennedy 2009-07-31 14:44:56

That's the explanation I was looking for! +1 and a big thank you!

Adam Naylor 2009-07-31 15:45:21
