From The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie, section 2.7, Type Conversions, p. 43:

"There is one subtle point about the conversion of characters to integers. ... On some machines a char whose leftmost bit is 1 will be converted to a negative integer. On others, ... is always positive. For portability, specify signed or unsigned if non-character data is to be stored in char variables."

My questions are:

  1. Why would anyone want to store non-char data in a char? (An example where this is necessary would be really nice.)

  2. Why does the integer value of a char change when it is converted to an int?

  3. Can you elaborate more on this portability issue?

+5  A: 

1) char is the size of a single byte in C, and is therefore used for storing any sort of data. For example, when an image is loaded into memory, its data is often represented as an array of char. In modern code, typedefs such as uint8_t convey the purpose of a buffer better than plain char.
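As a minimal sketch of storing non-character data in a byte buffer, the following writes a 32-bit integer into four uint8_t bytes (most significant byte first) and reads it back. The helper names put_be32 and get_be32 and the big-endian layout are illustrative assumptions, not an existing API.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value in buf[0..3], most significant
   byte first (big-endian, a common layout for file and wire formats). */
void put_be32(uint8_t buf[4], uint32_t value)
{
    buf[0] = (uint8_t)(value >> 24);
    buf[1] = (uint8_t)(value >> 16);
    buf[2] = (uint8_t)(value >> 8);
    buf[3] = (uint8_t)value;
}

/* Hypothetical helper: rebuild the 32-bit value from the same four bytes.
   Each byte is widened to uint32_t before shifting. */
uint32_t get_be32(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

Because the buffer is uint8_t rather than plain char, every stored byte reads back as a value from 0 to 255, whatever the platform's choice for char.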

2 & 3) Whether plain char is signed or unsigned is implementation-defined (it varies between platforms and compilers), so if a program depends on this behavior, it's best to specify signed char or unsigned char explicitly.

John Millikin
Because of 2, you usually should not use an array of char in situation 1, but an array of unsigned char.
nos
+2  A: 
  1. The char type is defined to hold one byte, i.e. sizeof(char) is defined to be 1. This is useful for serializing data, for instance.

  2. Plain char has the same range and representation as either signed char or unsigned char; which one is implementation-defined. Now imagine that char means smallint. Going from smallint to int is simply converting a small integer to a larger one. The problem is that you don't know whether that smallint is signed or unsigned.

  3. I would say it's not really a portability issue as long as you follow The Bible (K&R).
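Since you can't know in advance whether the smallint is signed, a common portable idiom is to convert a char through unsigned char before it is widened to int, which always yields a value from 0 to UCHAR_MAX. This matters for the <ctype.h> functions, whose argument must be representable as unsigned char or equal EOF. A sketch, with count_digits as a hypothetical helper:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the decimal digits in a string.
   Each plain char is converted through unsigned char before being passed
   to isdigit(). Without the cast, a byte such as 0xE9 would become a
   negative int where char is signed, and passing a negative value other
   than EOF to isdigit() is undefined behavior. */
size_t count_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (isdigit((unsigned char)*s))
            ++n;
    }
    return n;
}
```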

Andrew Keeton
A: 

unsigned char is often used to process binary data one byte at a time. A common example is UTF-8 text, where one character can occupy several bytes, so the string is not strictly made up of "chars."
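As a sketch of processing UTF-8 one byte at a time, the hypothetical function below counts code points by skipping continuation bytes (those with the bit pattern 10xxxxxx). It assumes the input is valid UTF-8 and does no validation.

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a NUL-terminated UTF-8
   string. Assumes valid UTF-8. Every byte that is not a continuation byte
   (10xxxxxx) starts a new code point. Reading through unsigned char keeps
   each byte in the range 0..255 regardless of whether plain char is signed. */
size_t utf8_length(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;
    for (; *p != 0; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```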

If a signed char is 8 bits and its top bit is set, its value is negative. When it is converted to a larger type, the value (and therefore the sign) is preserved by copying the top bit into every new high-order bit of the wider type. This is called sign extension.
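A small sketch of the difference, assuming 8-bit char and two's complement representation (true on essentially all current hardware); the function names are illustrative only:

```c
#include <stdio.h>

/* Widening a signed char to int preserves its value. With two's complement
   this is done by sign extension: the top bit of the byte is copied into
   every new high-order bit. */
int widen_signed(signed char c)
{
    return c;
}

/* Widening an unsigned char also preserves its value, so the new high-order
   bits are filled with zeros instead. */
int widen_unsigned(unsigned char c)
{
    return c;
}

/* Print both results for the same bit pattern 11111111 (assumes 8-bit char). */
void show_widening(void)
{
    int s = widen_signed((signed char)-1);
    int u = widen_unsigned((unsigned char)0xFF);
    printf("signed:   %d (0x%X)\n", s, (unsigned)s);
    printf("unsigned: %d (0x%X)\n", u, (unsigned)u);
}
```

With a 32-bit int, show_widening() reports -1 (0xFFFFFFFF) for the signed case and 255 (0xFF) for the unsigned case, even though both started from the same byte.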

Tim Sylvester
A: 

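1) char is one byte on every system (sizeof(char) is always 1), so it is consistent. The number of bits in that byte, CHAR_BIT, is only guaranteed to be at least 8, although it is 8 on virtually all modern machines.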

2) The bit mentioned in your question is the one that single-byte signed integers use for their sign. When int is larger than one byte, as on typical systems, converting a signed char to int keeps that sign: a negative char stays negative instead of its top bit being treated as an ordinary value bit. (There are also explicitly signed and unsigned chars.)

3) Because char is implemented consistently, many libraries rely on it, such as Intel IPP (Integrated Performance Primitives) and its cousin OpenCV.

JustSmith
A: 

Usually, in C, char-to-int conversion (and vice versa) is an issue because the standard APIs for reading character input and writing character output use ints for their character arguments and return values. See getchar(), getc() and putchar(), for example.
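The usual pattern keeps the result of getc() in an int so that EOF, a negative value, can be told apart from every valid byte. A minimal sketch, using a hypothetical count_bytes helper that works on any FILE *:

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining in a stream.
   getc() returns either a byte converted to unsigned char (0..UCHAR_MAX)
   or the negative constant EOF, so c must be an int to hold both.
   If c were a plain char: where char is signed, a 0xFF byte becomes -1 and
   is mistaken for EOF (when EOF is -1, as on common implementations);
   where char is unsigned, c can never equal EOF and the loop never ends. */
size_t count_bytes(FILE *f)
{
    size_t n = 0;
    int c;
    while ((c = getc(f)) != EOF)
        ++n;
    return n;
}
```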

Also, since the size of a char is 1 byte, it is a convenient way to deal with arbitrary data as a byte stream.

Jeff Leonard
+6  A: 

In regards to 1)

People often use char arrays when they really want a byte buffer for a data stream. It's not great practice, but plenty of projects do it, and if you're careful, no real harm is done. There are probably other uses as well.

In regards to 2)

Signed integers are sign-extended when they are converted to a wider type. Thus 11111111b (-1 in base 10, using two's complement) becomes 11111111 11111111 11111111 11111111 when widened to 32 bits. However, if the char was intended to hold the unsigned value +255, the resulting int may end up being -1 instead.
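To illustrate how the 255 versus -1 confusion shows up in practice, here is a sketch that decodes a big-endian 16-bit value from a plain char buffer; decode_be16 is a hypothetical helper. The fix is to convert each byte through unsigned char before shifting and combining.

```c
#include <stdint.h>

/* Hypothetical helper: decode a big-endian 16-bit value from two bytes.
   Each byte goes through unsigned char, so 0xFF contributes 255.
   Writing (bytes[0] << 8) | bytes[1] instead would, where char is signed,
   widen a 0xFF low byte to -1 (all bits set), and the OR would turn the
   result into 0xFFFF regardless of the high byte. */
uint16_t decode_be16(const char *bytes)
{
    unsigned hi = (unsigned char)bytes[0];
    unsigned lo = (unsigned char)bytes[1];
    return (uint16_t)((hi << 8) | lo);
}
```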

About portability 3)

Some machines regard chars as signed integers, while others interpret them as unsigned. It could also vary based on compiler implementation. Most of the time you don't have to worry about it. Kernighan is just trying to help you understand the details.


Edit

I know this is a dead issue, but you can use the following code to check whether chars on your system are signed or unsigned:

#include <limits.h> // Implementation-specific limits (CHAR_MAX, SCHAR_MAX, INT_MAX, etc.)
#include <stdio.h>

int main(void)
{
#if CHAR_MAX == SCHAR_MAX
    puts("Plain char is signed");
#else
    puts("Plain char is unsigned");
#endif
    return 0;
}
Andres