tags:

views:

1546

answers:

13

Should a buffer of bytes be signed char or unsigned char or simply a char buffer? Any differences between C and C++?

Thanks.

A: 

If you fetch an element into a wider variable, it will of course be sign-extended or zero-extended, depending on the element type's signedness.

pngaz
A: 

"Should" and "should"... I tend to prefer unsigned, since it feels more "raw", less inviting to say "hey, that's just a bunch of small ints", when I want to emphasize the binary-ness of the data.

I don't think I've ever used an explicit signed char to represent a buffer of bytes.

Of course, one third option is to represent the buffer as void * as much as possible. Many common I/O functions work with void *, so sometimes the decision of what integer type to use can be fully encapsulated, which is nice.

unwind
+5  A: 

It is better to define it as unsigned char. In fact, the Win32 type BYTE is defined as unsigned char. There is no difference between C and C++ in this respect.

Naveen
A: 

Several years ago I had a problem with a C++ console application that printed colored chars for ASCII values above 128. This was solved by switching from char to unsigned char, but I think it would have been solvable while keeping the char type, too.

For now, most C/C++ functions use char and I understand both languages much better now, so I use char in most cases.

schnaader
+11  A: 

It depends.

If the buffer is intended to hold text, then it probably makes sense to declare it as an array of char and let the platform decide for you whether that is signed or unsigned by default. That will give you the least trouble passing the data in and out of the implementation's runtime library, for example.

If the buffer is intended to hold binary data, then it depends on how you intend to use it. For example, if the binary data is really a packed array of data samples that are signed 8-bit fixed point ADC measurements, then signed char would be best.

In most real-world cases, the buffer is just that, a buffer, and you don't really care about the types of the individual bytes because you filled the buffer in a bulk operation, and you are about to pass it off to a parser to interpret the complex data structure and do something useful. In that case, declare it in the simplest way.

RBerteig
+3  A: 

Do you really care? If you don't, just use the default (char) and don't clutter your code with an unimportant matter. Otherwise, future maintainers will be left wondering why you used signed (or unsigned). Make their lives simpler.

Gorpik
I don't agree. If I encounter an array of (signed) chars, I might be inclined to think that it somehow holds textual data.
Dave Van den Eynde
Agree with Dave VdE
dcw
And why can't an unsigned char array hold textual data? The default signedness of plain char differs between architectures, but the libc signatures of the string functions are still the same.
Alex B
It does make a difference according to the standard.
Richard Corden
I disagree as well. If I see an array of chars, I assume it is character data. If I see unsigned chars, I assume it is binary (byte) data.
jalf
Generally you go for unsigned to say "hey, it's just data"
Edouard A.
Technically, an array of uint8_t or int8_t will not change the actual data in the array, so they are functionally the same. Personally, from a coding-style point of view, I think it is better to use uint8_t because it implies an array of data.
Trevor Boyd Smith
+5  A: 

If it actually is a buffer of 8-bit bytes, rather than a string in the machine's default locale, then I'd use uint8_t. Not that there are many machines around where a char is not a byte (or a byte not an octet), but making the statement 'this is a buffer of octets' rather than 'this is a string' is often useful documentation.

Pete Kirkham
I've been through this, and it sounds nice in theory, but it creates a lot of trouble if you pass this data to standard C or POSIX functions (file/socket read/writes).
Alex B
POSIX read/write take a void* buffer. The POSIX functions which expect a char* (eg the path argument to open() ) expect a string, not a byte buffer.
Pete Kirkham
+2  A: 

For maximum portability, always use unsigned char. There are a couple of instances where this could come into play. Serialized data shared across systems with different endianness immediately comes to mind. Performing shifts or bit masking on the values is another.

MrEvil
+2  A: 

You should use either char or unsigned char but never signed char. The standard has the following in 3.9/2

For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.

Richard Corden
+13  A: 

If you intend to store arbitrary binary data, you should use unsigned char. It is the only data type that the C Standard guarantees has no padding bits. Every other data type may contain padding bits in its object representation (the representation that contains all bits of an object, rather than only those that determine a value). The padding bits' state is unspecified, and they are not used to store values. So if you read some binary data through char, things would be cut down to the value range of a char (by interpreting only the value bits), but there may still be bits that are ignored yet are still there and read by memcpy, much like padding bits in real struct objects. Type unsigned char is guaranteed not to contain those. That follows from 5.2.4.2.1/2 (C99 TC2, n1124 here):

If the value of an object of type char is treated as a signed integer when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX. The value UCHAR_MAX shall equal 2^CHAR_BIT − 1

From the last sentence it follows that there is no space left for any padding bits. If you use char as the type of your buffer, you also have the problem of overflow: if you explicitly assign a value that fits in 8 bits - so you may expect the assignment to be OK - but lies outside the range of a char (CHAR_MIN..CHAR_MAX), the conversion overflows, with implementation-defined results, including the raising of signals.

Even though the problems above would probably not show up in real implementations (that would be a very poor quality of implementation), you are best off using the right type from the beginning, which is unsigned char.

For strings, however, the data type of choice is char, which will be understood by string and print functions. Using signed char for these purposes looks like a wrong decision to me.

For further information, read this proposal, which contains a fix for the next version of the C Standard that will eventually require signed char to have no padding bits either. It's already incorporated into the working paper.

Johannes Schaub - litb
+2  A: 

The choice of int8_t vs uint8_t is similar to the choice of comparing a pointer against NULL rather than against 0.


From a functionality point of view, comparing to NULL is the same as comparing to 0, because NULL is defined as a null pointer constant (in many implementations simply 0).

But personally, from a coding style point of view, I choose to compare my pointers to NULL because the NULL #define connotes to the person maintaining the code that you are checking for a bad pointer...

VS

when someone sees a comparison to 0 it connotes that you are checking for a specific value.


For the above reason, I would use uint8_t.

Trevor Boyd Smith
A: 

If you lie to the compiler, it will punish you.

If the buffer contains data that is just passing through, and you will not manipulate them in any way, it doesn't matter.

However, if you have to operate on the buffer contents then the correct type declaration will make your code simpler. No "int val = buf[i] & 0xff;" nonsense.

So, think about what the data actually is and how you need to use it.

Darron
A: 
typedef char byte;

Now you can make your array be of bytes. It's obvious to everyone what you meant, and you don't lose any functionality.

I know it's somewhat silly, but it makes your code read 100% as you intended.

Matt Cruikshank