Is there a standards-compliant method to represent a byte in ANSI (C89/90) C? I know that, most often, a char happens to be a byte, but my understanding is that this is not guaranteed to be the case. Also, there is stdint.h in the C99 standard, but what was used before C99?

I'm curious about both 8 bits specifically, and a "byte" (sizeof(x) == 1).

+2  A: 

Before C99? Platform-dependent code.

But why do you care? Just use stdint.h.

In every implementation of C I have used (from old UNIX to embedded compilers written by hardware engineers to big-vendor compilers) char has always been 8-bit.

Frank Krueger
So is your advice to use uint8_t or to use unsigned char?
Chris Conway
Funny, when I went to school, a character was 6 bits. Lowercase cost 12 bits! I take it you don't miss the 36-bit, 60-bit, and other fun machines we used to work with.
Will Hartung
A: 

You can find pretty reliable macros and typedefs in boost.

PolyThinker
But that'd be C++, right?
Sydius
Well, you could just copy/paste what you need from there. There's nothing special if you only need a reliable type of integers of a certain length.
PolyThinker
+5  A: 

You can always represent a byte (if you mean 8 bits) in an unsigned char. It is always at least 8 bits in size, and all of its bits contribute to the value, so an 8-bit value will always fit into it.

If you want exactly 8 bits, I think you'll have to use platform-dependent means. POSIX systems seem to be required to support int8_t, which means that on POSIX systems char (and thus a byte) is always 8 bits.
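One platform-dependent approach that works in plain C89 is to refuse to compile when a byte is not an octet. The following is a minimal sketch, not a standard facility: the names assert_char_is_8_bits, octet, and make_octet are hypothetical, and the negative-array-size typedef is only a common substitute for the static assertion that C89 lacks.

```c
#include <assert.h>
#include <limits.h>

/* C89 has no static assertion. This typedef declares an array of size -1,
   which is a compile error, whenever CHAR_BIT is not exactly 8. */
typedef char assert_char_is_8_bits[(CHAR_BIT == 8) ? 1 : -1];

/* Hypothetical alias: once the check above passes, unsigned char is
   exactly 8 bits wide on this platform. */
typedef unsigned char octet;

/* Store a value in an octet; the caller must pass a value in 0..255. */
octet make_octet(unsigned int value)
{
    assert(value <= 0xFFu);
    return (octet)value;
}
```

On a platform with wider bytes, the typedef fails to compile, so the mismatch is caught at build time rather than at run time.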

Johannes Schaub - litb
POSIX support for stdint.h post-dates C99.
Chris Conway
ah yeah. looks like from 2001. but i think even if he hasn't got a c99 compiler shipping it - if he's on a posix machine, he can take advantage of its requirements from stdint.h . if he's on ms windows, all my bets are off :) maybe he can grab stuff out of cstdint.hpp of boost and c'ify them ?
Johannes Schaub - litb
I mean a byte, not necessarily 8 bits, but thanks. As an aside, does the spec say it must be at least 8 bits, or does it just happen to be the case?
Sydius
Yes, the C standard's description of limits.h requires UCHAR_MAX to be at least 255, and unsigned char must have no padding bits and use a pure binary representation. char is required to have the same range and representation as either unsigned char or signed char, but it must still be a distinct type.
Johannes Schaub - litb
+9  A: 

char is always a byte, but it's not always an octet. A byte is the smallest addressable unit of memory (in most definitions); an octet is an 8-bit unit of memory.

That is, sizeof(char) is always 1 for all implementations, but the CHAR_BIT macro in limits.h defines the size of a byte in bits for a platform, and it is not always 8. There are platforms with 16-bit and 32-bit bytes, where char takes up more bits but is still a byte. Since the required range for char is at least -127 to 127 (or 0 to 255), it will be at least 8 bits on all platforms.

ISO/IEC 9899:TC3

6.5.3.4 The sizeof operator

  1. ...
  2. The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. [...]
  3. When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. [...]

Emphasis mine.
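The difference between "size in bytes" and "size in bits" can be checked with a small sketch. The function names check_byte_guarantees and bits_in are hypothetical; only sizeof, CHAR_BIT, and UCHAR_MAX come from the language and limits.h.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assert the guarantees described above; these should hold on any
   conforming implementation. */
void check_byte_guarantees(void)
{
    assert(sizeof(char) == 1);   /* char is one byte by definition */
    assert(CHAR_BIT >= 8);       /* a byte has at least 8 bits */
    assert(UCHAR_MAX >= 255);    /* so unsigned char can hold 0..255 */
}

/* Number of bits occupied by an object of `size` bytes. */
unsigned long bits_in(size_t size)
{
    return (unsigned long)size * CHAR_BIT;
}
```

Calling bits_in(sizeof(char)) yields CHAR_BIT, which is 8 only on platforms where a byte is also an octet.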

Alex B
Just for clarification, is sizeof(char) always 1 per the spec, or does it just happen to be 1 in all implementations?
Sydius
Assuming you are using an odd architecture with a <8-bit byte, couldn't char not be a byte (since CHAR_BIT >= 8)? If not, could you precisely define what you mean by "byte" above?
Chris Conway
The required range for char is actually either -127 to 127 (don't forget that some architectures used to use signed magnitude or one's complement integer representations) or 0 to 255, depending on whether char is signed or unsigned. 8-bit two's complement supports -128 to 127, not -127 to 128.
bk1e
@Chris: byte = smallest addressable unit of memory. I am not sure what you mean by your question. A byte of fewer than 8 bits means the platform can't be C compliant.
Alex B
@bk1e: indeed. Corrected.
Alex B
Didn't realize C required >=8-bit bytes (indeed, the standard says a byte must hold a char and a char must be at least 8 bits). We've reached the frontier of C's portability...
Chris Conway
A: 

In ANSI C89/ISO C90 sizeof(char) == 1. However, it is not always the case that 1 byte is 8 bits. If you wish to count the number of bits in 1 byte (and you don't have access to limits.h), I suggest the following:

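/* Returns the number of bits in an unsigned char, i.e. in one byte. */
unsigned int bitnum(void) {
    unsigned char c = ~0u; /* Thank you Jonathan. */
    unsigned int v;

    /* Each iteration clears the lowest set bit of c,
       so v ends up as the number of bits that were set. */
    for(v = 0u; c; ++v)
        c &= c - 1u;
    return(v);
}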

Here we use Kernighan's method to count the number of bits set in c. To better understand the code above (or see others like it), I refer you to "Bit Twiddling Hacks".
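The same bit-clearing step works for any unsigned value, not just an all-ones byte. Here is a minimal sketch of the general form; the names count_set_bits and count_set_bits_selfcheck are hypothetical and are not taken from the answer or from "Bit Twiddling Hacks".

```c
#include <assert.h>

/* Count the set bits in x. The expression x & (x - 1) clears the lowest
   set bit, so the loop runs once per set bit rather than once per bit. */
unsigned int count_set_bits(unsigned long x)
{
    unsigned int count = 0u;

    while (x != 0ul) {
        x &= x - 1ul;
        ++count;
    }
    return count;
}

/* A few hand-checked cases. */
void count_set_bits_selfcheck(void)
{
    assert(count_set_bits(0ul) == 0u);
    assert(count_set_bits(0x10ul) == 1u);  /* a single bit */
    assert(count_set_bits(0xFFul) == 8u);  /* eight low bits */
}
```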

Anthony Cuozzo
Better to use ~0 than -1; on a one's complement or sign-magnitude machine, -1 might not be all-bits-set. ~0 is guaranteed to be all bits set.
Jonathan Leffler
@Jonathan: That makes sense. Thank you for the suggestion. I am editing the post now. (I'm sorry that I edited this comment so many times!)
Anthony Cuozzo
-1 is always all bits one. The conversion of -1 to unsigned char is not necessarily bit-preserving (truncating).
Johannes Schaub - litb
It's defined mathematically: -N is (2^CHAR_BIT - (N mod (2^CHAR_BIT))). That means -1 is always the highest unsigned char value, having all bits 1. The difference in sign representation is that if you have two's complement, the conversion is purely conceptual: the bit pattern won't change:
Johannes Schaub - litb
while -1 is all bits 1 before, it's so too after conversion to unsigned char. nitpicking (i really don't like this, but just to be correct :)), ~0u could (after conversion) instead result in a different value than all-bits-1: converting a value to unsigned char will wrap around N => N mod 2^CHAR_BIT
Johannes Schaub - litb
... means that if N is not a multiple of UCHAR_MAX (which can happen, because an unsigned int does not need to use all its bits to store its value), you can be left with a value that is not necessarily all bits 1. So I think your first version, converting -1 to unsigned char, was all right. Please tell me if I'm wrong.
Johannes Schaub - litb
to quote it directly: "Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type."
Johannes Schaub - litb
(by saying "-1 is always all bits one" i mean -1 converted to unsigned char, like you had it in your answer. -1 by itself, of course, is only all bits one for two's complement). for two's complement, the conversion doesn't change the bits. comments are too short to really tell the truth :)
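The conversion rule quoted above can be checked directly. This is only a sketch, with hypothetical function names: the rule adds UCHAR_MAX + 1 to -1, which gives UCHAR_MAX regardless of how the platform represents negative numbers.

```c
#include <assert.h>
#include <limits.h>

/* Convert -1 to unsigned char. Per the quoted rule, the result is
   -1 + (UCHAR_MAX + 1), which is UCHAR_MAX. */
unsigned char all_ones_byte(void)
{
    return (unsigned char)-1;
}

/* Check that the converted value is the maximum unsigned char value. */
void check_all_ones_byte(void)
{
    assert(all_ones_byte() == UCHAR_MAX);
}
```

Because unsigned char has no padding bits, the value UCHAR_MAX corresponds to every bit being set.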
Johannes Schaub - litb
I'm fairly certain that (unsigned char)-1 will not set all bits on a machine which uses either a ones' complement or a sign-magnitude representation of signed numbers.
Anthony Cuozzo