When answering a comment to another answer of mine here, I found what I think may be a hole in the C standard (c1x, I haven't checked the earlier ones and yes, I know it's incredibly unlikely that I alone among all the planet's inhabitants have found a bug in the standard). Information follows:

  1. Section 6.5.3.4 ("The sizeof operator") para 2 states "The sizeof operator yields the size (in bytes) of its operand".
  2. Para 3 of that section states: "When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1".
  3. Section 7.20.3.3 describes void *malloc(size_t sz) but all it says is "The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate". It makes no mention at all of what units are used for the argument.
  4. Annex E states that 8 is the minimum value for CHAR_BIT so chars can be more than one byte in length.

My question is simply this:

In an environment where a char is 16 bits wide, will malloc(10 * sizeof(char)) allocate 10 chars (20 bytes) or 10 bytes? Point 1 above seems to indicate the former, point 2 indicates the latter.

Anyone with more C-standard-fu than me have an answer for this?

+10  A: 

In a 16-bit char environment malloc(10 * sizeof(char)) will allocate 10 chars (10 bytes), because if char is 16 bits, then that architecture/implementation defines a byte as 16 bits. A char isn't an octet, it's a byte. On older computers this can be larger than the 8-bit de facto standard we have today.

EDIT: Sorry, Pax, but thanks for the standards quote to back me up:

3.6 Terms, definitions and symbols

byte - addressable unit of data storage large enough to hold any member of the basic character set of the execution environment...
NOTE 2 - A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined.
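
To make that concrete, here is a minimal sketch of my own (the helper names alloc_chars and bits_in are made up for illustration, they are not from the standard). It assumes only what the quotes above say: sizeof(char) is 1, and a byte is CHAR_BIT bits wide, whatever that happens to be on the implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate room for n chars and fill them with `fill`.
   Returns NULL if the allocation fails. The caller frees the result. */
char *alloc_chars(size_t n, char fill)
{
    /* sizeof(char) is 1 by definition, so this asks malloc for n bytes,
       where "byte" means CHAR_BIT bits on this implementation. */
    assert(sizeof(char) == 1);
    char *p = malloc(n * sizeof(char));
    if (p != NULL)
        memset(p, fill, n);
    return p;
}

/* Hypothetical helper: number of bits of storage in n_bytes C bytes. */
size_t bits_in(size_t n_bytes)
{
    return n_bytes * CHAR_BIT;
}
```

So alloc_chars(10, 'x') always gives room for exactly 10 chars. On a 16-bit-char implementation bits_in(10) would be 160, on an 8-bit one it would be 80, but in both cases the request is for 10 bytes in the standard's sense.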

Chris Lutz
Actually, I think you may have it there. Based on your answer, I found 3.6 (in "Terms, definitions and symbols") stating "byte - addressable unit of data storage large enough to hold any member of the basic character set of the execution environment ... NOTE 2 - A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined".
paxdiablo
Damn, think of the glory and accolades I would have received for finding a problem in the standard. Oh well, back to the day job :-)
paxdiablo
Once again, I find myself repeating the "I need to get a copy of the standards" mantra. I'll add your standards quote to my answer for completeness.
Chris Lutz
Here's the latest draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1362.pdf but it doesn't necessarily make your life *that* much easier.
paxdiablo
And the C++0x draft as well for completeness: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2914.pdf
paxdiablo
Note that it's not just "older" computers with CHAR_BIT > 8. Some DSPs have 16, 32, or even 24-bit char.
Steve Jessop
@Chris - links to getting the standard(s): http://stackoverflow.com/questions/81656/where-do-i-find-the-current-x-standard/83763#83763
Michael Burr
Oops, forgot to upvote :-) +1.
paxdiablo
+1  A: 

Aren't the units of "size_t sz" in whatever the addressable unit of your architecture is? I work with a DSP whose addresses correspond to 32-bit values, not bytes. malloc(1) gets me a pointer to a 4-byte area.

twon33
In terms of the wording in the C standard, a "byte" on that architecture is 32 bits. The slightly confusing thing is what the C standard calls a "byte" doesn't necessarily correspond with (today's) common usage of the word.
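
If you need a buffer that holds a given number of octets (8-bit units) rather than C bytes, the conversion can be sketched roughly like this. The function name bytes_for_octets is hypothetical, and the sketch assumes nothing beyond CHAR_BIT from <limits.h>.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many C bytes (units of CHAR_BIT bits) are
   needed to hold `octets` 8-bit units, rounding up. With CHAR_BIT == 8
   this is just `octets`; with CHAR_BIT == 32 it is ceil(octets / 4). */
size_t bytes_for_octets(size_t octets)
{
    /* Guard against overflow in the multiplication below. */
    assert(octets <= (SIZE_MAX - (CHAR_BIT - 1)) / 8);
    return (octets * 8 + CHAR_BIT - 1) / CHAR_BIT;
}
```

The result is what you would pass to malloc, since malloc counts in C bytes and not in octets.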
caf
And for the same reason, when network standards mean "8 bits", they say "octet", not "byte". It's only in retail contexts that a byte is unambiguously 8 bits, as in "MB of bandwidth", "GB of RAM", "TB of disk space".
Steve Jessop
Ah, that makes sense. I guess we're stuck with the ambiguity for non-language-lawyers, since words like "megaoctets" or "kibioctets" don't flow particularly well.
twon33
+2  A: 
Michael Burr
+1. It's worth mentioning that the standard sets the *lower* bound of CHAR_BIT but not the upper. So it has to be at least 8.
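
Code that depends on the byte width can check it at compile time. A small sketch, assuming a C11 compiler (static_assert from <assert.h>); byte_is_octet is a made-up name for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Always passes on a conforming implementation: the standard only
   gives a lower bound of 8 for CHAR_BIT, no upper bound. */
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

/* Hypothetical helper: nonzero if a byte here is exactly one octet. */
int byte_is_octet(void)
{
    return CHAR_BIT == 8;
}
```

Code that genuinely requires 8-bit bytes could tighten the check to CHAR_BIT == 8 so that it refuses to compile on DSP-style targets.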
paxdiablo