views: 477
answers: 4

I tried

printf("%d, %d\n", sizeof(char), sizeof('a'));

and got 1, 4 as output. If size of a character is one, why does 'c' give me 4? I guess it's because it's an integer. So when I do char ch = 'c'; is there an implicit conversion happening, under the hood, from that 4 byte value to a 1 byte value when it's assigned to the char variable?

A: 

According to the ANSI C standard, a char gets promoted to an int in contexts where integers are used; you used an integer format specifier in the printf, hence the different values. A char is usually 1 byte, but that is implementation-defined, depending on the runtime and compiler.
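
Here is a minimal sketch of that point (assuming a typical platform where int is 4 bytes; the variable names are only illustrative). sizeof yields a size_t, so %zu is the matching specifier, and a char argument passed through printf's ellipsis is promoted to int:

    #include <stdio.h>

    int main(void)
    {
        char ch = 'a';
        /* sizeof yields a size_t, so %zu (C99) is the matching format specifier */
        printf("sizeof(char) = %zu, sizeof('a') = %zu\n", sizeof(char), sizeof('a'));
        /* ch is promoted to int when passed through the ellipsis, so %d is fine here */
        printf("ch printed as an int: %d\n", ch);
        return 0;
    }

On such a platform this prints 1 and 4 for the two sizeof values.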

Hope this helps, Best regards, Tom.

tommieb75
The integer format referred to sizeof('a'), not 'a', so I don't see how this argument holds.
Shane MacLaughlin
The C standard says a char literal is of type int - it has sizeof int and no promotion is involved.
anon
Your answer seems to suggest that the C compiler inspects a format string used by a library function when compiling a program, are you sure that that is the case?
Peter van der Heijden
What if it was scanf("%s\n", format); printf(format, sizeof(char), sizeof('a')); and you'd type "%d, %d\n" when prompted? In that case the compiler has no way of knowing the variable types a priori and has to use the ellipsis operator blindly, as it is meant to?
SF.
@Peter van der Heijden : you are correct, a format string and its specifiers have nothing to do with the types of the variables passed after them. `gcc` will issue warnings if they don't line up, but it compiles with mismatched types just fine, under the assumption you know more than the compiler does. That said, the 'a' is in a sizeof and is not in an "integer context". The sizeof calls are returning size_t, which I believe is generally typedef'ed to an unsigned integer.
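
A minimal sketch of that runtime-format-string scenario (fgets and the 64-byte buffer are illustrative choices; the casts to int make a typed-in "%d, %d\n" well-defined):

    #include <stdio.h>

    int main(void)
    {
        char format[64];
        /* The format string is only known at run time, so the compiler cannot
           type-check it; gcc's format warnings only apply to literals it can see. */
        if (fgets(format, sizeof format, stdin) != NULL) {
            printf(format, (int)sizeof(char), (int)sizeof('a'));
        }
        return 0;
    }
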
Michael Speer
+10  A: 

In C 'a' is an integer constant (!?!), so 4 is correct for your architecture. It is implicitly cast to char for the assignment. sizeof(char) is always 1 by definition. The standard doesn't say what units 1 is, but it is often bytes.
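
A short sketch of the conversion on assignment (the 0x1234 value is only an illustration; what an out-of-range value becomes in a plain char is implementation-defined when char is signed):

    #include <stdio.h>

    int main(void)
    {
        char ch = 'a';        /* 'a' has type int; its value is converted to char on assignment */
        int  wide = 0x1234;
        char narrow = wide;   /* out-of-range values are converted too; typically the low byte survives */
        printf("sizeof 'a' = %zu, sizeof ch = %zu\n", sizeof 'a', sizeof ch);
        printf("narrow = 0x%x\n", (unsigned)(unsigned char)narrow);
        return 0;
    }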

Richard Pennington
+ 1 for "but it is often bytes", I'm still chuckling :)
Binary Worrier
An integer used to be 2 bytes ... the standard doesn't define that either.
lexu
May I know the rationale behind the standard stating `sizeof(char)` should always be 1? Is it because the ASCII table has 256 chars? What if in an implementation I need to have more than that, say Unicode?
legends2k
sizeof(char) is always 1 because that's what it is. That 1 can be one byte (of 8 bits, for example) or 3 bytes or ...
Richard Pennington
The standard defines the `sizeof` operator as returning the size in **bytes**, so it is not *often*, but rather always. In the second paragraph of 'The sizeof operator': 'The sizeof operator yields the size (in bytes) of its operand.'
David Rodríguez - dribeas
sizeof(char) is one byte because that is the definition of a byte in the C standard. That byte may be 8 bits or more (it can't be less in C), and may or may not be the smallest unit addressable by the computer (the definition of byte common in computer architecture). A third common definition of byte is "the unit used for a character encoding" -- i.e. 8 bits for UTF-8 or ISO-8859-X, 16 bits for UTF-16. Quite often, all definitions agree and put the size of the byte at 8 bits. So often that a fourth definition of byte is "8 bits". When they don't agree, you had better be clear which definition you use.
AProgrammer
I always shudder when reading "implicitly cast" in SO posts. There is no implicit cast: A cast is always an explicit conversion. The C Standard says in 6.3: "Several operators convert operand values from one type to another automatically. This subclause specifies the result required from such an *implicit conversion*, as well as those that result from a cast operation (an *explicit conversion*).". You want to say "implicitly converted".
Johannes Schaub - litb
@lexu: An `int` has to be at least 16 bits, whatever that comes to in bytes. Since `sizeof()` measures in 8-bit bytes on most modern computers, that typically means at least 2 bytes. An `int` is supposed to be a natural size, which means 2 bytes on the old 16-bit machines and 4 on the more modern ones.
David Thornley
@litb: Thanks for correcting me; I've fixed it in the question from 'casting' to 'conversion'.
legends2k
sizeof() measures in (integer, I believe) multiples of CHAR_BIT. No more, no less. sizeof(char) == 1, by definition. The number of bits in another type can be found by multiplying sizeof(type) by CHAR_BIT. Of course, most (if not all) platforms will have CHAR_BIT being 8.
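
A tiny sketch using CHAR_BIT from <limits.h> (the numbers printed depend on the platform):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* CHAR_BIT is the number of bits in a C byte; the standard requires at least 8 */
        printf("bits per byte: %d\n", CHAR_BIT);
        printf("bits in an int: %zu\n", sizeof(int) * CHAR_BIT);
        return 0;
    }
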
Vatine
+3  A: 

The C standard says that a character literal like 'a' is of type int, not type char. It therefore has (on your platform) sizeof == 4. See this question for a fuller discussion.
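
If you want to see the literal's type directly, here is a small sketch that needs a C11 compiler (it relies on _Generic; everything else is plain standard C):

    #include <stdio.h>

    int main(void)
    {
        /* _Generic (C11) picks a branch based on the type of the controlling expression */
        printf("'a' has type %s\n",
               _Generic('a', char: "char", int: "int", default: "something else"));
        printf("sizeof 'a' = %zu\n", sizeof 'a');
        return 0;
    }

In C this prints that 'a' has type int.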

anon
I asked about the promotion/casting that happens between the two data types, but the discussion/answer doesn't address this.
legends2k
@legends2k You asked "If size of a character is one, why does 'c' give me 4?" As this answer and the question I linked explain, 'a' has sizeof == 4, so there is obviously no casting or promotion taking place.
anon
Well, there is a detailed form of the question below it, which reads "is there an implicit typecasting happening, under the hood, from that 4 byte value to a 1 byte value when it's assigned to the char variable". This too is part of it, I believe.
legends2k
+2  A: 

It is the normal behavior of the sizeof operator (See Wikipedia):

  • For a datatype, sizeof returns the size of the datatype. For char, you get 1.
  • For an expression, sizeof returns the size of the type of the variable or expression. As a character literal is typed as int, you get 4 (a short sketch follows below).
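
A compact sketch contrasting the two cases (the 1 1 4 output assumes a platform where int is 4 bytes):

    #include <stdio.h>

    int main(void)
    {
        char c = 'a';
        /* sizeof applied to a type name needs parentheses; applied to an expression it does not */
        printf("%zu %zu %zu\n", sizeof(char), sizeof c, sizeof 'a');   /* typically prints: 1 1 4 */
        return 0;
    }
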
Laurent Etiemble