When using malloc and doing similar memory manipulation, can I rely on sizeof( char ) always being 1?

For example, I need to allocate memory for N elements of type char. Is multiplying by sizeof( char ) necessary:

char* buffer = malloc( N * sizeof( char ) );

or can I rely on sizeof( char ) always being 1 and just skip the multiplication:

char* buffer = malloc( N );

I understand that sizeof is evaluated at compile time, so the compiler can fold the multiplication away and the performance penalty will be minimal, most likely zero.

I'm asking mainly about code clarity and portability. Is this multiplication ever necessary for char type?

+5  A: 

It is not necessary. See here (for example).

sizeof(char) is defined by the C standard to always be 1 (byte). Because sizeof yields a number of bytes, the number of bits per byte is irrelevant (and in practice is almost always 8).
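
A minimal sketch of the distinction, assuming a C11 compiler: static_assert from <assert.h> checks the guarantee at compile time, and CHAR_BIT from <limits.h> reports how many bits a byte has on the target. The helper name bits_in is hypothetical, used only for illustration.

```c
#include <assert.h>  /* static_assert (C11) */
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */

/* Guaranteed by the standard, so this cannot fire on a conforming compiler. */
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

/* Hypothetical helper: number of bits occupied by n_bytes bytes here. */
size_t bits_in(size_t n_bytes)
{
    return n_bytes * CHAR_BIT;
}
```

Calling bits_in(sizeof(char)) yields CHAR_BIT, whatever that value is on the platform; sizeof(char) itself stays 1.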

Matthew Murdoch
+11  A: 

sizeof(char) is always 1 no matter what type of memory manipulation you do.

However, sizeof(TCHAR) may vary depending upon your compiler options.

Aamir
I'm not an expert, but doesn't the size of a character become larger in unicode situations?
Shadow
@Shadow, no. The wide character type wchar_t is generally used instead of char in that case. The Microsoft-specific TCHAR business is a way to write code that can be compiled for either wide or narrow characters. It isn't clear whether that was a good idea or not.
RBerteig
+16  A: 

By definition, sizeof(char) is always equal to 1. One byte is the size of a char in C, however many bits there are in a byte (8 on common desktop CPUs).

The typical example where one byte is not 8 bits is the PDP-10 and other old, minicomputer-like architectures with 9- or 36-bit bytes. But byte sizes that are not a power of two are becoming extremely uncommon, I believe.

Also, I think this is better style:

char* buf1;
double* buf2;

buf1 = malloc(sizeof(*buf1) * N);
buf2 = malloc(sizeof(*buf2) * N);

because it works whatever the pointer type is.
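
To illustrate, here is a hedged sketch of that idiom wrapped in two hypothetical helpers (alloc_chars and alloc_doubles are names invented for this example). The malloc argument is textually identical in both; only the pointer declaration differs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: room for n chars. Returns NULL on failure. */
char *alloc_chars(size_t n)
{
    assert(n > 0);                         /* malloc(0) is implementation-defined */
    char *buf = malloc(n * sizeof *buf);   /* sizeof *buf is 1 here */
    return buf;
}

/* Hypothetical helper: room for n doubles. Returns NULL on failure. */
double *alloc_doubles(size_t n)
{
    assert(n > 0);
    double *buf = malloc(n * sizeof *buf); /* sizeof *buf is sizeof(double) */
    return buf;
}
```

The caller is expected to check the result for NULL and release it with free().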

David Cournapeau
I thought that the definition was 1 byte = 8 bits. Do you have an example where this does not apply?
AlexDrenea
The definition of 1 byte is N bits, where N is machine dependent. Not all machines have 8 bits/byte (although there aren't that many these days that don't).
1800 INFORMATION
@AlexDrenea: Today, you will typically encounter only 8 bit bytes. But the definition of a byte varies and is not tied to today's architectures, because there were systems with 9 bit bytes and even 36 bit bytes. If you want to be sure, use the ISO term "octet" instead of byte.
OregonGhost
+1 for mentioning the PDP-10
RBerteig
+7  A: 

I consider it kind of an anti-pattern. It signals that the programmer didn't quite know what he/she was doing, which immediately casts the rest of the code in dubious light.

Granted, it's not (quoting Wikipedia) "ineffective", but I do find it "far from optimal". It doesn't cost anything at run-time, but it clutters the code with needless junk, all the while signalling that someone thought it necessary.

Also, please note that the expression doesn't parse as a function call: sizeof is not a function. You're not calling a function and passing it the magical symbol char. You're applying the built-in unary prefix operator sizeof to a type name, and when the operand is a type name rather than an expression, C requires it to be written in parentheses, as (char).

It's perfectly possible, and highly recommended whenever possible, to use sizeof on other expressions. It will then yield the size of the expression's value:

char a;
printf("A char's size is %u\n", (unsigned int) sizeof a);

This will print 1, always, on all conforming C implementations.
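
A small sketch of the two operand forms, assuming a C11 compiler for static_assert; the function names are hypothetical. Parentheses are required only when the operand is a type name.

```c
#include <assert.h>
#include <stddef.h>

/* Type-name form: the parentheses are part of the syntax. */
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

/* Expression form on an object: no parentheses needed. */
size_t size_of_char_object(void)
{
    char a = 'x';
    return sizeof a;
}

/* Expression form on an array: yields the whole array's size in bytes. */
size_t size_of_char_array(void)
{
    char text[16] = "hello";
    return sizeof text;
}
```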

I also heavily agree with David Cournapeau and consider repeating the type name in a malloc()-call to also be kind of an anti-pattern.

Instead of

char *str;

str = malloc(N * sizeof (char));

that many would write to allocate an N-character-capacity string buffer, I'd go with

char *str;

str = malloc(N * sizeof *str);

Or (for strings only) omit the sizeof as per above, but this of course is more general and works just as well for any type of pointer.

unwind
I disagree. If you omit it you (and anyone who reads your code) must remember that this is a special case and recognize it as such. That increases the cognitive burden. Sometimes more code is better.
Michael Carman
Yes, sizeof is not a function - but to me it reads easier if you treat it like one. Unless you know of a case where the extra parentheses change the output?
Mark Ransom
@Michael Carman - It usually _is_ a special case, because you're often allocating and working with strings, whereas if you make an array of ints it could be for any purpose. We need to treat strings differently than arbitrary-typed arrays, and I find the lack of `sizeof(type)` in a `malloc()` to be a nice reminder of this.
Chris Lutz
A: 

Using sizeof(char) makes your code more readable and portable.

On x86, we all know that a character is 1 byte. But explicitly writing it down helps make your intentions clearer, which is always a good thing.

Also, what if your code gets put on some other platform where a character isn't 1 byte? What if a character was only 4 bits instead?

Agreed, it's not necessary, but it doesn't slow your run time down and it will pay off in that rare case you need to port your code to a different architecture.

samoz
That's what I was asking about. Officially, char is the smallest addressable chunk of memory, which is not guaranteed to be 8 bits. The question is about whether malloc and all other similar stuff works in terms of chars, not 8-bit bytes.
sharptooth
Ahh ok, then yes, malloc works in terms of characters, not bytes. malloc(1) will return a block of memory one character in size.
samoz
-1 your answer is factually incorrect. `sizeof(char)` is _always_ 1. If a `char` is only 4 bits, then 4 bits is 1 byte on that platform, but `sizeof(char)` is defined to be 1 (byte), no matter how many bits it is. The issue you discuss is addressed by the `CHAR_BIT` macro.
Chris Lutz
+2  A: 

Something else to keep in mind is that the compiler statically knows that the value of sizeof (char) is 1, and it also knows that multiplying a number by a constant 1 doesn't need to be done; the compiler will optimize the multiplication out. Performance concerns shouldn't enter into consideration on these grounds.
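
One way to see that the value is known at compile time, sketched under the assumption of a C11 compiler: sizeof(char) is an integer constant expression, so it can appear in an array dimension and in static_assert. The names BUF_LEN, with_sizeof, without_sizeof and request_size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { BUF_LEN = 64 };

/* Both dimensions are constant expressions with the same value. */
char with_sizeof[BUF_LEN * sizeof(char)];
char without_sizeof[BUF_LEN];

static_assert(sizeof with_sizeof == sizeof without_sizeof,
              "multiplying by sizeof(char) changes nothing");

/* Hypothetical helper: the byte count passed to malloc for n chars. */
size_t request_size(size_t n)
{
    return n * sizeof(char);  /* same value as n */
}
```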

+2  A: 

While it's not necessary, I consider it good practice to leave in the sizeof( char ) because it makes the code more readable and avoids the use of a magic number. Also, if the code needs to be changed later so that instead of a char it's mallocing the size of something into a pointer for that object, it's easier to change the code than if you have just a "1".

indyK1ng
This "ease of changing the code" argument is bull. `sizeof()` is 8 characters. Having to add it because someone didn't write `sizeof(char)` and then the type changed to `wchar_t` won't give anyone carpal tunnel, and if you're concerned about this you should be using `sizeof *buf` anyway, because it takes even _less_ typing.
Chris Lutz
+2  A: 

From "The New C Standard: An Economic and Cultural Commentary".

  1. Statistics: 2.0% of sizeof uses are applied to char and 1.5% to unsigned char (page 1033 in version 1.2 of the book).
  2. Page 1037:

The number of bits in the representation of a character type is irrelevant. By definition the number of bytes in a character type is one.

Coding Guidelines: Developers sometimes associate a byte as always containing eight bits. On hosts where the character type is 16 bits, this can lead to the incorrect assumption that applying sizeof to a character type will return the value 2. These issues are discussed elsewhere.

osgx