Could you explain the following code?
str = (char *) malloc (sizeof(char) * (num+1));
- What is malloc?
- Why num + 1?
malloc is for memory allocation. num + 1 is to allow for the null terminator, \0.
malloc is a function that allocates a chunk of memory on the heap and returns a pointer to it. It's similar to the new operator in many languages. In this case, it's creating a block of memory that can survive for an arbitrary length of time and have an arbitrary size. That in itself is fairly in-depth stuff, which is somewhat difficult to explain and warrants a separate question.
The num + 1 compensates for the null terminator on the end of strings. Code that works with a string often needs to know how long it is, and the C tradition is to allocate room for one additional character on the end of the string, which is always the special character \0. That allows functions that deal with the string to determine its length automatically. For example, if you wanted to do something to every character of a string without knowing how long the string is, you could do something like this:
const char *ptr = str;
while (*ptr != '\0') {
    process(*ptr);
    ptr++;
}
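As a concrete instance of the loop above, here is a minimal sketch that counts characters by walking to the terminator. The name count_chars is made up for illustration; the standard library's strlen already does this job.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters that come before the
   terminating '\0'. The terminator itself is not counted. */
size_t count_chars(const char *str)
{
    assert(str != NULL);

    size_t count = 0;
    const char *ptr = str;
    while (*ptr != '\0') {   /* stop at the null terminator */
        count++;
        ptr++;
    }
    return count;
}
```

Calling count_chars("Hello") walks over five characters and stops at the sixth byte, the \0, without ever being told the length.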
malloc allocates memory.
It is num+1 because num is the number of characters in the string and the +1 is for the null terminator.
Malloc in this case is allocating num+1 times sizeof(char) bytes. This is standard practice when you want to allocate an array of elements. The char in sizeof(char) is typically replaced with the element type of the array being allocated.
Strictly speaking, though, the sizeof(char) is not necessary in this example. The C standard guarantees that sizeof(char) is 1, so the code is just multiplying by 1.
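To show the same count-times-size pattern with a type whose size is not 1, here is a small sketch. The function name make_int_range is an assumption for illustration, and writing sizeof *values instead of sizeof(int) is one common style choice, not the only one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates an array of `count` ints holding
   0, 1, ..., count - 1. Returns NULL if malloc fails.
   The caller must free() the result. */
int *make_int_range(size_t count)
{
    assert(count > 0);

    /* count elements, each sizeof *values (that is, sizeof(int)) bytes */
    int *values = malloc(count * sizeof *values);
    if (values == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        values[i] = (int) i;
    }
    return values;
}
```

Here the multiplication matters: dropping the sizeof would allocate count bytes rather than room for count ints.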
malloc allocates an array of char (in this case) on the heap.
The array will be num+1 long, but the longest string it can possibly hold is num long, because strings in C need an ending null byte.
Malloc allocates memory, in this case for the string str of length num. (char *) is the type of str, and sizeof(char) is the number of bytes required for each character. The +1 is for the trailing null character in the string, whose value is zero.
Malloc is a call to allocate memory.
The above code is going to allocate space for num + 1 characters. Most likely there is a string with num characters in it, and the author of the code has added space for the null terminator.
After the call str will point to the start of that block of memory which has been allocated.
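A sketch of the situation described above, assuming num comes from the length of an existing string. The name duplicate_string is hypothetical; the point is only that the block needs one byte more than strlen reports.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of src, or NULL
   if malloc fails. The caller must free() the result. */
char *duplicate_string(const char *src)
{
    assert(src != NULL);

    size_t num = strlen(src);        /* characters, not counting '\0' */
    char *str = malloc(num + 1);     /* + 1 leaves room for the '\0'  */
    if (str == NULL) {
        return NULL;
    }
    memcpy(str, src, num + 1);       /* copies the terminator as well */
    return str;
}
```

Allocating only num bytes would leave no room for the terminator, and the copy would write one byte past the end of the block.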
This code tries to allocate a chunk of memory that can hold num + 1 values of type char. So if a char equals one byte and num is 10, it will try to allocate 11 bytes of memory and return a pointer to that memory.
The +1 is likely used because the programmer wanted to store a string (character array) of num characters and needs an extra char to store the terminating '\0' (null) character. In C/C++, character strings are, by convention, terminated by a null character.
malloc allocates memory from the heap and returns a pointer to it. It's useful when you don't know how much memory you are going to need at compile time.
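For instance, when the length is only known at run time, the line from the question might be used as in this sketch. The function repeat_char and its parameters are invented for illustration; the original code's context is not shown in the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds a string made of `num` copies of `c`.
   num is only known at run time, so the buffer comes from malloc.
   Returns NULL if malloc fails. The caller must free() the result. */
char *repeat_char(char c, size_t num)
{
    assert(c != '\0');   /* a '\0' fill would produce an empty string */

    char *str = (char *) malloc(sizeof(char) * (num + 1));
    if (str == NULL) {
        return NULL;
    }
    memset(str, c, num);   /* the num visible characters */
    str[num] = '\0';       /* the extra byte holds the terminator */
    return str;
}
```

Because the memory lives on the heap, it stays valid after repeat_char returns, until the caller passes the pointer to free.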
As for why (num+1), it really depends on what the code is doing... perhaps num is the number of characters in the string, and the +1 is for the NUL terminator byte at the end. I don't know what the sizeof(char) would be for in that case, though.
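sizeof(char) is safe. It is always 1 in C, so the multiplication is harmless, and keeping it means the pattern still works if the element type is later changed to one that is wider than a single byte.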
My question is what are you doing programming if you don't know what malloc does?
man malloc on a Linux system. On Windows, who knows? Probably 17 mouse clicks.
Preamble: I can't believe it! I was baffled by this kind of expression when I was taught C basics (no pun intended). This is why I go into extreme detail in the "parsing the code" section.
The first problem is parsing the code
str = (char *) malloc (sizeof(char) * (num+1));
When working with C/C++, parsing this kind of expression is mandatory, so we will break it down into its components. The first thing we see here is something like:
variable = (expression) function (expression) ;
The first time I saw it, I was just "Hey, I can't believe there is a programming language where you can call a function by putting its parameters both at the left and the right of the function call !".
In truth, this line should be read like:
variable = function_a (function_b (expression)) ;
where :
expression is sizeof(char) * (num+1)
function_b is malloc
function_a is a cast operator
As already explained elsewhere, the C-style cast operator is more like
(function_a) expression
than the more natural
function_a(expression)
Which explains the strangeness of the whole line of code.
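The same prefix cast notation applies to any value, not only to the result of malloc. A minimal sketch, with average being a made-up function for illustration:

```c
#include <assert.h>

/* Hypothetical helper: the cast (double) is written in front of the
   expression it converts, exactly like (char *) in front of malloc. */
double average(int sum, int count)
{
    assert(count != 0);
    return (double) sum / count;   /* cast sum first, then divide */
}
```

Without the cast, sum / count would be an integer division and the fractional part would be lost before the result is returned.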
Note that in C++ you can use both notations, but you should prefer static_cast, const_cast, reinterpret_cast or dynamic_cast over either of them. Using a C++ cast, the above line of code would be:
str = static_cast<char *> ( malloc (sizeof(char) * (num+1)) ) ;
sizeof is an operator. You can think of it as a function working on types: you pass a type as a parameter, and it gives you that type's size in bytes.
So, if you write:
size_t i = sizeof(char) ;
size_t j = sizeof(int) ;
You'll probably have (on a 32-bit Linux) a value of 1 for i, and 4 for j. Its use in malloc is like saying "I want enough room to park 25 cars, each 4 meters long" instead of "I want at least 100 meters".
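The "25 cars of 4 meters" arithmetic can be written out as a small sketch. The helper bytes_needed is hypothetical, and the overflow check is an extra precaution that the original line does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes needed for `count` elements of
   `element_size` bytes each, i.e. the value one would pass to malloc. */
size_t bytes_needed(size_t count, size_t element_size)
{
    /* guard against the multiplication wrapping around */
    assert(element_size == 0 || count <= SIZE_MAX / element_size);
    return count * element_size;
}
```

bytes_needed(25, sizeof(int)) gives room for 25 ints whatever the size of int is on the platform, while bytes_needed(11, sizeof(char)) is always 11, since sizeof(char) is 1 by definition.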
Malloc's parameter is a size_t, that is, an unsigned integer. You give it the size in bytes, and if successful, it returns you the address of allocated memory large enough for you to use as an array. For example:
int * p = (int *) malloc (25 * sizeof(int)) ;
Then p points to a block of memory where you can put 25 integers side by side, as if inside an array whose indices go from zero to the size minus one. For example:
p[0] = 42 ; // Ok, because it's the 1st item of the array
p[24] = 42 ; // Ok, because it's the 25th item of the array
p[25] = 42 ; // CORRUPTION ERROR, because you are trying to
// use the 26th item of a 25 items array !
Note: There is pointer arithmetic, too, but this goes beyond the scope of the question.
C-style strings are somewhat different from strings in other languages. Each character of a string can have any value BUT NOT ZERO, because zero (also written \0) marks the end of a C string.
Put another way: a C string does not store its own size, but by searching for the \0 character you can find where it ends (which is one of the reasons for buffer overflows and stack corruption, by the way).
For example, the string "Hello" seems to have 5 characters:
"Hello" seems to be an array containing 'H', 'e', 'l', 'l' and 'o'.
But in truth, it has 6 characters, the last one being the character ZERO, which is noted using the escape character \0. Thus:
"Hello" is an array containing 'H', 'e', 'l', 'l', 'o' and 0.
This is why, when you want to allocate enough room for a string of "num" characters, you allocate "num + 1" characters instead.
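The "Hello" case can be checked directly. This is a minimal sketch; storage_for is a made-up name used only to put the rule into code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed to store `text` as a C string,
   that is, its visible characters plus one for the terminating '\0'. */
size_t storage_for(const char *text)
{
    assert(text != NULL);
    return strlen(text) + 1;
}
```

For "Hello", strlen reports 5 while storage_for reports 6, which matches sizeof("Hello"): the compiler also counts the hidden terminator when it sizes the literal.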