I'm a total C newbie; I come from C#. I've been learning about memory management and the malloc() function. I've also come across this code:

char *a_persons_name = malloc(sizeof(char) + 2);

What I don't understand is how much space this is allocating for a_persons_name. Is it allocating 2 characters (e.g. AB) or something else?

I also know that you can sometimes get "lucky" with malloc and use unallocated space (which can result in data corruption and seg faults). So how do I know how much space I'm allocating and how much I will need?

+1  A: 

Your call to malloc will allocate 3 bytes of memory. sizeof(char) is 1 byte and 2 bytes are indicated explicitly. This gives you enough space for a string of length 2 (along with the terminating null character).
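To make the arithmetic concrete, here is a minimal sketch that allocates the same 3 bytes and fills them with a 2-character name. The helper name make_two_char_name is made up for illustration; it is not part of the question's code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate the same 3 bytes as the question's
   malloc(sizeof(char) + 2) and fill them with a 2-character name. */
char *make_two_char_name(const char *two_chars)
{
    if (strlen(two_chars) != 2) {
        return NULL;                        /* only exactly 2 characters fit */
    }
    char *name = malloc(sizeof(char) + 2);  /* 1 + 2 = 3 bytes */
    if (name == NULL) {
        return NULL;                        /* allocation failed */
    }
    memcpy(name, two_chars, 3);             /* two characters plus '\0' */
    return name;                            /* caller must free() */
}
```

Calling make_two_char_name("AB") uses all three bytes: 'A', 'B' and the terminator. A third visible character would not fit.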

Brandon E Taylor
+9  A: 

That snippet is allocating enough space for a 2-character name.

Generally the string buffer is going to be filled from somewhere, i.e. I/O. If the size of the string isn't known ahead of time (e.g. reading from a file or the keyboard), one of three approaches is generally used:

  • Define a maximum size for any given string, allocate that size + 1 (for the null terminator), read at most that many characters, and error or blindly truncate if too many characters were supplied. Not terribly user friendly.

  • Reallocate in stages (preferably using geometric series, e.g. doubling, to avoid quadratic behaviour), and keep on reading until the end has been reached. Not terribly easy to code.

  • Allocate a fixed size and hope it won't be exceeded, and crash (or be owned) horribly when this assumption fails. Easy to code, easy to break. For example, see gets in the standard C library. (Never use this function.)
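The second approach (reallocating in stages) might look roughly like the sketch below. It assumes a hypothetical helper named read_line and an arbitrary starting capacity of 16 bytes; neither comes from the standard library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line of any length from `in`.
   Returns a malloc'd string without the trailing newline, or NULL on
   end-of-file with nothing read or on allocation failure. */
char *read_line(FILE *in)
{
    size_t cap = 16;              /* initial capacity, chosen arbitrarily */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        return NULL;
    }

    int c;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {     /* keep one byte free for '\0' */
            char *bigger = realloc(buf, cap * 2);   /* double the buffer */
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {   /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;                   /* caller must free() */
}
```

Doubling means a line of n characters triggers only about log2(n) reallocations, which is what avoids the quadratic behaviour mentioned in the list.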

Barry Kelly
Why do all the ways to allocate enough space suck? IS THERE NO EASY WAY!
Lucas McCoy
Strings are the most broken part of C. I recommend coding up a pseudo-OO 'StringBuilder' struct or similar, and creating e.g. StrBufPrintf, StrBufGets, StrBufScanf, etc. to centralize these kinds of operations. The standard C library doesn't help much. C++ is slightly better, because you usually have 10s of different string classes to choose from, one for each distinct framework being used. Yes, I'm being sarcastic.
Barry Kelly
The easy way is to either (1) use a language where a string is a basic type; (2) use a library which provides string behavior; or (3) learn the language you're using. If you don't want to learn how to use the tools, why are you even trying. Find another language that's more suited to you (I'm not trying to be insulting here, just pragmatic).
paxdiablo
@Pax: I really wanted to learn the fundamentals of C so that I could help contribute to GEdit (which I know is written in C and uses GTK). I figured it would be best to learn all I could about memory management before doing anything with a real (large) program. That being said, if I'm going to write a program it's going to be done in C#.
Lucas McCoy
It's not really that strings are broken in C. It's that strings are surprisingly *hard* to do right, and C provides little support above the bare metal. Languages which provide "easy" strings have **a lot** going on under the hood (every one of them).
dmckee
A language in which the easiest way to do strings is the wrong way to do strings, and the standard library includes an example of the wrong way, seems broken with respect to strings to me.
Barry Kelly
@LucasA, you may find that GTK provides abstraction code for strings as well. GLib contains a GString abstraction (and lists and others) which may make your life easier. I'm not advocating not learning how bare-metal C does strings (you should), just stating that it may not be absolutely necessary for the domain you're interested in.
paxdiablo
@BarryK, strings are easy to do in C if you know what you're doing. I have C string processing code that I wrote back in '84 that hasn't been updated since '96 and it gives me everything I need. Yes, many things are hard to do in the base C language but that's one of the reasons you have functions, so you can abstract away the difficulties - you only have to do that once, then amortize the cost over your entire career.
paxdiablo
And saying C is broken is the same as saying C++ is broken since it can't do 256-bit integers or both are broken due to the limited precision of floating point - they are what they are. Broken means "doesn't match the spec", not "could be done better" (IMNSHO).
paxdiablo
Strings aren't transparent in C because C isn't a high-level language: you can still see the bare metal from C, and nothing is hidden. If you want strings that "just work" you have to give that up. Complaints about the standard library I get, but that's the result of history: that library was developed on machines orders of magnitude less powerful than your cell phone.
dmckee
For sure, strings are easy in C as long as you build the appropriate abstractions yourself. But speaking as a language designer and implementor, I believe it's a fact that C has been almost single-handedly responsible for millions, if not billions, of dollars worth of damage through its particularly weak approach to strings.
Barry Kelly
The power of the machine does not excuse the lack of correctness of the implementation.
Barry Kelly
+5  A: 

Well, for a start, sizeof(char) is always 1, so you could just malloc(3).

What you're allocating there is enough space for three characters. But keep in mind you need one for a null terminator for C strings.

What you tend to find is things like:

#define NAME_SZ 30
: : :
char *name = malloc (NAME_SZ+1);

to get enough storage for a name and terminator character, keeping in mind that the string "xyzzy" is stored in memory as:

+---+---+---+---+---+----+
| x | y | z | z | y | \0 |
+---+---+---+---+---+----+
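As a small sketch of the layout above, the number of bytes a string needs is its length plus one. The helper name bytes_needed is made up for illustration.

```c
#include <string.h>

/* Hypothetical helper: bytes needed to store `s` as a C string. */
size_t bytes_needed(const char *s)
{
    return strlen(s) + 1;   /* visible characters plus the '\0' */
}
```

For "xyzzy", strlen reports 5 and bytes_needed reports 6, which matches sizeof "xyzzy" for the string literal.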

Sometimes with non-char based arrays, you'll see:

int *intArray = malloc (sizeof (int) * 22);

which will allocate enough space for 22 integers.

paxdiablo
(type and convenience) `int *intArray = malloc(sizeof(*intArray) * 22);`
kaizer.se
"Well, for a start, sizeof(char) is always 1" FALSE. C specifies 1 byte as a LOWER BOUND for the size of a char. The actual size is both architecture and compiler dependent. On some more obscure architectures a char is 16 bits.
Anon E. Mous
No, actually, sizeof(char) is *always* 1. From c1x, "6.5.3.4 The sizeof operator", para 3: When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
paxdiablo
See http://stackoverflow.com/questions/1535131/potential-problem-with-c-standard-mallocing-chars for more detail: the C std defines byte as the addressable unit but it's not necessarily an 8-bit byte (octet).
paxdiablo
+1  A: 

This will allocate three bytes; 1 for sizeof(char), plus two. Just seeing that line out of context, I have no way of knowing why it would be allocated that way or if it is correct (it looks fishy to me).

You need to allocate enough memory to hold whatever you need to put in it. For example, if you're allocating memory to hold a string, you need to allocate enough memory to hold the longest string expected plus one byte for the terminating null. If you're dealing with ASCII strings, that's easy: one byte per character plus one. If you're using Unicode strings, things get more complicated, because one character may take more than one byte.
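To illustrate why Unicode complicates the sizing, here is a sketch that assumes the text is valid UTF-8 (only one of several Unicode encodings). The helper utf8_code_points is hypothetical. It counts code points, while strlen keeps counting bytes, and bytes are what malloc needs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a valid UTF-8 string by
   skipping continuation bytes, which have the bit pattern 10xxxxxx. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;              /* a lead byte or a plain ASCII byte */
        }
    }
    return count;
}
```

For the 5-character word "na\xC3\xAFve" (the i with diaeresis takes two bytes), utf8_code_points returns 5 but strlen returns 6, so the buffer needs 7 bytes including the terminator.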

Fred Larson
+1  A: 

malloc() will allocate a block of memory and return a pointer to that memory if successful, and NULL if unsuccessful. The size of the block of memory is specified by malloc's argument, in bytes.

The sizeof operator gives the size of its argument in bytes.

char *someString = malloc(sizeof(char) * 50);

This will allocate enough space for a 49-character string, not counting the terminating null character (a C-style string must be terminated by a null ('\0') character), and point someString at that memory.
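A hedged sketch of using such a 50-byte buffer safely: check malloc's return value, and let snprintf cut the input off at 49 characters. The names copy_bounded and BUF_SIZE are assumptions made for this example.

```c
#include <stdio.h>
#include <stdlib.h>

enum { BUF_SIZE = 50 };   /* 49 characters plus the terminating '\0' */

/* Hypothetical helper: allocate a 50-byte buffer and copy at most
   49 characters of `text` into it. */
char *copy_bounded(const char *text)
{
    char *buf = malloc(sizeof(char) * BUF_SIZE);
    if (buf == NULL) {
        return NULL;                       /* malloc reported failure */
    }
    snprintf(buf, BUF_SIZE, "%s", text);   /* writes at most BUF_SIZE bytes */
    return buf;                            /* caller must free() */
}
```

Longer input is silently truncated here. Whether truncation or an error is the right response depends on the program.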

It looks like that code in your question should be malloc(sizeof(char) * 2);, as sizeof(char) + 2 doesn't make sense.

Note that sizeof(char) is guaranteed to always equal 1 (byte), but the size and memory representation of other types (such as long) may vary between compilers.
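A quick way to see this on your own machine is to print the sizes. The function report_sizes below is a made-up name for this sketch. Only the first two values are constrained by the standard; the others depend on the compiler and platform.

```c
#include <limits.h>
#include <stdio.h>

/* Print a size the standard fixes (char) next to ones it leaves to the
   implementation (int, long). Returns the number of bits in a char. */
int report_sizes(void)
{
    printf("sizeof(char) = %zu (always 1)\n", sizeof(char));
    printf("CHAR_BIT     = %d (at least 8)\n", CHAR_BIT);
    printf("sizeof(int)  = %zu\n", sizeof(int));
    printf("sizeof(long) = %zu\n", sizeof(long));
    return CHAR_BIT;
}
```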

The way that you get (un)lucky with dynamically allocated memory is if you try to read/write outside of memory you have allocated.

For example,

char *someString = malloc(10);
strcpy(someString, "Hello there, world!");
printf("%s\n", someString);

The first line allocates enough room for 9 characters and a null character.
The second line attempts to copy 20 characters (19 + the null character) into that memory space. This overruns the buffer and might cause something incredibly witty, such as overwriting adjacent memory, or causing a segfault.

The third line might appear to work. For example, if there was allocated memory right beside someString and "Hello there, world!" ran into that memory space, the printout could look fine while the neighbouring data has been silently corrupted. If the terminating null character were later overwritten, printf would wander off through memory until it found one, or eventually segfault.

This example is a pretty simple operation, yet it's so easy to go wrong. C is tricky -- be careful.
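One way to avoid the overrun, sketched under the assumption that the caller knows the buffer size, is to check the length before copying. The helper name copy_checked is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: copy `src` into `dst` only if it fits,
   including the terminating '\0'. Returns 0 on success, -1 otherwise. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;
    if (needed > dst_size) {
        return -1;            /* would overrun the buffer; copy nothing */
    }
    memcpy(dst, src, needed);
    return 0;
}
```

With the 10-byte buffer from the example, copy_checked(someString, 10, "Hello there, world!") returns -1 instead of writing 20 bytes into 10.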

Carson Myers
+1  A: 

First point - it is a good habit never to put absolute numbers in the argument to malloc; always use sizeof and a multiple. As said above, the memory allocated for some types varies with compiler and platform. To guarantee getting enough space for an array of type 'blob', it is best to use something like this:

blob *p_data = malloc(sizeof(blob) * length_of_array);

This way, whatever the type is, however it looks in memory you'll get exactly the right amount.
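A slightly fuller sketch of the same idea follows. The struct layout and the name alloc_blobs are invented for illustration. It uses sizeof(*p_data) so the size follows the pointer's type automatically, and it guards against the multiplication overflowing.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type standing in for the answer's 'blob'. */
typedef struct {
    int id;
    double weight;
} blob;

/* Allocate an array of `length` blobs, or return NULL on failure. */
blob *alloc_blobs(size_t length)
{
    if (length > SIZE_MAX / sizeof(blob)) {
        return NULL;                      /* multiplication would overflow */
    }
    blob *p_data = malloc(sizeof(*p_data) * length);
    return p_data;                        /* may be NULL; caller must free() */
}
```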

Secondly, segfaults etc. C, as a low-level language, has no bounds checking. This means that nothing checks whether an index you use is actually inside the array. In fact, it doesn't stop you accessing memory anywhere, even if it doesn't belong to your program (although your operating system might; that's what a segfault is). This is why, whenever you pass an array around in C, you need to pass its length as well, so that the function receiving the array knows how big it is. Don't forget that an array passed to a function decays to a pointer to its first element, so the size information is lost.

This is very unhelpful when passing strings around, since every string argument would become two arguments, so a cheat is used. Any standard C string is null terminated: the last character in the string must have the value 0. The string functions work along the array until they see that and then stop. This way they don't overrun the array, but if it's not there for some reason, they will. That being understood,

strlen("Hello")

is 5, but to store it you need one more character. E.g.:

const char *str1 = "Hello";
char *str2 = malloc(sizeof(char) * (strlen(str1) + 1));
strcpy(str2, str1);

And yes, sizeof(char) is unnecessary because it is defined to be 1, but I find it clearer and it is definitely a good habit.
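To illustrate the two conventions described above (an explicit length for ordinary arrays, a terminating zero for strings), here is a minimal sketch. The function names sum_ints and count_chars are made up for this example.

```c
#include <stddef.h>

/* An int array carries no end marker, so the length travels with it. */
long sum_ints(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* A C string ends at the first '\0', so the function can find the
   length itself. This mirrors what strlen does. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

If the terminator is missing, the loop in count_chars keeps reading past the end of the buffer, which is exactly the overrun described above.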

WillW