views:

626

answers:

4

Is there anything I should know about using strtok on a malloced string?

In my code I have (in general terms)

char* line=getline();
Parse(dest,line);
free(line);

where getline() is a function that returns a char * to some malloced memory. and Parse(dest, line) is a function that does parsing online, storing the results in dest, (which has been partially filled earlier, from other information).

Parse() calls strtok() a variable number of times on line, and does some validation. Each token (a pointer to what is returned by strtok()) is put into a queue 'til I know how many I have.

They are then copied onto a malloc'd char** in dest.

Now free(line) and a function that free's each part of the char*[] in dest, both come up on valgrind as:

"Address 0x5179450 is 8 bytes inside a block of size 38 free'd"

or something similar.

I'm considering refactoring my code not to directly store the tokens on the the char** but instead store a copy of them (by mallocing space == to strlen(token)+1, then using strcpy()).

+2  A: 

they are then copied on to to a malloc'd char** in dest.

The strings are copied, or the pointers are copied? The strtok function modifies the string you give it so that it can give you pointers into that same string without copying anything. When you get tokens from it, you must copy them. Either that or keep the input string around as long as any of the token pointers are in use.

Many people recommend that you avoid strtok altogether because it's error-prone. Also, if you're using threads and the CRT is not thread-aware, strtok will likely crash your app.

Tim Sylvester
+1 for avoiding strtok() altogether.
Jonathan Leffler
+2  A: 

There is a function strdup which allocates memory and then copies another string into it.

Kinopiko
I'm now strdup'ing each token, before storing.I have no memory freeing errors now, and life is good.
Oxinabox
If this answers your question, please click on the "accept" checkmark. It would be good to do that for the other questions you asked here. You can undo it later if a better answer comes along.
Kinopiko
Not a standard C function though.
AndreyT
`strdup()` is easy to implement if you don't have it: http://snipplr.com/view/16919/strdup/
Michael Burr
`strdup()` is a POSIX standard function...the nice thing about standards is there are so many to choose from.
Jonathan Leffler
A: 

1 in your parse(), strtok() only writes '\0' at every matching position. actually this step is nothing special. using strtok() is easy. of course it cannot be used on read-only memory buffer.

2 for each sub-string got in parse(), copy it to a malloc()ed buffer accordingly. if i give a simple example for storing the sub-strings, it looks like the below code, say conceptually, though it might not be exactly the same as your real code:

    char **dest;
    dest = (char**)malloc(N * sizeof(char*));
    for (i: 0..N-1) {
        dest[i] = (char*)malloc(LEN);
        strcpy(dest[i], sub_strings[i]);
    NOTE: above 2 lines could be just one line as below
        dest[i] = strdup(sub_string[i]);
    }

3 free dest, conceptually again:

    for (i: 0..N-1) {
        free(dest[i]);
    }
    free(dest);

4 call free(line) is nothing special too, and it doesn't affect your "dest" even a little.

"dest" and "line" use different memory buffer, so you can perform step 4 before step 3 if preferred. if you had following above steps, no errors would occur. seems you made mistacks in step 2 of your code.

EffoStaff Effo
A: 

You ask:

Is there anything I should know about using strtok on a malloced string?

There are a number of things to be aware of. First, strtok() modifies the string as it processes it, inserting nulls ('\0') where the delimiters are found. This is not a problem with allocated memory (that's modifiable!); it is a problem if you try passing a constant string to strtok().

Second, you must have as many calls to free() as you do to malloc() and calloc() (but realloc() can mess with the counting).

In my code I have (in general terms)

   char* line=getline();
   Parse(dest,line);
   free(line);

Unless Parse() allocates the space it keeps, you cannot use the dest structure (or, more precisely, the pointers into the line within the dest structure) after the call to free(). The free() releases the space that was allocated by getline() and any use of the pointers after that yields undefined behaviour. Note that undefined behaviour includes the option of 'appearing to work, but only by coincidence'.

where getline() is a function that returns a char * to some malloced memory, and Parse(dest, line) is a function that does parsing online, storing the results in dest (which has been partially filled earlier, from other information).

Parse() calls strtok() a a variable number of times on line, and does some validation. Each token (a pointer to what is returned by strtok()) is put into a queue 'til I know how many I have.

Note that the pointers returned by strtok() are all pointers into the single chunk of space allocated by getline(). You have not described any extra memory allocation.

They are then copied onto a malloc'd char** in dest.

This sounds as if you copy the pointers from strtok() into an array of pointers, but you do not attend to copying the data that those pointers are pointing at.

Now free(line) and a function that free's each part of the char*[] in dest, both come up on valgrind as:

"Address 0x5179450 is 8 bytes inside a block of size 38 free'd"

or something similar.

The first free() of the 'char *[]' part of dest probably has a pointer to line and therefore frees the whole block of memory. All the subsequent frees on the parts of dest are trying to free an address not returned by malloc(), and valgrind is trying to tell you that. The free(line) operation then fails because the first free() of the pointers in dest already freed that space.

I'm considering refactoring my code [to] store a copy of them [...].

The refactoring proposed is probably sensible; the function strdup() already mentioned by others will do the job neatly and reliably.

Note that after refactoring, you will still need to release line, but you will not release any of the pointers returned by strtok(). They are just pointers into the space managed by (identified by) line and will all be released when you release line.

Note that you will need to free each of the separately allocated (strdup()'d) strings as well as the array of character pointers that are accessed via dest.

Alternatively, do not free line immediately after calling Parse(). Have dest record the allocated pointer (line) and free that when it frees the array of pointers. You still do not release the pointers returned by strtok(), though.

Jonathan Leffler