tags:

views:

2133

answers:

9

I've tried to write a string replace function in C, which works on a char * which has been allocated using malloc(). It's a little different in that it will find and replace strings, rather than characters in the starting string.

It's trivial to do if the search and replace strings are the same length (or the replace string is shorter than the search string), since I have enough space allocated. If I try to use realloc(), I get an error that tells me I am doing a double free - which I don't see how I am, since I am only using realloc().

Perhaps a little code will help:

void strrep(char *input, char *search, char *replace) {
    int searchLen = strlen(search);
    int replaceLen = strlen(replace);
    int delta = replaceLen - searchLen;
    char *find = input;
    while (find = strstr(find, search)) {
        if (delta > 0) {
            realloc(input, strlen(input) + delta);
            find = strstr(input, search);            
        }
        memmove(find + replaceLen, find + searchLen, strlen(input) - (find - input));
        memmove(find, replace, replaceLen);
    }
}

The program works, until I try to realloc() in an instance where the replaced string will be longer than the initial string. (It still kind of works, it just spits out errors as well as the result).

If it helps, the calling code looks liks:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void strrep(char *input, char *search, char *replace);

int main(void) {
    char *input = malloc(81);
    while ((fgets(input, 81, stdin)) != NULL) {
        strrep(input, "Noel", "Christmas");
    }
}

[Edit: I'm not seeing any HTML-escaped chars.]

+4  A: 

Just a shot in the dark because I haven't tried it yet but when you realloc it returns the pointer much like malloc. Because realloc can move the pointer if needed you are most likely operating on an invalid pointer if you don't do the following:

input = realloc(input, strlen(input) + delta);
John Downey
And if realloc fails, it returns NULL, and leaves the existing buffer alone. You've just lost the pointer... :-(
Roger Lipscombe
+2  A: 

Note, try to edit your code to get rid of the html escape codes.

Well, though it has been a while since I used C/C++, realloc that grows only reuses the memory pointer value if there is room in memory after your original block.

For instance, consider this:

(xxxxxxxxxx..........)

If your pointer points to the first x, and . means free memory location, and you grow the memory size pointed to by your variable by 5 bytes, it'll succeed. This is of course a simplified example as blocks are rounded up to a certain size for alignment, but anyway.

However, if you subsequently try to grow it by another 10 bytes, and there is only 5 available, it will need to move the block in memory and update your pointer.

However, in your example you are passing the function a pointer to the character, not a pointer to your variable, and thus while the strrep function internally might be able to adjust the variable in use, it is a local variable to the strrep function and your calling code will be left with the original pointer variable value.

This pointer value, however, has been freed.

In your case, input is the culprit.

However, I would make another suggestion. In your case it looks like the input variable is indeed input, and if it is, it shouldn't be modified, at all.

I would thus try to find another way to do what you want to do, without changing input, as side-effects like this can be hard to track down.

Lasse V. Karlsen
+4  A: 

As a general rule, you should never do a free or realloc on a user provided buffer. You don't know where the user allocated the space (in your module, in another DLL) so you cannot use any of the allocation functions on a user buffer.

Provided that you now cannot do any reallocation within your function, you should change a little its behavior like doing only one replacement, so the user will be able to compute the resulting string max length and provide you with a buffer long enough for this one replacement to occur.

Then you could create another function to do the multiple replacements, but you will have to allocate the whole space for the resulting string and copy the user input string. Then you must provide a way to delete the string you allocated.

Resulting in:

void  strrep(char *input, char *search, char *replace);
char* strrepm(char *input, char *search, char *replace);
void strrepmfree(char *input);
Vincent Robert
+1  A: 
TyBoer
A: 

I've tried using input = realloc(...), but that appears to just case some other problems, which, considering this is a slightly "non-recommended" practice of altering strings that I don't know where they've come from, I will take lassevk's advice of doing it another way...

...Since this isn't really production code (it's for learning, only), and the specification said "you can limit input to 80 characters", and the only increasing replace is "Noel" -> "Christmas", I can just initially allocate 161 characters, and leave it be.

Matthew Schinckel
+1  A: 

This seems to work;

char *strrep(char *string, const char *search, const char *replace) {
char *p = strstr(string, search);

if (p) {
int occurrence = p - string;
int stringlength = strlen(string);
int searchlength = strlen(search);
int replacelength = strlen(replace);

if (replacelength > searchlength) {
string = (char *) realloc(string, strlen(string)
+ replacelength - searchlength + 1);
}

if (replacelength != searchlength) {
memmove(string + occurrence + replacelength,
string + occurrence + searchlength,
stringlength - occurrence - searchlength + 1);
}

strncpy(string + occurrence, replace, replacelength);
}

return string;
}

Sigh, is there anyway to post code without it sucking?

Mark
A: 

@Mark: That seems to only change the first occurrence. Which is probably reasonable, since I didn't really state that it must change all of them!

Matthew Schinckel
+6  A: 

First off, sorry I'm late to the party. This is my first stackoverflow answer. :)

As has been pointed out, when realloc() is called, you can potentially change the pointer to the memory being reallocated. When this happens, the argument "string" becomes invalid. Even if you reassign it, the change goes out of scope once the function ends.

To answer the OP, realloc() returns a pointer to the newly-reallocated memory. The return value needs to be stored somewhere. Generally, you would do this:

data *foo = malloc(SIZE * sizeof(data));
data *bar = realloc(foo, NEWSIZE * sizeof(data));
/* Test bar for safety before blowing away foo */
if (bar != NULL)
{
foo = bar;
bar = NULL;
}
else
{
fprintf(stderr, "Crap. Memory error.\n");
free(foo);
exit(-1);
}

As TyBoer points out, you guys can't change the value of the pointer being passed in as the input to this function. You can assign whatever you want, but the change will go out of scope at the end of the function. In the following block, "input" may or may not be an invalid pointer once the function completes:

void foobar(char *input, int newlength)
{
/* Here, I ignore my own advice to save space. Check your return values! */
input = realloc(input, newlength * sizeof(char));
}

Mark tries to work around this by returning the new pointer as the output of the function. If you do that, the onus is on the caller to never again use the pointer he used for input. If it matches the return value, then you have two pointers to the same spot and only need to call free() on one of them. If they don't match, the input pointer now points to memory that may or may not be owned by the process. Dereferencing it could cause a segmentation fault.

You could use a double pointer for the input, like this:

void foobar(char **input, int newlength)
{
*input = realloc(*input, newlength * sizeof(char));
}

If the caller has a duplicate of the input pointer somewhere, that duplicate still might be invalid now.

I think the cleanest solution here is to avoid using realloc() when trying to modify the function caller's input. Just malloc() a new buffer, return that, and let the caller decide whether or not to free the old text. This has the added benefit of letting the caller keep the original string!

Tryke
+1  A: 

Someone else apologized for being late to the party - two and a half months ago. Oh well, I spend quite a lot of time doing software archaeology.

I'm interested that no-one has commented explicitly on the memory leak in the original design, or the off-by-one error. And it was observing the memory leak that tells me exactly why you are getting the double-free error (because, to be precise, you are freeing the same memory multiple times - and you are doing so after trampling over the already freed memory).

Before conducting the analysis, I'll agree with those who say your interface is less than stellar; however, if you dealt with the memory leak/trampling issues and documented the 'must be allocated memory' requirement, it could be 'OK'.

What are the problems? Well, you pass a buffer to realloc(), and realloc() returns you a new pointer to the area you should use - and you ignore that return value. Consequently, realloc() has probably freed the original memory, and then you pass it the same pointer again, and it complains that you're freeing the same memory twice because you pass the original value to it again. This not only leaks memory, but means that you are continuing to use the original space -- and John Downey's shot in the dark points out that you are misusing realloc(), but doesn't emphasize how severely you are doing so. There's also an off-by-one error because you do not allocate enough space for the NUL '\0' that terminates the string.

The memory leak occurs because you do not provide a mechanism to tell the caller about the last value of the string. Because you kept trampling over the original string plus the space after it, it looks like the code worked, but if your calling code freed the space, it too would get a double-free error, or it might get a core dump or equivalent because the memory control information is completely scrambled.

Your code also doesn't protect against indefinite growth -- consider replacing 'Noel' with 'Joyeux Noel'. Every time, you would add 7 characters, but you'd find another Noel in the replaced text, and expand it, and so on and so forth. My fixup (below) does not address this issue - the simple solution is probably to check whether the search string appears in the replace string; an alternative is to skip over the replace string and continue the search after it. The second has some non-trivial coding issues to address.

So, my suggested revision of your called function is:

char *strrep(char *input, char *search, char *replace) {
    int searchLen = strlen(search);
    int replaceLen = strlen(replace);
    int delta = replaceLen - searchLen;
    char *find = input;
    while ((find = strstr(find, search)) != 0) {
        if (delta > 0) {
            input = realloc(input, strlen(input) + delta + 1);
            find = strstr(input, search);            
        }
        memmove(find + replaceLen, find + searchLen, strlen(input) + 1 - (find - input));
        memmove(find, replace, replaceLen);
    }
    return(input);
}

This code does not detect memory allocation errors - and probably crashes (but if not, leaks memory) if realloc() fails. See Steve Maguire's 'Writing Solid Code' book for an extensive discussion of memory management issues.

Jonathan Leffler
Thanks, this is a really good analysis of what I was doing wrong (and that the double-free was in a sense a by-product of the several things I was doing wrong.)I think I had it in my head that realloc() just extended the memory allocation - which makes no sense at all, when I think about it!
Matthew Schinckel