views:

118

answers:

3

I'm using a function I found here to save a webpage to memory with cURL:

struct WebpageData {
    char *pageData;
    size_t size;
};

size_t storePage(void *input, size_t size, size_t nmemb, void *output) {
    size_t realsize = size * nmemb;

    struct WebpageData *page = (struct WebpageData *)output;

    page->pageData = (char *)realloc(page->pageData, page->size + realsize + 1);
    if(page->pageData) {
        memcpy(&(page->pageData[page->size]), input, realsize);
        page->size += realsize;
        page->pageData[page->size] = 0;
    }

    return realsize;
}

and find the line:

page->pageData = (char *)realloc(page->pageData, page->size + realsize + 1);

is causing a memory leak of a few hundred bytes per call. The only real change I've made from the original source is casting the line in question to a (char *), which my compiler (gcc, g++ specifically if it's a c/c++ issue, but gcc also wouldn't compile with the uncast statement) insisted upon, but I assume this is the source of the leak. Can anyone elucidate?

Thanks

A: 

If developing on *nix generally speaking I would try running the program with valgrind (http://valgrind.org/) for memory leak realted issues (in this case I think you do know where the memory is being assigned, but where is it being freed?). In this case I would suggest not using malloc, realloc and other similar c memory management in a c++ program unless absolutely necessary and unavoidable. They should be avoided in favour of using the c++ memory management tools, in this case I think memory is not being freed up properly. Using c++ Vectors would likely make your life a lot easier here as you wouldn't need to worry about the resizing of the array and keeping track of all the changes in memory allocation.

shuttle87
@shuttle87: I don't believe the OP's issue is *detecting* the memory leak; it is finding the cause of the memory leak. Valgrind isn't good for that if we've already narrowed down the issue to this function. Finally, Valgrind is *nix only which might be an issue. Not downvoting -- just pointing out potential issues.
Billy ONeal
@Billy ONeal: cheers, edited the post to improve some of the issues you pointed out.
shuttle87
A: 

The code you have posted (as far as I can tell) is correct. If it's leaking, I suspect that you are forgetting to free() the memory block at some point. realloc is allowed to create an entire new memory block if it can't simply expand the existing one, and this is of interest to you. It is also of course allowed to allocate a larger block than needed, which might result in the phantom leaks.

Now, since you are using C++, I have to ask: why aren't you using std::vector instead?

struct WebpageData {
    std::vector<char> pageData;
    size_t size;
};

size_t storePage(void *input, size_t size, size_t nmemb, void *output) {
    size_t realsize = size * nmemb;
    WebpageData *page = reinterpret_cast<WebpageData *>(output);

    page->pageData.resize(page->size + realsize + 1);
    memcpy(&(page->pageData[page->size]), input, realsize);
    page->size += realsize;
    page->pageData[page->size] = 0;

    return realsize;
}
Billy ONeal
Thanks, the vectors solved the problem. I wasn't using them because I'm actually building this program as a means of learning C++, so for the minute I'm pretty bad at it. For reference, should I have been calling free() prior to every call to storePage? I was using the same instance of WebpageData for every call. Cheers.
wyatt
Also, what's the best way to convert a vector<char> into a string? I assume the iterative process I've jury-rigged is pretty poor.
wyatt
Billy ONeal
A: 

The cast shouldn't make any diference to realloc. Are you sure you have a memory leak? The code is appending the data, so if you ask it to storePage three times with 1kB of data each time, it will store 3kB. Is that what you intended when you copied and pasted the code? And of course you must free() the block somewhere when you're finished with it.

Other thoughts:

  • Was the memory you are reallocing originally malloc'd or realloc'd?

  • Note that if you are unable to realloc the data, you will lose it all. If you realloc into a temporary pointer and then only overwrite page->pageData if it is valid, this won't happen, and you'll be able to report the failure to the caller (Although this is very unlikely to occur in practice, and you'll probably have far larger problems if it does!)

  • You're reallocing the block every time you receive new data. It will probably be more efficient to allocate a larger block than required, retrieve the data into it, and then realloc it into an exact-fit block only after all the data is receieved, so that you avoid repeatedly reallocing the block.

Jason Williams