ansaurus

Question

Static function-scoped pointers and memory leaks

Answer 1

+1 A:

Aside from the difficulty of freeing that dynamically allocated buffer, there is another potential problem. It is not thread safe. Since it is a library function, then there is always the possibility that it will be used in a multi-threaded environment in the future.

It would probably be better to require the calling function to free the buffer via a related library function.

Mark Wilkins 2010-01-25 23:19:48

The calling function is itself a "read-line" sort of function that is called elsewhere. I don't feel comfortable going further up in the calling tree to insert a library call there-- I want this line-reading operation to be as transparent to higher levels of abstraction as possible... But you're onto something, for sure. I'll have to ponder this one carefully.

Platinum Azure 2010-01-25 23:23:11

I do see that it would be desirable to make it as simple as possible for the callers. It seems, though, that avoiding some kind of global memory that magically exists somewhere is best - otherwise it usually leads to some difficult problems to debug. Maybe a "readline" structure could encapsulate it (assuming straight C). It could have the stack variable and a dynamic pointer. The readline function could use that structure but it would live in the caller's scope. It would need a cleanup call before leaving the function that uses "readline" but it would simplify the usage.

Mark Wilkins 2010-01-25 23:53:30

Answer 2

A:

There's a few hacks I can think of, although both require moving the static declaration out of the function. I can't imagine why that would be a problem.

Using a GCC extension,

static char *buffer;
void use_buffer(size_t n) {
    buffer = realloc(buffer, n);
}
void cleanup_buffer() __attribute__((destructor)) {
    free(buffer);
}

Using C++,

static char *buffer;
static class buffer_guard {
    ~buffer_guard() { free(buffer); }
} my_buffer_guard;

In any case, I don't really like the design. In C, usually the caller is responsible for allocating/freeing memory that it needs to use, even if it's filled in by a callee.

BTW, compare with Glibc's nonstandard getline. It never uses static memory.

ephemient 2010-01-25 23:24:28

I believe this would make the buffer global and he specifically requested that it not be global.

Brian T Hannan 2010-01-25 23:43:48

It's statically scoped -- it is only visible within the single translation unit (source file), and not global to the whole program.

ephemient 2010-01-26 01:57:12

Neither compiler-specific hacks nor C++ are usable here, unfortunately.

Platinum Azure 2010-01-26 02:52:07

Answer 3

+1 A:

Instead of function scope, give it module scope (i.e. at file scope, but static, so it's not visible outside that file. Add a small function that frees the buffer, and use atexit() to assure that's called before the program exits. Alternative, don't worry about it -- a leak that happens only once, and is freed automatically as the program exits isn't particularly harmful.

I feel obliged to say that the design sounds to me like a recipe for disaster though. When you free the buffer, there's virtually no way to even guess whether it might still be in use. The user (apparently) has to keep track of where the data was returned, and copy the data to a new buffer if (and only if) you allocated one dynamically. In a multi-threading environment, you need to make the internal pointer thread-local to have any chance of working correctly at all. To the user, the function might do one of two entirely different things -- either return a buffer that's owned by the user, OR return a buffer that's owned by the function, and can only be used safely by allocating another buffer, and copying the data into the other buffer before the function is called again.

Jerry Coffin 2010-01-25 23:25:15

+1 for highlighting the ownership issue. I think you mean `atexit()` not `onexit()` though.

j_random_hacker 2010-01-26 04:28:31

Oops -- thanks for pointing out my mistake. I've fixed it now.

Jerry Coffin 2010-01-26 14:24:03

Answer 4

+1 A:

That could still be okay if you use the standard technique to indicate end-of-file (i.e have you read-line function return NULL).

What happens in this case is that after the final line is read, one more call to your read-line function will be needed so that it can return NULL to indicate that the end of file has been reached. In this last call, you can then free you buffer.

R Samuel Klatchko 2010-01-25 23:26:16

Answer 5

+1 A:

Two choices that occur immediately:

Make the pointer to the heap-allocated buffer static but file scoped. Add a (static) function that checks if it is not null and if it is not null free()s it. Call atexit(free_func) at the start of the program, where free_func is the static function. You can have some global setup routine (caled by main()) where this is done.
Don't worry about it; heap-allocated memory is released by the OS when your process exits, and the memory leak is not cumulative, so even if your program has a long life it won't raise an OOM exception (unless you have some other bug).

I assume your app is NOT multithreaded; in this case, you should not use a static buffer at all, or you should use thread-local data.

Tim Schaeffer 2010-01-25 23:26:46

Answer 6

+3 A:

If you can change the function, I would recommend changing the function interface itself. I know you have spent a lot of time debugging and testing it, but there are a few problems with your current implementation:

it is not thread-safe,
the user has no control over the data, so he must copy it if he needs it later, most likely in a buffer that's going to be malloc()ed, thus nullifying any advantage you got by the selective use of malloc() in your function,
most importantly, as you have discovered, a special action has to be taken by the user for a long last line.

Your users should not be worried by the implementation oddity of your function, they should be able to "just use it".

Unless you are doing it for educational purposes, I would recommend looking at this page, which has one implementation of "reading an arbitrarily long line from a stream", and links to other such implementations (each implementation is slightly different from the others, so you should be able to find one that you like).

Based upon your edit, MT-safe is not a requirement, and a copy is always going to happen. So, the most obvious design is one of the two:

Let the user supply a char **, which points to a buffer that your function will allocate, using a combination of malloc() and realloc() (if needed). It is the user's responsibility to free() it when done. That way, the user doesn't have to copy the data again, since he can pass a pointer to wherever the final destination of the data is.
return a char * that is allocated by your function. Again, it's the user's responsibility to free() it.

Both are pretty much equivalent.

For your current implementation, you can always return "not end of file" if the last line is very long, and doesn't end in a newline. Then, the user is going to call your function again, and then you can free your buffer. Personally, I would be happier with a function that allows me to read as many lines as I want, and not force me to go to the end of file.

Alok 2010-01-25 23:31:47

j_random_hacker 2010-01-26 04:10:25

@j_random_hacker: thanks. I feel that my post could be edited a bit for coherency, but that's mostly because of the edit by the OP. I will try to fix it if I get some time.

Alok 2010-01-26 04:14:58

+1 for the xref URL

Jonathan Leffler 2010-01-26 07:51:04

Answer 7

A:

I was just going to comment below Mark's answer, but it may feel a little bit cramped. Still, this answer is in essence a comment on his answer, which I find very good in addition to being quick :).

Not only is your function not MT-safe, but even without threads, the interface to use it correctly is complicated. The caller must have finished with the previous result before calling the function again. If this code is still in use two years from now, someone will scratch his head trying to use it right... or worse, use it wrong without even thinking about it. That person could even be you...

Mark's suggestion (requiring the caller to free the buffer) is IMHO the most reasonable. But perhaps you don't trust malloc and free not to cause fragmentation in the long run, or have some other reason to prefer the static buffer solution. In this case you can keep the static buffer for ordinary-length lines, define a boolean flag that indicates if the static buffer is currently busy, and document that the following function (and not free) should be called with the address of the buffer when the caller no longer uses it:

char static_buffer[512];
int buffer_busy;

void free_buffer(char *p)
{
  if (p == static_buffer)
  {
     assert(buffer_busy);
     buffer_busy=0;
  }
  else free(p);
}

char *get_line(...)
{
  char *result;
  if (..short line..)
  {
     result = static_buffer;
     assert(!buffer_busy);
     buffer_busy=1;
  }
  else result = malloc(...);
  ...
  return result;
}

The only circumstances in which the assertions will trigger are circumstances in which your previous implementation would have silently gone wrong, and the overhead is very low compared to your existing solution (only toggling the flag, and asking the caller to call free_buffer when he's finished, which is cleaner). If the assertion in get_line in particular triggers, it means you needed dynamic allocation after all, because the caller could not be finished with a buffer at the time he was asking for another.

Note: this is still not MT-safe.

Pascal Cuoq 2010-01-25 23:42:25

Answer 8

+1 A:

The interface you have chosen makes this an unsolvable problem:

The client must not know if the return value points to static or dynamic memory.
The return value must point to memory that outlives the call.
Any call might be the last.

I'm not sure why you are troubled by this leak. After all, if the client reads a very long line, does something with the line, then does a ton of computation and allocation before reading the next line, you still have a big hunk of memory sitting around unused, clogging up the system. If this OK with you (arbitrary computation takes place before memory is reclaimed), you could just fess up that you're willing to retain dead memory indefinitely.

If you can't live with the leak, the simplest thing to do is to widen the interface so that the client can notify your function when the client is done with the memory. (Right now the contract with the client says that the client owns the memory until it calls your function again, at which point ownership reverts to your function.) Of course, to change the interface means either

adding a new function, which would require you to promote your pointer to be static but local to the compilation unit, or
adding some argument to the existing function (or overloading an argument) so that you have a call which means "I am done with your memory now, but I don't want another line".

A more radical change would be to rewrite the function to use dynamically allocated memory throughout its lifetime, gradually enlarging the block as needed until it is as large as the largest block ever read (or perhaps rounded up to the next power of two). Depending on actual cases this strategy may consume less address space than keeping a big static buffer.

In any case I'm not convinced you should be worrying about this corner case. If you think this case matters, please edit your question to show us the evidence.

Norman Ramsey 2010-01-26 03:43:17

+1 for pointing out that, even forgetting about the last-line case, your current design leaves the client with no way to reclaim the potentially huge chunk of memory allocated by your function -- all it can do is call it again and hope for a shorter line this time.

j_random_hacker 2010-01-26 04:26:58

ansaurus

tags:

views:

answers:

Static function-scoped pointers and memory leaks

related questions