What is the proper/preferred way to allocate memory in a C API?

I can see, at first, two options:

1) Let the caller do all the (outer) memory handling:

myStruct *s = malloc(sizeof(*s));
myStruct_init(s);

myStruct_foo(s);

myStruct_destroy(s);
free(s);

The _init and _destroy functions are necessary since some more memory may be allocated inside, and it must be handled somewhere.

This has the disadvantage of being longer, but the malloc can be eliminated in some cases, e.g. by passing a stack-allocated struct:

int bar(void) {
    myStruct s;
    myStruct_init(&s);

    myStruct_foo(&s);

    myStruct_destroy(&s);
    return 0;
}

Also, it's necessary for the caller to know the size of the struct.

2) Hide mallocs in _init and frees in _destroy.

Advantages: shorter code, since the functions are going to be called anyway. Completely opaque structures.

Disadvantages: Can't be passed a struct allocated in a different way.

myStruct *s = myStruct_init();

myStruct_foo(s);

myStruct_destroy(s);

I'm currently leaning toward the first case; then again, I don't know much about C API design.

+5  A: 

My favourite example of a well-designed C API is GTK+, which uses the method #2 that you describe.

Although another advantage of your method #1 is not just that you could allocate the object on the stack, but also that you could reuse the same instance multiple times. If that's not going to be a common use case, then the simplicity of #2 is probably an advantage.

Of course, that's just my opinion :)

Dean Harding
Now, this is an interesting comment. I've heard many people say exactly the opposite, that GTK+ is a terrible API. Unfortunately I've only used it a little; I'm usually up in the clouds of C++, using Gtkmm. What I remember is ref-counted pointers and _new and _free functions, however, which seems to match the 3rd option more. I'd be curious about the reasons for your opinion.
Thanatos
The general design philosophy of GLib/Gtk seems to be "we won't use C++ on principle, so we'll hand-code all the same stuff". This approach has some advantages in a sense that it's still a pure C API, which makes it easier to use with various C-only FFIs... but from a pure C/C++ perspective, it seems to be rather impractical.
Pavel Minaev
+1 for mentioning GTK+. If you are accustomed to OOP, GTK seems very natural.
Andrei Ciobanu
A: 

Both are acceptable; there are tradeoffs between them, as you've noted.

There are large real-world examples of both: as Dean Harding says, GTK+ uses the second method; OpenSSL is an example that uses the first.

caf
+5  A: 

Another disadvantage of #2 is that the caller doesn't have control over how things are allocated. This can be worked around by providing an API for the client to register his own allocation/deallocation functions (like SDL does), but even that may not be sufficiently fine-grained.

The disadvantage of #1 is that it doesn't work well when output buffers are not fixed-size (e.g. strings). At best, you will then need to provide another function to obtain the length of the buffer first so that the caller can allocate it. At worst, it is simply impossible to do so efficiently (i.e. computing length on a separate path is overly expensive over computing-and-copying in one go).
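The "obtain the length first" workaround can be sketched with snprintf, which reports the required length when given a NULL buffer of size 0. The names myStruct_label and label_dup are made up for illustration and are not part of any API discussed here; this is only one possible shape for such a pair of calls.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical API: writes a label for `id` into `buf` (at most `cap` bytes)
   and returns the length the full label needs, excluding the terminator.
   Passing buf == NULL with cap == 0 only queries that length. */
int myStruct_label(int id, char *buf, size_t cap)
{
    return snprintf(buf, cap, "myStruct#%d", id);
}

/* Caller side: query the size, allocate, then call again for the data. */
char *label_dup(int id)
{
    int need = myStruct_label(id, NULL, 0);
    if (need < 0)
        return NULL;                      /* formatting error */

    char *buf = malloc((size_t)need + 1); /* +1 for the terminator */
    if (buf)
        myStruct_label(id, buf, (size_t)need + 1);
    return buf;                           /* caller frees with free() */
}
```

Note that the formatting work is done twice here, which is exactly the cost described above when computing the length is as expensive as producing the data.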

The advantage of #2 is that it allows you to expose your datatype strictly as an opaque pointer (i.e. declare the struct but don't define it, and use pointers consistently). Then you can change the definition of the struct as you see fit in future versions of your library, while clients remain compatible on binary level. With #1, you have to do it by requiring the client to specify the version inside the struct in some way (e.g. all those cbSize fields in Win32 API), and then manually write code that can handle both older and newer versions of the struct to remain binary-compatible as your library evolves.
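A minimal sketch of the opaque-pointer arrangement, shown in one file for brevity. The _create/_destroy names and the count field are assumptions chosen for illustration; in a real library the first part would live in the public header and the second part in a private .c file.

```c
#include <stdlib.h>

/* --- what the public header would contain --- */
typedef struct myStruct myStruct;   /* declared, but not defined for clients */
myStruct *myStruct_create(void);
int       myStruct_count(const myStruct *s);
void      myStruct_destroy(myStruct *s);

/* --- private to the library's implementation file --- */
struct myStruct {
    int count;
    /* fields can be added here later; clients never see sizeof(myStruct) */
};

myStruct *myStruct_create(void)
{
    myStruct *s = malloc(sizeof *s);
    if (s)
        s->count = 0;
    return s;                       /* NULL if allocation failed */
}

int myStruct_count(const myStruct *s)
{
    return s->count;
}

void myStruct_destroy(myStruct *s)
{
    free(s);                        /* free(NULL) is a no-op */
}
```

Client code can only hold a myStruct * and call these functions, so it cannot place the struct on the stack or depend on its layout.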

In general, if your structs are transparent data which will not change with future minor revision of the library, I'd go with #1. If it is a more or less complicated data object and you want full encapsulation to fool-proof it for future development, go with #2.

Pavel Minaev
+1 for the point about abstraction and opaque pointers - this is a big advantage as it completely decouples your implementation from the calling code
Paul R
+3  A: 

Both are functionally equivalent. But, in my opinion, method #2 is easier to use. A few reasons for preferring 2 over 1 are:

  1. It is more intuitive. Why should I have to call free on the object after I have (apparently) destroyed it using myStruct_destroy?

  2. Hides details of myStruct from the user. He does not have to worry about its size, etc.

  3. In method #2, myStruct_init does not have to worry about the initial state of the object.

  4. You don't have to worry about memory leaks from user forgetting to call free.

If your API implementation is being shipped as a separate shared library however, method #2 is a must. To isolate your module from any mismatch in implementations of malloc/new and free/delete across compiler versions you should keep memory allocation and de-allocation to yourself. Note, this is more true of C++ than of C.

Rajorshi
Both are *not* equivalent, because the latter requires dynamic allocation, and the former does not.
Tom
Well...yeah. Should have said functionally equivalent. Updated.
Rajorshi
+1  A: 

The problem I have with the first method is not so much that it is longer for the caller; it's that the API is now handcuffed when it needs to expand the amount of memory it is using, precisely because it doesn't know how the memory it received was allocated. The caller doesn't always know ahead of time how much memory it will need (imagine if you were trying to implement a vector).

Another option you didn't mention, which is going to be overkill most of the time, is to pass in a function pointer that the API uses as an allocator. This doesn't allow you to use the stack, but does allow you to do something like replace the use of malloc with a memory pool, while still keeping the API in control of when it wants to allocate.
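A rough sketch of that allocator-callback idea, assuming C11 for max_align_t and _Alignof. Every name here (myAllocator, myVec, myPool, pool_alloc, pool_release) is hypothetical, and the bump-pointer pool is just one possible backing store.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator interface the caller hands to the library. */
typedef struct {
    void *(*alloc)(void *ctx, size_t size);
    void  (*release)(void *ctx, void *ptr);
    void  *ctx;
} myAllocator;

typedef struct { int *items; size_t len; const myAllocator *a; } myVec;

/* The library decides when to allocate; the caller decides how. */
int myVec_init(myVec *v, size_t len, const myAllocator *a)
{
    v->a = a;
    v->items = NULL;
    v->len = 0;
    if (len > SIZE_MAX / sizeof *v->items)
        return -1;                               /* size would overflow */
    v->items = a->alloc(a->ctx, len * sizeof *v->items);
    if (!v->items)
        return -1;
    v->len = len;
    return 0;
}

void myVec_destroy(myVec *v)
{
    v->a->release(v->a->ctx, v->items);
    v->items = NULL;
    v->len = 0;
}

/* A trivial bump-pointer pool; `base` must be suitably aligned. */
typedef struct { unsigned char *base; size_t cap, used; } myPool;

void *pool_alloc(void *ctx, size_t size)
{
    myPool *p = ctx;
    size_t align = _Alignof(max_align_t);
    size_t rounded = (size + align - 1) / align * align;
    if (rounded < size || rounded > p->cap - p->used)
        return NULL;                             /* overflow or pool exhausted */
    void *out = p->base + p->used;
    p->used += rounded;
    return out;
}

void pool_release(void *ctx, void *ptr)
{
    (void)ctx;
    (void)ptr;                                   /* pool memory is reclaimed all at once */
}
```

A caller could back the pool with a static array of max_align_t and fill in a myAllocator with pool_alloc, pool_release, and a pointer to the pool; swapping in thin wrappers around malloc and free would not require touching myVec at all.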

As for which method is proper API design, it's done both ways in the C standard library. strdup() and stdio use the second method, while sprintf and strcat use the first. Personally I prefer the second method (or third) unless 1) I know I will never need to realloc and 2) I expect the lifetime of my objects to be short, so that using the stack is very convenient.

edit: There is actually one other option, and it is a bad one with a prominent precedent. You could do it the way strtok() does it, with statics. Not good, just mentioned for completeness' sake.

frankc
A: 

Both ways are OK. I tend to do it the first way, as a lot of the C I do is for embedded systems and all the memory is either tiny variables on the stack or statically allocated. This way there can be no running out of memory: either you have enough at the beginning or you're screwed from the start. Good to know when you have 2K of RAM :-) So all my libraries are like #1, where the memory is assumed to be allocated.

But this is an edge case of C development.

Having said that, I'd probably go with #1 still. Perhaps using init and finalize/dispose (rather than destroy) for names.

Keith Nicholas
+1  A: 

Here are a few points to consider:

Case #1 mimics the memory allocation scheme of C++, with more or less the same benefits:

  • easy allocation of temporaries on the stack (or in static arrays or such, to write your own struct allocator replacing malloc).
  • easy freeing of memory if anything goes wrong in init

Case #2 hides more information about the structure used and can also be used for opaque structures, typically when the structure as seen by the user is not exactly the same as the one used internally by the lib (say there could be some more fields hidden at the end of the structure).

A mixed API between case #1 and case #2 is also common: there is a parameter used to pass in a pointer to some already allocated structure, and if it is null the structure is allocated (the pointer is always returned). With such an API the free is usually the responsibility of the caller, even if init performed the allocation.
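A small sketch of such a mixed API. The value field is a placeholder, and the exact behaviour on allocation failure (returning NULL) is an assumption rather than something prescribed above.

```c
#include <stdlib.h>

typedef struct { int value; } myStruct;

/* If `s` is NULL the library allocates the struct; otherwise it is
   initialized in place. The pointer to the initialized struct is
   returned either way, or NULL if allocation failed. */
myStruct *myStruct_init(myStruct *s)
{
    if (!s) {
        s = malloc(sizeof *s);
        if (!s)
            return NULL;
    }
    s->value = 0;
    return s;
}
```

The caller has to remember which form was used: a struct obtained via myStruct_init(NULL) must eventually be passed to free, while one initialized in place on the stack must not.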

In most cases I would probably go for case #1.

kriss
+5  A: 

Why not provide both, to get the best of both worlds?

Use _init and _terminate functions to use method #1 (or whatever naming you see fit).

Use additional _create and _destroy functions for the dynamic allocation. Since _init and _terminate already exist, it effectively boils down to:

myStruct *myStruct_create ()
{
    myStruct *s = malloc(sizeof(*s));
    if (s) 
    {
        myStruct_init(s);
    }
    return (s);
}

void myStruct_destroy (myStruct *s)
{
    myStruct_terminate(s);
    free(s);
}

If you want it to be opaque, then make _init and _terminate static and do not expose them in the API, only provide _create and _destroy. If you need other allocations, e.g. with a given callback, provide another set of functions for this, e.g. _createcalled, _destroycalled.

The important thing is to keep track of the allocations, but you have to do this anyway. You must always use the counterpart of the used allocator for deallocation.

Secure
A: 

I would go for (1) with one simple extension, that is to have your _init function always return the pointer to the object. Your pointer initialization then may just read:

myStruct *s = myStruct_init(malloc(sizeof(myStruct)));

As you can see the right hand side then only has a reference to the type and not to the variable anymore. A simple macro then gives you (2) at least partially

#define NEW(T) (T ## _init(malloc(sizeof(T))))

and your pointer initialization reads

myStruct *s = NEW(myStruct);
Jens Gustedt
How do you handle a malloc failure?
Secure
@Secure: Good point. I think `_init` functions should be made robust to being passed a `NULL` pointer and should just pass it through on return. The check for that is then left to the user of the pointer, as usual.
Jens Gustedt
The other design philosophy in this regard is that most functions should expect valid pointers (with the obvious exception of deallocators) and assert() that they are not NULL. That would make your approach effectively use assert for the program logic, which is a big no-go. It depends on the overall design of your program, for sure, but personally I prefer to be explicit with error handling, i.e. malloc is used separately and tested for validity before anything else is done with the pointer.
Secure
@Secure: I would tend to just extend the convention to check pointers returned by the macro `NEW`. This is only a slight extension of such a convention since you'd have to check several functions for that already, not only `malloc` but also `realloc` and `calloc` (and maybe others that I forget).
Jens Gustedt
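The NULL-tolerant variant discussed in these comments might look roughly like this; the value field is a placeholder and the rest follows the macro from the answer.

```c
#include <stdlib.h>

typedef struct { int value; } myStruct;

/* NULL-tolerant init: a failed malloc is passed straight through,
   so the caller checks the result of NEW() just as it would check malloc. */
myStruct *myStruct_init(myStruct *s)
{
    if (s)
        s->value = 0;
    return s;
}

#define NEW(T) (T ## _init(malloc(sizeof(T))))
```

A caller would then write myStruct *s = NEW(myStruct); followed by an explicit check of s against NULL before using it, and free(s) when done.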
+1  A: 

Method number 2 every time.

Why? Because with method number 1 you have to leak implementation details to the caller. The caller has to know at least how big the struct is. You can't change the internal implementation of the object without recompiling any code that uses it.

JeremyP