tags:

views:

427

answers:

9

I am working on a library that support multiple programming environment such as VB6 and FoxPro. I have to stick with C convention as it is the lowest common denominator. Now I have a question regarding the style.

Suppose that the function process input and returns a string. During the process, the error can happen. The current proposed style is this:

int func(input params... char* buffer, unsigned int* buffer_size);

The good thing about this style is that everything is included in the prototype, including error code. And memory allocation can be avoided. The problem is that the function is quite verbose. And because the buffer_size can be any, it requires more code to implement.

Another option is to return char*, and return NULL to indicate error:

char* func(input params...);

This style requires caller to delete the buffer. Memory allocation is required so a server program could face memory fragmentation issue.

A variant of the second option is to use a thread local variable to hold the returned pointer char*, so that the user does not need to delete the buffer.

Which style do you like? And reason?

+1  A: 

The second variant is cleaner.

COM IErrorInfo is an implementation of the second approach. The server calls SetErrorInfo to set details of what went wrong and returns an error code. The caller examines the code and can call GetErrorInfo to get the details. The caller is responsible for releasing the IErrorInfo, but passing the parameters of each call in the first variant is not beautiful either.

The server could preallocate enough memory on start so that it surely has enough memory to return error details.

sharptooth
+2  A: 

If I have to choose between the two styles shown I'd go for the 1st one every time. The 2nd style gives the users of your library something else to think about, memeory allocation, and someone is bound to forget to free the memory.

Jackson
+5  A: 

I would prefer the first definition, where buffer and its size are passed in. There are exceptions, but usually you don't expect to have to clean up after the functions you call. Whereas if I allocate memory and pass it into a function, then I know that I have to clean up after myself.

Handling different sized buffers shouldn't be a big deal.

Evan Shaw
This is largely how the Windows API itself does it, so it's logical to emulate that.
Rob K
+2  A: 

Another issue with the second style is that the contexts that the memory is allocated on may be different. For example:

// your library in C
char * foo() {
   return malloc( 100 );
}

// my client code C++
char * p =  foo();      // call your code
delete p;               // natural for me to do, but ... aaargh!

And this is only a minor part of the problem. You can say that both sides should use malloc & free, but what if they are using diffeent compiler implementations? It is better for all allocations and deallocations to occur in the same place. whether this is the library r the client code is up to you.

anon
+1  A: 

The first edition would be less error prone when other programmers use it.

If programmers have to allocate memory themselves they are more likely to remember to free it. If a library allocates memory for them it's yet another abstraction and can/will lead to complications.

Steve Lazaridis
+1  A: 

Few things to ponder;

  • Allocation and deallocation should happen at the same scope (ideally). It is best to pass in an pre-allocated buffer by the caller. The caller can safely free this later on. This poses the question -- how big the buffer should be? An approach that I've seen used fairly widely in Win32 is to pass NULL as the input buffer and the size parameter will tell you how much you need.

  • How many possible error conditions do you oversee? Returning a char* may limit the extent of error reporting.

  • What pre and post conditions do you want to be fulfilled? Does your prototype reflect that?

  • Do you do error checking in the caller or the callee?

I can't really tell you one is better than the other, since I don't have the big picture. But I am sure these things can get your started thinking as well as the other posts.

dirkgently
+8  A: 

I'm a bit "damaged goods" when it comes to this subject. I used to design and maintain fairly large APIs for embedded telecom. A context where you cannot take anything for granted. Not even things like global variables or TLS. Sometimes even heap buffers show up that actually are addressed ROM memory.

Hence, if you're looking for a "lowest common denominator", you might also want to think about what language constructs are available in your target environment (the compiler is likely to accept anything within standard C, but if something is unsupported the linker will say no).

Having said that, I would always go for alternative 1. Partly because (as others have pointed out), you should never allocate memory for the user directly (an indirect approach is explained further down). Even if the user is guaranteed to work with pure and plain C, they still might for instance use their own customized memory management API for tracking leaks, diagnostic logging etc. Support for strategies like that is commonly appreciated.

Error communication is one of the most important things when dealing with an API. Since the user probably have distinct ways to handle errors in his code, you should be as consistent as possible about this communication throughout the API. The user should be able to wrap error handling towards your API in a consistent way and with minimum code. I would generally always recommend using clear enum codes or defines/typedefs. I personally prefer typedef:ed enums:

typedef enum {

  RESULT_ONE,
  RESULT_TWO

} RESULT;

..because it provides type/assignment safety.

Having a get-last-error function is also nice (requires central storage however), I personally use it solely for providing extra information about an already recognized error.

The verbosity of alternative 1 can be limited by making simple compounds like this:

struct Buffer
{
  unsigned long size;
  char* data;
};

Then your api might look better:

ERROR_CODE func( params... , Buffer* outBuffer );

This strategy also opens up for more elaborate mechanisms. Say for instance you MUST be able to allocate memory for the user (e.g. if you need to resize the buffer), then you can provide an indirect approach to this:

struct Buffer
{
  unsigned long size;
  char* data;
  void* (*allocator_callback)( unsigned long size );
  void  (*free_callback)( void* p );
};

Ofcourse, the style of such constructs is always open for serious debate.

Good Luck!

sharkin
Couldn't you pass that Buffer structure by copy on the stack instead. It's a bit stupid to use dynamic memory for that (I'm assuming you do)
toto
@toto: Note that the Buffer doesn't _have_ to be dynamic memory just because the function takes a pointer. It could very well be a stack instance passed by address. However, I would usually advocate using a Buffer struct like this as a general type, rather than wrapping size and data in a stack-struct every time you need to pass it to a function. I'm assuming that's what you were thinking about speaking about stack-declared structs.
sharkin
Yea sorry, that's just some Leftover Java damage I have. A pointer is fine for an object on the stack.
toto
A: 

I'd do it similarly to the first way, but just subtly different, after the model of snprintf and similar functions:

int func(char* buffer, size_t buffer_size, input params...);

This way, if you have lots of these, they can look similar, and you can use variable numbers of arguments wherever useful.

I agree greatly with the already-stated reasons for using version 1 rather than version 2 - memory problems are much more likely with version 2.

Kim Reece
I don't think he was refering to ellipsis by using the '...'. If for nothing else, ellipsis must always come last in the argument list.
sharkin
I'm sure he wasn't referring to it, but it gave be the idea of rearranging the parameters. =)
Kim Reece
+1  A: 

What about using both methods? I agree with the consensus of answers favoring style 1 vs. the pitfalls of style 2. I do feel style 2 could be used if all your API's follow a consistent naming idiom, like so:


// Style 1 functions
int fooBuff(char* buffer, unsigned int buffer_size, input params... );

// Style 2 functions
char* fooBuffAlloc(input params...);
bool fooBuffFree(char* foo);

/D

thompsongunner