tags:

views:

371

answers:

10

I'm writing some functions that manipulate strings in C and return extracts from the string.

What are your thoughts on good styles for returning values from the functions.

Referring to Steve McConnell's Code Complete (section 5.8 in 1993 edition) he suggests I use the following format:

void my_function ( char *p_in_string, char *p_out_string, int *status )

The alternatives I'm considering are:

Return the result of the function (option 2) using:

char* my_function ( char *p_in_string, int *status )

Return the status of the function (option 3) using:

int my_function ( char *p_in_string, char *p_out_string )

In option 2 above I would be returning the address of a local variable from my_function but my calling function would be using the value immediately so I consider this to be OK and assume the memory location has not been reused (correct me on this if I'm wrong).

Is this down to personal style and preference or should I be considering other issues ?

+4  A: 

Option 3 is pretty much the unspoken(?) industry standard. If a IO-based C function that returns an integer, returns a non-zero integer value, it almost always means that the IO operation failed. You might want to refer to this Wikibook's section on return values in C/C++.

The reason that people use 0 for success is because there is only one condition of success. Then if it returns non-zero, you look up somehow what the non-zero value means in terms of errors. Perhaps a 1 means it couldn't allocate memory, 2 means the argument was invalid, 3 means there was some kind of IO error, for instance. Technically, typically you wouldn't return 1, but you'd return XXX_ERR_COULD_NOT_MALLOC or something like that.

Also, never return addresses of local variables. Unless you personally malloced it, there are no guarantees about that variable's address after you return from the function. Read the link for more info.

Mark Rushakoff
Except for string operations, which normally return a char * and typically cannot fail.
anon
I probably wouldn't call `str*` functions IO based. But any operation on network sockets, for instance? If they return non-zero, look out.
Mark Rushakoff
But the question IS about string functions.
anon
You can return address of static local variable, although it makes a function non-reenterable.
qrdl
@Neil: You're right, that's a bit more important in relevance to this question, I suppose. I didn't catch it on the first comment - I'm still waking up :P
Mark Rushakoff
+2  A: 

In option 2 above I would be returning the address of a local variable from my_function but my calling function would be using the value immediately so I consider this to be OK and assume the memory location has not been reused (correct me on this if I'm wrong).

I'm sorry but you're wrong, go with Steve McConnell's method, or the last method (by the way on the first method, "int status" should be "int* status".

You're forgiven for thinking you'd be right, and it could work for the first 99,999 times you run the program, but the 100,000th time is the kicker. In a multi-threaded or even on multi process architecture you can't rely that someone or something hasn't taken that segment of memory and used it before you get to it.

Better to be safe than sorry.

Binary Worrier
A: 

2nd Style:

Don't assume that memory will not be used. There can be threads that may eat up that memory and you are left with nothing but never-ending garbage.

Aamir
A: 

The second option is problematic because you have to get memory for the result string, so you either use a static buffer (which possibly causes several problems) or you allocate memory, which in turn can easily cause memory leaks since the calling function has the responsibility to free it after use, something that is easily forgotten.

There is also option 4,

char* my_function ( char *p_in_string, char* p_out_string )

which simply returns p_out_string for convenience.

ammoQ
I've seen this style a few times before, I guess it's useful so you can include the call within a conditional operation and test its return within the conditional call.
David
David: this style lets you use the result of the function as an argument for another function call, e.g.another_function(my_function(a,b), my_function(c,d), my_function(e,f));instead ofmy_function(a,b);my_function(c,d);my_function(e,f);another_function(b,d,f);
ammoQ
A: 

I prefer option 3. This is so I can do error checking for the function inline, i.e. in if statements. Also, it gives me the scope to add an additional parameter for string length, should that be needed.

int my_function(char *p_in_string, char **p_out_string, int *p_out_string_len)
igkuk7
A: 

Regarding your option 2:
If you return a pointer to a local variable, that has been allocated on the stack, the behavior is undefined.
If you return a pointer some piece of memory you allocated yourself (malloc, calloc, ...), this would be safe (but ugly, as you might forget free()).

I vote for option 3:
It allows you to manage memory outside of my_function(...) and you can also return some status code.

weichsel
+1  A: 

a safer way would be:

int my_function(const char* p_in_string, char* p_out_string, unsigned int max_out_length);

the function would return status, so that it's check-able immediately like in

if( my_function(....) )

and the caller would allocate the memory for the output, because

  1. the caller will have to free it and it's best done at the same level
  2. the caller will know how it handles memory allocation in general, not the function
n-alexander
A: 

I would say option 3 is the best to avoid memory management issues. You can also do error checking using the status integer.

Tom Woolfrey
A: 

There's also a point to consider if your function is time critical. On most architecture, it's faster to use the return value, than to use the reference pointer. I had the case when using the function return value I could avoid memory accesses in an inner loop, but using the parameter pointer, the value was always written out to memory (the compiler doesn't know if the value will be accessed via another pointer somewhere else). With some compiler you can even apply attributes to the return value, that can't be expressed on pointers. With a function like strlen, for instance, some compiler know that between to calls of strlen, if the pointer wasn't changed, that the same value will be returned and thus avoid to recall the function. In Gnu-C you can give the attribute pure or even const to the return value (when appropriate), thing which is impossible with a reference parameter.

tristopia
+1  A: 
  1. void my_function ( char *p_in_string, char *p_out_string, int *status )
  2. char* my_function ( char *p_in_string, int *status )
  3. int my_function ( char *p_in_string, char *p_out_string )

In all cases, the input string should be const, unless my_function is explicitly being given permission to write - for example - temporary terminating zero's or markers into the input string.

The second form is only valid if my_function calls "malloc" or some variant to allocate the buffer. Its not safe in any c/c++ implementation to return pointers to local / stack scoped variables. Of course, when my_function calls malloc itself, there is a question of how the allocated buffer is free'd.

In some cases, the caller is given the responsibility for releasing the buffer - by calling free(), or, to allow different layers to use different allocators, via a my_free_buffer(void*) that you publish. A further frequent pattern is to return a pointer to a static buffer maintained by my_function - with the proviso that the caller should not expect the buffer to remain valid after the next call to my_function.

In all the cases where a pointer to an output buffer is passed in, it should be paired with the size of the buffer.

The form I most prefer is

int my_function(char const* pInput, char* pOutput,int cchOutput);

This returns 0 on failure, or the number of characters copied into pOutput on success with cchOutput being the size of pOutput to prevent my_function overruning the pOutput buffer. If pOutput is NULL, then it returns the number of characters that pOutput needs to be exactly. Including the space for a null terminator of course.

// This is one easy way to call my_function if you know the output is <1024 characters
char szFixed[1024];
int cch1 = my_function(pInput,szFixed,sizeof(szFixed)/sizeof(char));

// Otherwise you can call it like this in two passes to find out how much to alloc
int cch2 = my_function(pInput,NULL,0);
char* pBuf = malloc(cch2);
my_function(pInput,pBuf,cch2);
Chris Becke