ansaurus

Question

Answer 1

+6 A:

It is a bad idea because caller is responsible to free the index, otherwise you are leaking memory. Alternatively you can use static int and return its address every time - there will be no leaks, but function becomes non-reentrant, which is risky (but acceptable if you bear it in mind).

Much better would be to return pointer to char function finds, or NULL if it is not present. That's the way strchr() works, BTW.

Edited to reflect changes in original post.

qrdl 2009-08-20 06:57:34

That was my intent, it's been corrected.

Imagist 2009-08-20 07:45:00

Any kind of utility function like this that was non-reentrant is an accident waiting to happen. I really don't like the 'static int' suggestion!

Roddy 2009-08-20 09:10:16

+1 for the rest, though!

Roddy 2009-08-20 09:10:54

@Roddy That's why I said it is risky. strerror() (from the top of my mind) isn't reentrant for the very same reason but is still very usable.

qrdl 2009-08-20 09:33:04

Answer 2

+1 A:

The function needs to dereference the parameters, which takes more time than accessing the stack.
The pointers can be uninitialized, causing unexpected results.
There is no standard way to specify which pointer is for input, wich is for output and which is for both (there are extensions, and naming tricks, but it's still a matter).

fortran 2009-08-20 06:59:09

Answer 3

+2 A:

In the specific example, you should use size_t as the return type: this is the data type that adequately represents how large strings can get on any system. I.e. you can't possibly have a string that is longer than a size_t can represent. Then, you can fairly safely use (size_t)-1 as an error indicator: realistically, you also cannot put a string with that size into memory, since you also need some address space for the code you are executing; it becomes a limitation of your API that such long strings would not be supported if they existed.

Your approach not only has the disadvantage using more memory, but also the disadvantage of being slower: the callee needs to malloc, the caller needs to free. Those are fairly expensive operations.

There is one other standard approach relevant here: errno. In case of an error indicator, you don't know what the error is. So in C, rather than using an out parameter, we typically put the error details into a global or thread-local variable.

Martin v. Löwis 2009-08-20 07:00:33

Thanks for your input. I was aware of `size_t` but the code above is just an example I threw together to show what I was talking about, so I didn't consider it carefully. Assuming that you mean the maximum value of `size_t` by `(size_t)`, what if you want to support lengths of strings all the way up to the max? I understand that this isn't often a concern but since the reason I'm using C is to write an interpreter for a higher-level language, it's still a concern.

Imagist 2009-08-20 07:17:55

By (size_t)-1, I meant the value -1 casted to size_t; and yes, that gives you the maximum value of size_t. As I said, you *really* don't need to support strings of that size, since they cannot possibly fit into memory. For example, on a 32-bit machine, (size_t)-1 is one byte short of 4GB; along with the terminating 0, the string would require 4GB. Add to that the malloc header, and it just won't fit into address space (let alone that you need some space for the code, and that operating systems often don't give you the full 4GB).

Martin v. Löwis 2009-08-20 07:35:33

Answer 4

+1 A:

The biggest downside is that it requires findChar()'s callers to free() the returned memory, or create a memory leak. You've reinvented the strchr() wheel poorly.

I also don't see why you're thinking that returning a pointer to unsigned int is such a big step forward. First, you could just return an unsigned int, if all you're after is the ability to return values up to 2^32 on a 32-bit machine instead of 2^31-1. Second, your stated goal is to avoid a problem with large strings. Well, what if you're on a 64-bit machine, where 'int' and 'unsigned int' remain 32 bits? What you really want here is a long, but returning pointers doesn't actually help here.

ELIDED BOGUS CRITICISM

Warren Young 2009-08-20 07:01:08

Imagist 2009-08-20 07:22:02

Sorry, not thinking straight. It's 2am, whaddaya want? :)

Warren Young 2009-08-20 08:10:35

THank you for your input!

Imagist 2009-08-20 13:19:50

Answer 5

+3 A:

Without the malloc, the position can be still a stack variable and you can use it in an if statement:

int findChar(char ch, char* in, int* pos)
{
    int i = 0;
    for(i = 0; in[i] != '\0'; i++) 
    {
        if(in[i] == ch) 
        {
            *pos = i;
            return 1;
        }
    }

    return 0;
}

Tom 2009-08-20 07:02:33

This is findchar is a much, much better API than "int findChar(char ch, char* in)". While not uncommon to mix value and error indication in the return value, it is bad practice since it is a violation of KISS.

hlovdal 2009-08-20 07:47:45

See http://lcsd05.cs.tamu.edu/slides/keynote.pdf and http://video.google.com/videoplay?docid=-3733345136856180693 for how to write good APIs. Read about realloc in 'Code Complete' by Steve McConnell for an example of an horrible API.

hlovdal 2009-08-20 07:54:56

Answer 6

+1 A:

I am not an expert, but I think a ton of small mallocs can cause problems. First, you have to take care of freeing the memory after the use of the value. Then you also have to deal with the fragmentation of the free memory. Passing as pointer is more suitable for complex structures.

bandi 2009-08-20 07:04:31

Answer 7

+1 A:

I'd say the most severe downside to your code is that you use one return value to represent both a general failure and the result if successful.

While this is a common practice, it can lead to wierd scenarios when requirements change, just like the one you described. An alternative practice would be to separate the return values, i.e. something like this

int findChar(char ch, char const * const in, unsigned int * const index)
{
    if ( in != NULL && index != NULL)
    {
        unsigned int i;
        for(i = 0; in[i]; i++)
        {
            if(in[i] == ch)
            {
                *index = i;
                return EXIT_SUCCESS;
            }
        }
    }
    return EXIT_FAILURE;
}

...where the function return value tells you whether the function was successful or not, separately from the value of 'index'.

Then again, as fortran noted, there is no way to enforce whether the pointers are input values, output values, or both (i.e. modified inside the function).

Christoffer 2009-08-20 07:39:09

ansaurus

tags:

views:

answers:

Pass-by-reference in C - downsides?

related questions