
Are there any downsides to passing structs by value in C, rather than passing a pointer?

If the struct is large, there is obviously the performance cost of copying lots of data, but for a smaller struct it should be basically the same as passing several values to a function.

It is perhaps even more interesting for return values. A C function can only return a single value, but you often need several. A simple solution is to put them in a struct and return that.
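For what it's worth, the C standard library already does this: div() from <stdlib.h> returns a div_t struct by value that holds both the quotient and the remainder. A minimal sketch (print_divmod is a hypothetical helper, not a library function):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>   /* div(), div_t */

/* Hypothetical helper: one call to div() yields two results in one struct. */
void print_divmod(int numer, int denom)
{
    assert(denom != 0);               /* division by zero is undefined */
    div_t r = div(numer, denom);      /* r.quot and r.rem come back together */
    printf("%d / %d = %d remainder %d\n", numer, denom, r.quot, r.rem);
}
```

There is no out-parameter here: both values arrive through the ordinary return path.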

Are there any reasons for or against this?

ADDED

Since it might not be obvious to everyone what I'm talking about here, I'll give a simple example.

If you're programming in C, you'll sooner or later start writing functions that look like this:

void examine_data(const char *ptr, size_t len)
{
    ...
}

char *p = ...;
size_t l = ...;
examine_data(p, l);

This isn't a problem. The only issue is that you have to agree with your coworkers on the order of the parameters, so that you use the same convention in all functions.

But what happens when you want to return the same kind of information? You typically get something like this:

char *get_data(size_t *len)
{
    ...
    *len = ...datalen...;
    return ...data...;
}
size_t len;
char *p = get_data(&len);

This works, but it is much more problematic. A return value should be a return value, but in this implementation part of it isn't. Nothing in the code above tells you that get_data isn't allowed to look at what len points to, and nothing makes the compiler check that a value is actually returned through that pointer. So next month, when someone else modifies the code without understanding it properly (because he didn't read the documentation?), it breaks without anyone noticing, or it starts crashing randomly.

So, the solution I propose is the simple struct

struct blob { char *ptr; size_t len; };

The examples can be rewritten like this:

void examine_data(const struct blob blob)
{
    ... use blob.ptr and blob.len ...
}

struct blob blob = { .ptr = ..., .len = ... };
examine_data(blob);

struct blob get_data(void)
{
    ...
    return (struct blob){ .ptr = ...data..., .len = ...len... };
}
struct blob data = get_data();

For some reason, I think most people would instinctively make examine_data take a pointer to a struct blob, but I don't see why. It still gets a pointer and an integer; it's just much clearer that they belong together. And in the get_data case it is impossible to mess up in the way I described before, since there is no input value for the length, and a length must be returned.
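Since the function bodies above are left as "...", here is a compilable sketch of the blob version under some assumptions: a static buffer serves as the hypothetical data source, and a hypothetical count_char() stands in for examine_data(). It only illustrates the shape of the interface, not any particular implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A pointer and the length that goes with it, kept together. */
struct blob {
    char *ptr;
    size_t len;
};

/* Takes the blob by value: the callee gets its own copy of ptr and len. */
size_t count_char(const struct blob b, char c)
{
    assert(b.ptr != NULL || b.len == 0);
    size_t n = 0;
    for (size_t i = 0; i < b.len; i++)
        if (b.ptr[i] == c)
            n++;
    return n;
}

/* Returns both values at once; a caller cannot forget to receive the length. */
struct blob get_data(void)
{
    static char buffer[] = "hello, world";   /* hypothetical data source */
    return (struct blob){ .ptr = buffer, .len = strlen(buffer) };
}
```

A caller writes struct blob data = get_data(); and then count_char(data, 'l'); there is no length variable to declare up front and no address to pass.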

This became a bit long, maybe I should have put it in some kind of comment, but I'm new here.

+5  A: 

I'd say passing (not-too-large) structs by value, both as parameters and as return values, is a perfectly legitimate technique. One has to take care, of course, that the struct is either a POD type, or the copy semantics are well-specified.

Update: Sorry, I had my C++ thinking cap on. I recall a time when it was not legal in C to return a struct from a function, but this has probably changed since then. I would still say it's valid as long as all the compilers you expect to use support the practice.

Greg Hewgill
Note that my question was about C, not C++.
dkagedal
It's valid to return struct from function just not useful :)
Ilya
I like Ilya's suggestion to use the return value as an error code and parameters for returning data from the function.
zooropa
+3  A: 

I think that your question has summed things up pretty well.

One other advantage of passing structs by value is that memory ownership is explicit. There is no wondering whether the struct came from the heap, or who is responsible for freeing it.
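To make the ownership point concrete, here is a sketch contrasting a by-value return with a heap-allocated one. struct point, make_point(), new_point() and ownership_demo() are hypothetical names chosen for illustration:

```c
#include <assert.h>
#include <stdlib.h>

struct point { double x, y; };

/* By value: the caller gets its own copy and has nothing to free. */
struct point make_point(double x, double y)
{
    struct point p = { x, y };
    return p;
}

/* By pointer: the caller must know this came from malloc() and free() it. */
struct point *new_point(double x, double y)
{
    struct point *p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}

void ownership_demo(void)
{
    struct point a = make_point(1.0, 2.0);
    struct point b = a;                 /* assignment copies every member */
    b.x = 10.0;
    assert(b.x == 10.0 && a.x == 1.0);  /* a is unaffected by changes to b */

    struct point *h = new_point(3.0, 4.0);
    if (h != NULL) {
        h->y += a.y;                    /* ... use the heap object ... */
        free(h);                        /* ... and remember to release it */
    }
}
```

Nothing about the by-value version can leak, whereas new_point() only works correctly if every caller follows the free() convention.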

Darron
+6  A: 

A simple solution is to return an error code as the return value and everything else through parameters of the function.
That parameter can of course be a struct, but I don't see any particular advantage in passing it by value; just pass a pointer.
Passing a structure by value is dangerous: you need to be very careful about what you are passing. Remember there is no copy constructor in C; if one of the structure's members is a pointer, only the pointer value is copied, which can be very confusing and hard to maintain.
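The shallow-copy issue can be shown in a few lines. This is a sketch with a hypothetical struct buffer and hypothetical helpers: assignment (and therefore passing or returning by value) copies the data pointer, not the bytes it points to, so a deep copy has to be written by hand:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct buffer {
    char *data;
    size_t len;
};

/* Returning the struct copies its members, so the result shares b.data. */
struct buffer shallow_copy(struct buffer b)
{
    return b;
}

/* A deep copy allocates new storage and copies the pointed-to bytes. */
struct buffer deep_copy(struct buffer b)
{
    assert(b.data != NULL || b.len == 0);
    struct buffer out = { NULL, 0 };
    if (b.len > 0 && (out.data = malloc(b.len)) != NULL) {
        memcpy(out.data, b.data, b.len);
        out.len = b.len;
    }
    return out;
}
```

After a shallow copy, writing through either struct is visible through the other, and freeing one leaves the other dangling; that is the maintenance hazard described above.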

Just to complete the answer (full credit to Roddy): stack usage is another reason not to pass structures by value. Believe me, debugging a stack overflow is a real PITA.

Reply to comment:

Passing a struct by pointer means that some entity has ownership of this object and has full knowledge of what should be released, and when. Passing a struct by value creates hidden references to the struct's internal data (pointers to other structures, etc.), and this is hard to maintain (possible, but why?).

Ilya
But passing a pointer isn't more "dangerous" just because you put it in a struct, so I don't buy it.
dkagedal
Great point on copying a structure that contains a pointer. This point may not be very obvious. For those who don't know what he is referring to, do a search on deep copy vs shallow copy.
zooropa
One of the C function conventions is to have output parameters be listed first before input parameters, e.g. int func(char* out, char *in);
zooropa
+19  A: 

For small structs (e.g. point, rect), passing by value is perfectly acceptable. But apart from speed, there is one other reason why you should be careful passing/returning large structs by value: stack space.

A lot of C programming is for embedded systems, where memory is at a premium and stack sizes may be measured in KB or even bytes... If you're passing or returning structs by value, copies of those structs will get placed on the stack, potentially causing the situation that this site is named after...

If I see an application that seems to have excessive stack usage, structs passed by value is one of the things I look for first.
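A rough sketch of the difference, using a hypothetical struct samples of about 1 KB. On common ABIs a struct this size is passed in memory, typically on the stack, so every call to sum_by_value() copies the whole thing, while sum_by_pointer() passes only an address:

```c
#include <assert.h>

#define MAX_SAMPLES 512   /* hypothetical buffer size */

struct samples {
    int count;
    short data[MAX_SAMPLES];
};

/* By value: the callee receives a full copy of the struct. */
long sum_by_value(struct samples s)
{
    assert(s.count >= 0 && s.count <= MAX_SAMPLES);
    long total = 0;
    for (int i = 0; i < s.count; i++)
        total += s.data[i];
    return total;
}

/* By pointer: only the address is copied; const documents read-only use. */
long sum_by_pointer(const struct samples *s)
{
    assert(s->count >= 0 && s->count <= MAX_SAMPLES);
    long total = 0;
    for (int i = 0; i < s->count; i++)
        total += s->data[i];
    return total;
}
```

Both compute the same result; the difference is the per-call stack cost, which sizeof(struct samples) versus sizeof(struct samples *) gives a feel for.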

Roddy
Great reference to this site!
zooropa
First circular SO reference I've ever seen...
Chris Kaminski
+3  A: 

One thing people here have forgotten to mention so far (or I overlooked it) is that structs usually contain padding!

struct {
  short a;
  char b;
  short c;
  char d;
};

Every char is 1 byte, and every short is 2 bytes on common systems. How large is the struct? Nope, it's not 6 bytes, at least not on any of the more commonly used systems. On most systems it will be 8. The problem is that alignment is not constant but system-dependent, so the same struct can have different alignment and a different size on different systems.

Not only does padding further eat up your stack, it also adds the uncertainty of not being able to predict it in advance, unless you know how your system pads and then look at every single struct in your app and calculate its size. Passing a pointer adds no such uncertainty. The size of a pointer is known for the system and is always the same, regardless of what the struct looks like, and pointer sizes are always chosen so that they are aligned and need no padding.
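Rather than calculating padding by hand, you can ask the compiler with sizeof and offsetof. A sketch using the struct above (given the hypothetical tag padded) plus a reordered variant; the printed numbers depend on the platform, and putting the larger members first usually, but not by any guarantee, reduces padding:

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdio.h>

struct padded {       /* member order from the example above */
    short a;
    char b;
    short c;
    char d;
};

struct reordered {    /* same members, larger ones first */
    short a;
    short c;
    char b;
    char d;
};

/* Print the size and member offsets the compiler actually chose. */
void print_layout(void)
{
    assert(offsetof(struct padded, a) == 0);   /* the first member is always at offset 0 */
    printf("padded:    sizeof = %zu, offsets a=%zu b=%zu c=%zu d=%zu\n",
           sizeof(struct padded),
           offsetof(struct padded, a), offsetof(struct padded, b),
           offsetof(struct padded, c), offsetof(struct padded, d));
    printf("reordered: sizeof = %zu\n", sizeof(struct reordered));
}
```

Running this on each target is a cheap way to see how much a by-value call of a given struct really costs there.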

Mecki
Yea, but the padding exists with no dependency on passing the structure by value or by reference.
Ilya
If you had tested your example, you would have found that your example struct is indeed four bytes, so your argument is moot. But it had no relevance for the question anyway.
dkagedal
@dkagedal: Which part of "different sizes on different systems" didn't you understand? Just because it is that way on your system, you assume it must be the same for any other one - that's exactly why you should not pass by value. Changed sample so it fails on your system as well.
Mecki
@Ilya: Yeah, but padding consumes stack space for what? Right, for nothing. And without knowing the padding of every single struct, there is no way to predict how much stack space a call to your function "costs", while it is absolutely clear when you pass a pointer.
Mecki
I think Mecki's comments about struct padding are relevant especially for embedded systems where stack size may be an issue.
zooropa
I guess the flip side of the argument is that if your struct is a simple struct (containing a couple of primitive types), passing by value will enable the compiler to juggle it using registers -- whereas if you use pointers, things end up in the memory, which is slower. That gets pretty low-level and pretty much depends on your target architecture, if any of these tidbits matter.
kizzx2
Unless your struct is tiny or your CPU has many registers (and Intel CPUs don't), the data ends up on the stack, which is also memory and just as fast or slow as any other memory. A pointer, on the other hand, is always small, and the pointer itself will usually end up in a register when it is used often.
Mecki
+4  A: 

One reason not to do this which has not been mentioned is that this can cause an issue where binary compatibility matters.

Structures can be passed via the stack or in registers, depending on the compiler used, its options and its implementation.

See: http://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html

-fpcc-struct-return

-freg-struct-return

If two compilers disagree, things can blow up. Needless to say, the main reasons not to do this, as illustrated, are stack consumption and performance.

tonylo
This was the kind of answer I was looking for.
dkagedal
Now this was quite obscure, great point!
kizzx2
+2  A: 

To really answer this question, one needs to dig deep into assembly land:

(The following example uses gcc on x86_64. Anyone is welcome to add other compilers and architectures, like MSVC, ARM, etc.)

Let's have our example program:

// foo.c

typedef struct
{
    double x, y;
} point;

void give_two_doubles(double * x, double * y)
{
    *x = 1.0;
    *y = 2.0;
}

point give_point()
{
    point a = {1.0, 2.0};
    return a;
}

int main()
{
    return 0;
}

Compile it with full optimizations:

gcc -Wall -O3 foo.c -o foo

Look at the assembly:

objdump -d foo | vim -

This is what we get:

0000000000400480 <give_two_doubles>:
    400480: 48 ba 00 00 00 00 00    mov    $0x3ff0000000000000,%rdx
    400487: 00 f0 3f 
    40048a: 48 b8 00 00 00 00 00    mov    $0x4000000000000000,%rax
    400491: 00 00 40 
    400494: 48 89 17                mov    %rdx,(%rdi)
    400497: 48 89 06                mov    %rax,(%rsi)
    40049a: c3                      retq   
    40049b: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

00000000004004a0 <give_point>:
    4004a0: 66 0f 28 05 28 01 00    movapd 0x128(%rip),%xmm0
    4004a7: 00 
    4004a8: 66 0f 29 44 24 e8       movapd %xmm0,-0x18(%rsp)
    4004ae: f2 0f 10 05 12 01 00    movsd  0x112(%rip),%xmm0
    4004b5: 00 
    4004b6: f2 0f 10 4c 24 f0       movsd  -0x10(%rsp),%xmm1
    4004bc: c3                      retq   
    4004bd: 0f 1f 00                nopl   (%rax)

Excluding the nopl padding, give_two_doubles() is 27 bytes while give_point() is 29 bytes. Both, however, compile to the same number of instructions: five each, including the retq.

What's interesting is that we notice the compiler has been able to optimize mov into the faster SSE2 variants movapd and movsd. Furthermore, give_two_doubles() actually moves data in and out from memory, which makes things slow.

Much of this may not apply in embedded environments (which is where C is most often used nowadays). I'm not an assembly wizard, so any comments would be welcome!

kizzx2
Counting the number of instructions isn't all that interesting unless you can show a huge difference, or count more interesting aspects such as the number of hard-to-predict jumps. The actual performance properties are much more subtle than the instruction count.
dkagedal
@dkagedal: True. In retrospect, I think my own answer was written very poorly. Although I didn't focus on number of instructions very much (dunno what gave you that impression :P), the actual point to make was that passing struct by value is preferable to passing by reference for small types. Anyway, passing by value is preferred because it's simpler (no lifetime juggling, no need to worry about someone changing your data or `const` all the time) and I found there's not much performance penalty (if not gain) in the pass-by-value copying, contrary to what many might believe.
kizzx2