views:

515

answers:

5

The return value of a function is usually stored on the stack or in a register. But for a large structure, it has to be on the stack. How much copying has to happen in a real compiler for this code? Or is it optimized away?

For example:

struct Data {
    unsigned values[256];
};

Data createData() 
{
    Data data;
    // initialize data values...
    return data;
}

(Assuming the function cannot be inlined..)

+10  A: 

None; no copies are done.

The address of the caller's Data return value is actually passed as a hidden argument to the function, and the createData function simply writes into the caller's stack frame.

This is known as the named return value optimisation. Also see the c++ faq on this topic.

commercial-grade C++ compilers implement return-by-value in a way that lets them eliminate the overhead, at least in simple cases

...

When yourCode() calls rbv(), the compiler secretly passes a pointer to the location where rbv() is supposed to construct the "returned" object.

You can demonstrate that this has been done by adding a destructor with a printf to your struct. The destructor should only be called once if this return-by-value optimisation is in operation, otherwise twice.

Also you can check the assembly to see that this happens:

Data createData() 
{
    Data data;
    // initialize data values...
    data.values[5] = 6;
    return data;
}

here's the assembly:

__Z10createDatav:
LFB2:
        pushl   %ebp
LCFI0:
        movl    %esp, %ebp
LCFI1:
        subl    $1032, %esp
LCFI2:
        movl    8(%ebp), %eax
        movl    $6, 20(%eax)
        leave
        ret     $4
LFE2:

Curiously, it allocated enough space on the stack for the data item subl $1032, %esp, but note that it takes the first argument on the stack 8(%ebp) as the base address of the object, and then initialises element 6 of that item. Since we didn't specify any arguments to createData, this is curious until you realise this is the secret hidden pointer to the parent's version of Data.

Alex Brown
"no copying" is overselling this a tad. the struct still have to be copied. even if "createData()" only modified 1 element in the Data struct, the entire struct is still copied back to the caller once - this is easily verified by looking at the assembly generated. (Atlest for non static functions that need to adhere to a platform ABI. More optimizations are available for static functions, or whople program optimizers for the compilers that has those.
Anonym
seriously, there is *no copying*. I have checked the assembly generated.
Alex Brown
I've updated the answer to include assembly language.
Alex Brown
@Anonym: no copying is done because operations on the struct are carried out in the memory allocated by the caller (generally in the caller's stack frame).
Stephen Canon
@all: Let's just say that it depends. Cause it *does depend*. The above simple case is optimized, all is known at compile time. Try fill in all 256 values in a loop, with say data.values[i] = rand(). or some other external function. gcc (4.4.1 atlest with -O2) will memcpy it all back to the caller. gcc will not generate a memcpy for the above simple case thoug. Other compilers might differ.
leeeroy
more info, the question was tagged C. gcc will indeed not generate a copy for struct returns if you compile the code as C++. (But it *will* if you compile it as C if all values of the array are filled in from sources not known at compile time)
leeeroy
@Anonym: to avoid copying requires some optimization, but it is a *very* common optimization. Even lcc does it.
Norman Ramsey
the question was tagged C, but the code is clearly c++. Note the missing struct before each use of the Data type.
Alex Brown
@Alex - Could be forgetfulness, or a `typedef struct Data Data;` that was omitted. Or using a C++ compiler to compile C code.
Chris Lutz
well, in any case, you need a c++ compiler to compile this code, and a c++ compiler does the optimisation. Looks like `gcc` in c mode doesn't.
Alex Brown
+3  A: 
typedef struct {
    unsigned value[256];
} Data;

Data createData(void) {
    Data r;
    calcualte(&r);
    return r;
}

Data d = createData();

msvc(6,8,9) and gcc mingw(3.4.5,4.4.0) will generate code like the following pseudocode

void createData(Data* r) {
      calculate(&r)
}
Data d;
createData(&d);
OwnWaterloo
gcc will use stack space for "Data" inside the createData function, it will not do the entire transformation you've shown, though it will pass inn a "pointer" to the struct location of the caller.
nos
@nos The way of implement functions that return large structures is not specified by Standard, so there is NO correct answer. The compilers I mentioned do generate the code like that : allocate space on the caller frame and pass the address of it to callee.
OwnWaterloo
did he just completely make up the code? or is this what actually happens on those compilers? how does he know?
Luca Matteis
+1  A: 

gcc on linux will issue a memcpy() to copy the struct back on the stack of the caller. If the function has internal linkage, more optimizations become available though.

nos
+3  A: 

There are many examples given, but basically

This question does not have any definite answer. it will depend on the compiler.

C does not specify how large structs are returned from a function.

Here's some tests for one particular compiler, gcc 4.1.2 on x86 RHEL 5.4

gcc trivial case, no copying

[00:05:21 1 ~] $ gcc -O2 -S -c t.c
[00:05:23 1 ~] $ cat t.s
        .file   "t.c"
        .text
        .p2align 4,,15
.globl createData
        .type   createData, @function
createData:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        movl    $1, 24(%eax)
        popl    %ebp
        ret     $4
        .size   createData, .-createData
        .ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-46)"
        .section        .note.GNU-stack,"",@progbits

gcc more realistic case , allocate on stack, memcpy to caller

#include <stdlib.h>
struct Data {
    unsigned values[256];
};
struct Data createData()
{
    struct Data data;
    int i;
    for(i = 0; i < 256 ; i++)
        data.values[i] = rand();
    return data;
}

[00:06:08 1 ~] $ gcc -O2 -S -c t.c
[00:06:10 1 ~] $ cat t.s
        .file   "t.c"
        .text
        .p2align 4,,15
.globl createData
        .type   createData, @function
createData:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        movl    $1, %ebx
        subl    $1036, %esp
        movl    8(%ebp), %edi
        leal    -1036(%ebp), %esi
        .p2align 4,,7
.L2:
        call    rand
        movl    %eax, -4(%esi,%ebx,4)
        addl    $1, %ebx
        cmpl    $257, %ebx
        jne     .L2
        movl    %esi, 4(%esp)
        movl    %edi, (%esp)
        movl    $1024, 8(%esp)
        call    memcpy
        addl    $1036, %esp
        movl    %edi, %eax
        popl    %ebx
        popl    %esi
        popl    %edi
        popl    %ebp
        ret     $4
        .size   createData, .-createData
        .ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-46)"
        .section        .note.GNU-stack,"",@progbits

gcc 4.4.2### has grown a lot, and does not copy for the above non-trivial case.

        .file   "t.c"
        .text
        .p2align 4,,15
.globl createData
        .type   createData, @function
createData:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        movl    $1, %ebx
        subl    $1036, %esp
        movl    8(%ebp), %edi
        leal    -1036(%ebp), %esi
        .p2align 4,,7
.L2:
        call    rand
        movl    %eax, -4(%esi,%ebx,4)
        addl    $1, %ebx
        cmpl    $257, %ebx
        jne     .L2
        movl    %esi, 4(%esp)
        movl    %edi, (%esp)
        movl    $1024, 8(%esp)
        call    memcpy
        addl    $1036, %esp
        movl    %edi, %eax
        popl    %ebx
        popl    %esi
        popl    %edi
        popl    %ebp
        ret     $4
        .size   createData, .-createData
        .ident  "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-46)"
        .section        .note.GNU-stack,"",@progbits

In addition, VS2008 (compiled the above as C) will reserve struct Data on the stack of createData() and do a rep movsd loop to copy it back to the caller in Debug mode, in Release mode it will move the return value of rand() (%eax) directly back to the caller

leeeroy
+2  A: 

But for a large structure, it has to be on the heap stack.

Indeed so! A large structure declared as a local variable is allocated on the stack. Glad to have that cleared up.

As for avoiding copying, as others have noted:

  • Most calling conventions deal with "function returning struct" by passing an additional parameter that points the location in the caller's stack frame in which the struct should be placed. This is definitely a matter for the calling convention and not the language.

  • With this calling convention, it becomes possible for even a relatively simple compiler to notice when a code path is definitely going to return a struct, and for it to fix assignments to that struct's members so that they go directly into the caller's frame and don't have to be copied. The key is for the compiler to notice that all terminating code paths through the function return the same struct variable. If that's the case, the compiler can safely use the space in the caller's frame, eliminating the need for a copy at the point of return.

Norman Ramsey
You are right. That was a typo.
Steve Hanov