
Consider the following:

struct Point {double x; double y;};

double complexComputation(const Point& p1, const Point& p2)
{
    // p1 and p2 used frequently in computations
}

Do compilers optimize the pass-by-reference into pass-by-copy to avoid frequent dereferencing? In other words, do they convert complexComputation into this:

double complexComputation(const Point& p1, const Point& p2)
{
    double x1 = p1.x; double x2 = p2.x;
    double y1 = p1.y; double y2 = p2.y;
    // x1, x2, y1, y2 stored in registers and used frequently in computations
}

Since Point is a POD, there can be no side effects from making a copy behind the caller's back, right?

If that's the case, then I can always just pass POD objects by const reference, no matter how small, and not have to worry about the optimal passing semantics. Right?

EDIT: I'm interested in the GCC compiler in particular. I guess I might have to write some test code and look at the ASM.

+2  A: 

I can't speak for every compiler, but the general answer is no: the compiler will not make that optimization.

See GotW #81 to read about how using const in C++ doesn't enable the optimizations some might expect.
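A small sketch of why const alone does not license that transformation, using a hypothetical function readTwiceAroundWrite (not from the question): a const Point& only forbids writes through that particular reference. If another parameter may refer to the same object, the compiler has to assume p.x can change and cannot quietly work on a private copy.

```cpp
struct Point { double x; double y; };

// p is a const reference, but const only forbids writes *through p*;
// another reference to the same object may still modify it.
double readTwiceAroundWrite(const Point& p, Point& out)
{
    double before = p.x;  // first load of p.x
    out.x *= 2.0;         // out may refer to the very same object as p
    double after = p.x;   // so the compiler must load p.x again here
    return after - before;
}
```

With Point a{1.5, 0.0} and a separate Point b{1.5, 0.0}, readTwiceAroundWrite(a, b) returns 0, but readTwiceAroundWrite(a, a) returns 1.5 (the doubled value minus the original). A compiler that silently copied p on entry would return 0 in both cases, which is one reason it cannot make that change behind the caller's back.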

Shmoopty
+4  A: 

Your compiler can absolutely lift Point member variables into registers if needed. This, however, is not the same as the compiler converting the function call itself into pass by value.
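To make that distinction concrete, here is a minimal sketch built around a hypothetical function sumDistancesSquared (not from the question) that reads a const Point& inside a loop. Nothing in the loop stores to memory through a pointer, so an optimizer may load p.x and p.y once and keep them in registers; the hand-hoisted second version spells that transformation out. In both versions the signature, and therefore the calling convention, is still pass-by-reference.

```cpp
#include <cstddef>

struct Point { double x; double y; };

// Reads p.x and p.y on every iteration. The loop performs no stores through
// pointers, so an optimizer is free to load them once and keep them in registers.
double sumDistancesSquared(const Point& p, const Point* pts, std::size_t n)
{
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double dx = pts[i].x - p.x;
        double dy = pts[i].y - p.y;
        total += dx * dx + dy * dy;
    }
    return total;
}

// The same computation with the loads hoisted by hand, i.e. what the optimizer
// may effectively do internally. Callers still pass an address for p.
double sumDistancesSquaredHoisted(const Point& p, const Point* pts, std::size_t n)
{
    const double px = p.x;
    const double py = p.y;
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double dx = pts[i].x - px;
        double dy = pts[i].y - py;
        total += dx * dx + dy * dy;
    }
    return total;
}
```

Whether a given GCC version actually hoists the loads in the first version is something to confirm in the generated assembly rather than assume.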

You should inspect the generated assembly to see what optimizations are being done.

And FWIW, the general rule I use is to pass all primitive types by value and all classes/UDTs (PODs or not) by const reference when I can, and let the compiler sort out the best thing to do. We shouldn't worry ourselves with the details of what the compiler is doing; it is much smarter than us.

Terry Mahaffey
I agree about not worrying unless benchmarking/profiling tells us to. But I was just curious if a compiler can indeed do that type of optimization.
Emile Cormier
+1  A: 

There are two issues.

Firstly, the compiler will not convert pass-by-reference to pass-by-value, especially if complexComputation is not static (i.e. it can be called from other translation units).

The reason is ABI (binary interface) compatibility. To the CPU, there is no such thing as a "reference": the compiler implements references as pointers. Parameters are passed on the stack or in registers, so code calling complexComputation will likely look like this (assume for a moment that a double is 4 bytes):

str x1, [r7, #0x20]
str y1, [r7, #0x24]
str x2, [r7, #0x50]
str y2, [r7, #0x54]
push r7, #0x20     ; push address of p1 onto the stack
push r7, #0x50     ; push address of p2 onto the stack
call complexComputation

Only 8 bytes (two 4-byte addresses) are pushed onto the stack.
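The "references become pointers" point can be illustrated in C++ itself. A minimal sketch with hypothetical functions dotRef and dotPtr: on common ABIs both typically compile to essentially the same machine code, because each receives two addresses. This illustrates the usual lowering; the standard itself does not specify how references are implemented.

```cpp
struct Point { double x; double y; };

// What the source says: two references.
double dotRef(const Point& p1, const Point& p2)
{
    return p1.x * p2.x + p1.y * p2.y;
}

// Roughly what the generated code receives: two addresses.
// (A sketch of the typical lowering, not literal compiler output.)
double dotPtr(const Point* p1, const Point* p2)
{
    return p1->x * p2->x + p1->y * p2->y;
}
```

For example, with Point a{1.0, 2.0} and Point b{3.0, 4.0}, dotRef(a, b) and dotPtr(&a, &b) both compute 1*3 + 2*4 = 11.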

Pass by copy, on the other hand, will push the whole struct onto the stack, so the assembly code will look like:

push x1    ; push a copy of p1.x onto the stack
push y1    ; push a copy of p1.y onto the stack
push x2    ; push a copy of p2.x onto the stack
push y2    ; push a copy of p2.y onto the stack
call complexComputation

Note that this time 16 bytes are pushed onto the stack, and the contents are the values themselves, not pointers. If complexComputation changed its parameter-passing convention without its callers being recompiled to match, it would interpret its inputs as garbage and your program could crash.
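A sketch of the by-value counterpart, plus the byte-count arithmetic above redone with real sizeof values instead of the 4-byte-double simplification: two Points by value is 2 * sizeof(Point), while two references lowered to pointers is 2 * sizeof(const Point*). The names dotVal, bytesByValue and bytesByRef are hypothetical. Real ABIs may pass some of this in registers rather than on the stack (x86-64 System V, for instance, can pass a two-double struct in SSE registers), so treat these as amounts of argument data, not stack traffic.

```cpp
#include <cstddef>

struct Point { double x; double y; };

// Pass by value: the callee receives its own copies of both structs.
double dotVal(Point p1, Point p2)
{
    return p1.x * p2.x + p1.y * p2.y;
}

// Argument data handed over by each convention, ignoring ABI-specific
// register assignment. Point has no padding on mainstream platforms.
constexpr std::size_t bytesByValue = 2 * sizeof(Point);
constexpr std::size_t bytesByRef   = 2 * sizeof(const Point*);
```

On a typical 64-bit target with 8-byte doubles and 8-byte pointers, this gives 32 bytes by value versus 16 bytes by reference.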


On the other hand, the optimization

double complexComputation(const Point& p1, const Point& p2) {
    double x1 = p1.x; double x2 = p2.x;
    double y1 = p1.y; double y2 = p2.y;
    // x1, x2, y1, y2 stored in registers and used frequently in computations
}

can easily be done, since the compiler can recognize which variables are used often and keep them in callee-saved registers (e.g. r4 to r11 on ARM, and many of the sXX/dXX registers) for faster access, provided it can show that nothing in between writes to p1 or p2.


In any case, if you want to know whether the compiler has done something, you can always disassemble the resulting object files (or have GCC emit the assembly directly with g++ -O2 -S) and compare.

KennyTM