Why is returning a std::pair or boost::tuple so much less efficient than returning by reference? In real code that I've tested, setting data by non-const reference rather than returning a std::pair from an inner kernel can speed up the code by 20%.
As an experiment, I looked at three minimal scenarios that add two predefined constants to an input integer and store the results in two output integers:
Use an inner, inlined function to modify the integers by reference
Use two inner, inlined functions to return ints by value
Use an inner, inlined function to return a std::pair whose members are copied to the results
Compiling with g++ -c $x -Wall -Wextra -O2 -S results in the same assembly code for passing by reference and returning ints by value:
__Z7getPairiRiS_:
LFB19:
    pushq   %rbp
LCFI0:
    leal    1023(%rdi), %eax
    addl    $31, %edi
    movl    %eax, (%rsi)
    movq    %rsp, %rbp
LCFI1:
    movl    %edi, (%rdx)
    leave
    ret
(Pass by reference code:
#include <utility>
inline void myGetPair(const int inp, int& a, int& b) {
    a = 1023 + inp;
    b = 31 + inp;
}
void getPair(const int inp, int& a, int& b) {
    myGetPair(inp, a, b);
}
Using individual rvalues:
#include <utility>
inline int myGetPair1(int inp) {
    return 1023 + inp;
}
inline int myGetPair2(int inp) {
    return 31 + inp;
}
void getPair(const int inp, int& a, int& b) {
    a = myGetPair1(inp);
    b = myGetPair2(inp);
}
)
Using std::pair, however, adds four extra instructions (salq, orq, an extra movq, and shrq):
__Z7getPairiRiS_:
LFB18:
    leal    31(%rdi), %eax
    addl    $1023, %edi
    pushq   %rbp
LCFI0:
    salq    $32, %rax
    movq    %rsp, %rbp
LCFI1:
    orq     %rdi, %rax
    movq    %rax, %rcx
    movl    %eax, (%rsi)
    shrq    $32, %rcx
    movl    %ecx, (%rdx)
    leave
    ret
The code for that is nearly as simple as the previous examples:
#include <utility>
inline std::pair<int,int> myGetPair(int inp) {
    return std::make_pair(1023 + inp, 31 + inp);
}
void getPair(const int inp, int& a, int& b) {
    std::pair<int,int> result = myGetPair(inp);
    a = result.first;
    b = result.second;
}
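For what it's worth, my reading of the std::pair listing is that the extra instructions pack the two ints into a single 64-bit register (salq, orq) and then split them apart again (movq, shrq). The sketch below spells that out in C++ so the steps are easier to follow. It is only my interpretation of the assembly, not something the compiler documents, and packPair and getPairPacked are names I made up for illustration.

```cpp
#include <cstdint>
#include <utility>

// Same helper as in the std::pair version above.
inline std::pair<int,int> myGetPair(int inp) {
    return std::make_pair(1023 + inp, 31 + inp);
}

// Hypothetical helper: put `first` in the low 32 bits and `second` in the
// high 32 bits of one 64-bit value (roughly the salq $32 / orq steps).
inline std::uint64_t packPair(int first, int second) {
    const std::uint64_t lo = static_cast<std::uint32_t>(first);
    const std::uint64_t hi = static_cast<std::uint32_t>(second);
    return (hi << 32) | lo;
}

// Hypothetical stand-in for getPair that does the pack/unpack by hand.
void getPairPacked(int inp, int& a, int& b) {
    const std::uint64_t packed = packPair(1023 + inp, 31 + inp);
    // Low half goes to a (roughly movl %eax, (%rsi)).
    a = static_cast<int>(static_cast<std::uint32_t>(packed));
    // High half goes to b (roughly shrq $32 followed by movl %ecx, (%rdx)).
    b = static_cast<int>(static_cast<std::uint32_t>(packed >> 32));
}
```

If that reading is right, getPairPacked should always produce the same a and b as the std::pair version of getPair, and the round trip through the packed value is work the by-reference version never has to do.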
Can anyone who knows the inner workings of compilers help with this question? The Boost tuple documentation mentions a performance penalty for tuples vs. pass-by-reference, but none of the papers it links to answer the question.
The reason I'd prefer std::pair to these pass-by-reference statements is that it makes the intent of the function much clearer in many circumstances, especially when other parameters are input as well as the ones that are to be modified.
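To make the readability point concrete, here is a small sketch of the two call-site styles when there is an extra input parameter. The function names and the scale parameter are invented for this example and are not from my real code.

```cpp
#include <utility>

// Hypothetical kernel, reference style: two inputs, two outputs.
inline void scaleAndOffsetRef(int inp, int scale, int& lo, int& hi) {
    lo = scale * inp + 31;
    hi = scale * inp + 1023;
}

// The same hypothetical kernel, returning its outputs as a std::pair.
inline std::pair<int,int> scaleAndOffset(int inp, int scale) {
    return std::make_pair(scale * inp + 31, scale * inp + 1023);
}

int sumRef(int inp, int scale) {
    int lo, hi;
    // From the call alone it is not obvious which arguments are outputs.
    scaleAndOffsetRef(inp, scale, lo, hi);
    return lo + hi;
}

int sumPair(int inp, int scale) {
    // Inputs go in, outputs come back: the direction of data flow is explicit.
    const std::pair<int,int> r = scaleAndOffset(inp, scale);
    return r.first + r.second;
}
```

Both versions compute the same thing; the only difference I care about here is how clearly the call site shows what is read and what is written.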