I'm making a vector/matrix library. (GCC, ARM NEON, iPhone)

typedef struct{ float v[4]; } Vector;
typedef struct{ Vector v[4]; } Matrix;

I pass structs by pointer to avoid the performance cost of copying data on every function call. So at first I designed the function like this:

void makeTranslation(const Vector* factor, Matrix* restrict result);

But if the function is inlined, is there any performance reason to pass values by pointer? Are those variables copied as well? What about registers and caches? I tried redesigning the function like this:

inline Matrix makeTranslation(const Vector factor) __attribute__ ((always_inline));

What do you think about the call cost in each case?

  • I added 'const' to the 2nd signature to reflect suggestions.
+1  A: 

When the function is inlined, typically no copying of variables is directly involved in the call. Variables will still be moved around and sometimes put on the stack as a normal part of execution, but not as a direct result of the function call. (When you run out of registers, some values may get put on the stack, etc., but only if needed.) So the overhead of the "call" basically disappears when a function is inlined: no more setting up and tearing down the stack frame, no more unconditional jump, no more pushing and popping parameters.

If you can rely on your always_inline attribute to always inline the function, then you should also not pass the Vector by pointer (if it isn't modified). Passing it by pointer requires taking the vector's address, which means the compiler must ensure the vector has an address, so it cannot exist only in CPU registers. This can slow things down when the address isn't actually needed: once you take the address of something, the compiler will ensure it has one, because it can't be sure the address isn't needed.

Because of the pass-by-pointer, this code will ALWAYS have an instruction to get the object's address, and at least one dereference to get at a member's value. If you pass-by-value then this MAY still happen, but the compiler MAY be able to optimize all of that away.
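To make the two designs concrete, here is a minimal sketch of both signatures from the question side by side. The names makeTranslationByPointer, makeTranslationByValue and checkTranslationsAgree are hypothetical, and storing the translation in the fourth row is an assumption, since the question doesn't say how the matrix is laid out.

```c
#include <assert.h>

typedef struct { float v[4]; } Vector;
typedef struct { Vector v[4]; } Matrix;

/* Pass-by-pointer variant (first signature in the question).
   Assumption: the translation lives in the fourth row. */
void makeTranslationByPointer(const Vector *factor, Matrix *restrict result)
{
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            result->v[row].v[col] = (row == col) ? 1.0f : 0.0f;

    result->v[3].v[0] = factor->v[0];
    result->v[3].v[1] = factor->v[1];
    result->v[3].v[2] = factor->v[2];
}

/* Pass-by-value variant, forced inline with the GCC attribute.
   No address of 'factor' is taken here, so after inlining the
   compiler is free to keep it in registers. */
static inline __attribute__((always_inline))
Matrix makeTranslationByValue(const Vector factor)
{
    Matrix m = {{
        {{1.0f, 0.0f, 0.0f, 0.0f}},
        {{0.0f, 1.0f, 0.0f, 0.0f}},
        {{0.0f, 0.0f, 1.0f, 0.0f}},
        {{factor.v[0], factor.v[1], factor.v[2], 1.0f}}
    }};
    return m;
}

/* Self-check: both variants should build the same matrix. */
void checkTranslationsAgree(const Vector factor)
{
    Matrix a;
    makeTranslationByPointer(&factor, &a);
    const Matrix b = makeTranslationByValue(factor);

    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            assert(a.v[row].v[col] == b.v[row].v[col]);
}
```

Both versions produce the same result; whether they produce different machine code depends on the compiler and target. Compiling with optimization and reading the assembly (for example with gcc -O2 -S) is the way to see what actually happens in your case.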

Don't forget that overuse of inlining can significantly increase the size of the compiled binary code. In certain cases, large code segments (a result of inlined functions) can cause more instruction cache misses, which will result in slower performance: the CPU constantly has to go out to main memory to fetch parts of your program because the code is too big to fit in the small L1 cache. This may be especially important on embedded processors (like the iPhone's) because these processors typically have small caches.

SoapBox
As far as I know this only holds true if you declare the parameter as const (or the compiler is able to detect this behaviour itself). The example does not do this, and it's an important detail.
KillianDS
@KillianDS: Is there no optimization like this if the parameter is not const?
Eonil
That may be true, I'm not sure. Usually the best answer in situations like this is "see what the assembly output of your compiler is in both cases, and go with the one you like better." I probably should have just answered that way and saved myself some trouble :-P
SoapBox
@SoapBox: Thanks for the detailed explanation. I'll consider cache size and profiling. However, inlining will be the only option if I go with the pass-by-value design.
Eonil
@SoapBox: Are you sure that the compiler will ALWAYS emit an instruction to get the address? It seems to me that it might be possible for a compiler to prove that it is not really necessary to take the address, assuming that the function is inlined.
Jørgen Fogh
It's difficult to check without some real code, but that's what I think it will do, assuming the value could have lived completely in registers. I don't have time to put GCC through its paces right now, and I don't have an ARM cross compiler handy anyway...
SoapBox