Please note that the inline keyword is only a hint to the compiler. If optimizations are turned off, the compiler may not inline the function at all. Go to Project Settings/C++/Optimization/ and make sure optimization is turned on. What setting have you used for "Inline Function Expansion"?
It also depends on the optimization level and compiler settings. Also check whether your compiler supports an always-inline/force-inline declaration. Once a function is actually inlined, it is as fast as a macro.
By default the keyword is only a hint; force inline/always inline (for the most part) gives control back to the programmer, which was the original intention of the keyword.
Finally, gcc (for example) can be told to warn you when such a function is not inlined as directed (the -Winline option).
Apart from what Philipp mentioned, if you're using MSVC you can use __forceinline, or the gcc equivalent __attribute__((always_inline)), to correct the problems with inlining.
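To make the compiler-specific spelling concrete, here is a minimal sketch of a portable wrapper. FORCE_INLINE, Vector3, crossAndAssign and checkCross are names I made up for illustration; only the members m_tX, m_tY and m_tZ come from the question, and the struct layout is an assumption.

```cpp
#include <cassert>

// FORCE_INLINE is a hypothetical name, not something from the question.
#if defined(_MSC_VER)
  #define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
  #define FORCE_INLINE inline __attribute__((always_inline))
#else
  #define FORCE_INLINE inline  // fall back to the plain hint
#endif

// Assumed layout; the thread only shows the member names.
struct Vector3 {
    float m_tX, m_tY, m_tZ;
};

// Function counterpart of the cross product macro.
// Note: out must not alias a or b, since out is written while a and b are still read.
FORCE_INLINE void crossAndAssign(const Vector3& a, const Vector3& b, Vector3& out) {
    out.m_tX = a.m_tY * b.m_tZ - a.m_tZ * b.m_tY;
    out.m_tY = a.m_tZ * b.m_tX - a.m_tX * b.m_tZ;
    out.m_tZ = a.m_tX * b.m_tY - a.m_tY * b.m_tX;
}

// Usage: the x axis crossed with the y axis gives the z axis.
inline void checkCross() {
    Vector3 x{1.0f, 0.0f, 0.0f}, y{0.0f, 1.0f, 0.0f}, z{0.0f, 0.0f, 0.0f};
    crossAndAssign(x, y, z);
    assert(z.m_tX == 0.0f && z.m_tY == 0.0f && z.m_tZ == 1.0f);
}
```

Even with a forced-inline attribute it is worth checking the generated code or the compiler's warnings, since compilers can still refuse to inline in some situations.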
However, there is another possible problem lurking: a macro re-evaluates its parameters at every point where they appear in its body, so if you call the macro like so:
FastVectorCrossAndAssign(getForward(), up, right);
it will expand to:
right.m_tX = getForward().m_tY * up.m_tZ - getForward().m_tZ * up.m_tY;
right.m_tY = getForward().m_tZ * up.m_tX - getForward().m_tX * up.m_tZ;
right.m_tZ = getForward().m_tX * up.m_tY - getForward().m_tY * up.m_tX;
Not what you want when you're concerned with speed :) (especially if getForward() isn't a lightweight function, or has a side effect such as incrementing something on each call). If it's an inline function, the compiler might reduce the number of calls, provided nothing involved is volatile, but that still won't fix everything.
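The re-evaluation is easy to observe by counting calls. The following is only a sketch: CROSS_MACRO is my reconstruction of the macro from the expansion shown above, and g_calls, crossFn, callsViaMacro, callsViaFunction and demo are hypothetical helpers, as is the body of getForward().

```cpp
#include <cassert>

// Assumed layout; the thread only shows the member names.
struct Vector3 {
    float m_tX, m_tY, m_tZ;
};

// Reconstructed from the expansion shown in the answer; the real macro may differ.
#define CROSS_MACRO(a, b, out) \
    (out).m_tX = (a).m_tY * (b).m_tZ - (a).m_tZ * (b).m_tY; \
    (out).m_tY = (a).m_tZ * (b).m_tX - (a).m_tX * (b).m_tZ; \
    (out).m_tZ = (a).m_tX * (b).m_tY - (a).m_tY * (b).m_tX

static int g_calls = 0;  // hypothetical counter, incremented on every getForward() call

Vector3 getForward() {
    ++g_calls;
    return Vector3{1.0f, 0.0f, 0.0f};
}

inline void crossFn(const Vector3& a, const Vector3& b, Vector3& out) {
    out.m_tX = a.m_tY * b.m_tZ - a.m_tZ * b.m_tY;
    out.m_tY = a.m_tZ * b.m_tX - a.m_tX * b.m_tZ;
    out.m_tZ = a.m_tX * b.m_tY - a.m_tY * b.m_tX;
}

// The macro pastes getForward() in six times, so it is called six times.
int callsViaMacro() {
    g_calls = 0;
    Vector3 up{0.0f, 1.0f, 0.0f}, right{0.0f, 0.0f, 0.0f};
    CROSS_MACRO(getForward(), up, right);
    return g_calls;
}

// The function evaluates its argument once, before the body runs.
int callsViaFunction() {
    g_calls = 0;
    Vector3 up{0.0f, 1.0f, 0.0f}, right{0.0f, 0.0f, 0.0f};
    crossFn(getForward(), up, right);
    return g_calls;
}

void demo() {
    assert(callsViaMacro() == 6);
    assert(callsViaFunction() == 1);
}
```

Because getForward() has a visible side effect here, the compiler is not allowed to merge the six calls, which is exactly the case where the macro costs you the most.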
NOTE: After posting this answer, the original question was edited to remove this problem. I'll leave the answer as it is instructive on several levels.
The loops differ in what they do!
If we manually expand the macro, we get:
for (long l=0; l<100000000; l++)
right.m_tX = forward.m_tY * up.m_tZ - forward.m_tZ * up.m_tY;
right.m_tY = forward.m_tZ * up.m_tX - forward.m_tX * up.m_tZ;
right.m_tZ = forward.m_tX * up.m_tY - forward.m_tY * up.m_tX;
Note the absence of curly brackets. So the compiler sees this as:
for (long l=0; l<100000000; l++)
{
    right.m_tX = forward.m_tY * up.m_tZ - forward.m_tZ * up.m_tY;
}
right.m_tY = forward.m_tZ * up.m_tX - forward.m_tX * up.m_tZ;
right.m_tZ = forward.m_tX * up.m_tY - forward.m_tY * up.m_tX;
Which makes it obvious why the second loop is so much faster.
Update: This is also a good example of why macros are evil :)
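If a multi-statement macro has to stay, the usual idiom is to wrap its body in do { ... } while (0) so that it behaves like a single statement after a brace-less for or if. A small sketch of the difference, using a hypothetical two-statement macro in place of the three-line cross product (all names below are made up for illustration):

```cpp
#include <cassert>

// Hypothetical stand-ins for a multi-statement macro.
#define BUMP_BOTH_UNSAFE(a, b) ++(a); ++(b)
#define BUMP_BOTH_SAFE(a, b)   do { ++(a); ++(b); } while (0)

struct Counts {
    int first;
    int second;
};

Counts runUnsafe(int n) {
    Counts c{0, 0};
    for (int i = 0; i < n; ++i)
        BUMP_BOTH_UNSAFE(c.first, c.second);  // only ++c.first is inside the loop
    return c;
}

Counts runSafe(int n) {
    Counts c{0, 0};
    for (int i = 0; i < n; ++i)
        BUMP_BOTH_SAFE(c.first, c.second);    // both increments are inside the loop
    return c;
}

void demoLoops() {
    Counts u = runUnsafe(10);
    assert(u.first == 10 && u.second == 1);   // second statement ran once, after the loop
    Counts s = runSafe(10);
    assert(s.first == 10 && s.second == 10);
}
```

An inline function avoids the problem entirely, which is one more reason to prefer it over the macro.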