A: 

Please note that the inline keyword is only a hint to the compiler. If optimizations are turned off, the compiler may not inline the function at all. Go to Project Settings/C++/Optimization and make sure optimization is turned on. What setting have you used for "Inline Function Expansion"?

Philipp
With Full Optimization turned on, both my functions report a time of 0, so I suspect the entire loops are being optimized out because they don't do anything useful. I'll have to play around with this some more.
You might use the results (for example, add them all up) and output the final sum afterwards, or something like that.
Philipp
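The suggestion in the comment above can be sketched roughly as follows. This is a minimal illustration, not the asker's real code: `Vec3`, `crossAndAssign`, `runBenchmark` and `timeBenchmarkMs` are hypothetical names, and only the `m_tX`/`m_tY`/`m_tZ` members are taken from the thread. The loop varies one input and sums the outputs, so the optimizer cannot discard the work as long as the sum is used afterwards.

```cpp
#include <chrono>

// Hypothetical stand-in for the vector type discussed in the thread.
struct Vec3 { float m_tX, m_tY, m_tZ; };

inline void crossAndAssign(const Vec3& a, const Vec3& b, Vec3& out) {
    out.m_tX = a.m_tY * b.m_tZ - a.m_tZ * b.m_tY;
    out.m_tY = a.m_tZ * b.m_tX - a.m_tX * b.m_tZ;
    out.m_tZ = a.m_tX * b.m_tY - a.m_tY * b.m_tX;
}

// Runs the cross product `iterations` times and returns the sum of all
// result components, so every iteration contributes to an observable value.
double runBenchmark(long iterations) {
    Vec3 forward{1.0f, 0.0f, 0.0f};
    Vec3 up{0.0f, 1.0f, 0.0f};
    Vec3 right{};
    double sum = 0.0;
    for (long l = 0; l < iterations; ++l) {
        forward.m_tX = static_cast<float>(l & 7);  // vary the input each pass
        crossAndAssign(forward, up, right);
        sum += right.m_tX + right.m_tY + right.m_tZ;
    }
    return sum;
}

// Times one benchmark run in milliseconds and hands the sum back to the caller.
double timeBenchmarkMs(long iterations, double& sumOut) {
    const auto start = std::chrono::steady_clock::now();
    sumOut = runBenchmark(iterations);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Printing the returned sum next to the elapsed time keeps the result live. An aggressive optimizer may still simplify parts of the loop, so it is worth checking that the measured time grows with the iteration count.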
A: 

It also depends on optimizations and compiler settings. Also look for your compiler's support for an always-inline/force-inline declaration. An inlined function is as fast as a macro.

By default, the keyword is only a hint. Force inline/always inline (for the most part) returns control to the programmer, which was the original intention of the keyword.

Finally, GCC (for example) can be directed to warn you when such a function is not inlined as requested (the -Winline option).

Justin
A: 

Apart from what Philipp mentioned, if you're using MSVC, you can use `__forceinline`, or the GCC equivalent `__attribute__((always_inline))`, to correct the problems with inlining.
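One way to use these compiler-specific spellings from a single code base is a small wrapper macro. The sketch below is only an illustration: `FORCE_INLINE`, `Vec3` and `crossAndAssign` are hypothetical names, not taken from the question, and compilers other than MSVC, GCC and Clang simply fall back to the plain `inline` hint.

```cpp
// FORCE_INLINE is a hypothetical macro name chosen for this sketch.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // unknown compiler: fall back to the plain hint
#endif

// Hypothetical stand-in for the vector type discussed in the thread.
struct Vec3 { float m_tX, m_tY, m_tZ; };

// Cross product written as a function instead of a macro.
FORCE_INLINE void crossAndAssign(const Vec3& a, const Vec3& b, Vec3& out) {
    out.m_tX = a.m_tY * b.m_tZ - a.m_tZ * b.m_tY;
    out.m_tY = a.m_tZ * b.m_tX - a.m_tX * b.m_tZ;
    out.m_tZ = a.m_tX * b.m_tY - a.m_tY * b.m_tX;
}
```

Even a forced inline can be refused in some situations (recursion, for instance), so it is still worth checking the generated code or the compiler's diagnostics.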

However, there is another possible problem lurking: a macro re-evaluates its arguments at every point where they appear in the expansion, so if you call the macro like so:

FastVectorCrossAndAssign(getForward(), up, right);

it will expand to:

right.m_tX = getForward().m_tY * up.m_tZ - getForward().m_tZ * up.m_tY; 
right.m_tY = getForward().m_tZ * up.m_tX - getForward().m_tX * up.m_tZ; 
right.m_tZ = getForward().m_tX * up.m_tY - getForward().m_tY * up.m_tX; 

Not what you want when you're concerned with speed :) This matters especially if getForward() isn't a lightweight function or has side effects, such as incrementing something on each call. If it's an inline function, the compiler might reduce the number of calls, provided nothing involved is volatile, but that still won't fix everything.
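To make the difference measurable, here is a small sketch that counts how often getForward() runs. The macro body is reconstructed from the expansion shown above and wrapped in `do { ... } while (0)`; the original macro's exact text is an assumption, and `Vec3`, `crossAndAssign`, `g_forwardCalls`, `callsViaMacro` and `callsViaFunction` are hypothetical names.

```cpp
// Hypothetical stand-in for the vector type discussed in the thread.
struct Vec3 { float m_tX, m_tY, m_tZ; };

// Reconstruction of the macro from the expansion shown in the answer.
#define FastVectorCrossAndAssign(a, b, out)                       \
    do {                                                          \
        (out).m_tX = (a).m_tY * (b).m_tZ - (a).m_tZ * (b).m_tY;   \
        (out).m_tY = (a).m_tZ * (b).m_tX - (a).m_tX * (b).m_tZ;   \
        (out).m_tZ = (a).m_tX * (b).m_tY - (a).m_tY * (b).m_tX;   \
    } while (0)

// Function version: each argument is evaluated exactly once at the call.
inline void crossAndAssign(const Vec3& a, const Vec3& b, Vec3& out) {
    out.m_tX = a.m_tY * b.m_tZ - a.m_tZ * b.m_tY;
    out.m_tY = a.m_tZ * b.m_tX - a.m_tX * b.m_tZ;
    out.m_tZ = a.m_tX * b.m_tY - a.m_tY * b.m_tX;
}

int g_forwardCalls = 0;  // counts evaluations of getForward()

Vec3 getForward() {
    ++g_forwardCalls;
    return Vec3{1.0f, 0.0f, 0.0f};
}

// Number of getForward() calls when it is passed to the macro.
int callsViaMacro() {
    g_forwardCalls = 0;
    Vec3 up{0.0f, 1.0f, 0.0f};
    Vec3 right{};
    FastVectorCrossAndAssign(getForward(), up, right);
    return g_forwardCalls;
}

// Number of getForward() calls when it is passed to the function.
int callsViaFunction() {
    g_forwardCalls = 0;
    Vec3 up{0.0f, 1.0f, 0.0f};
    Vec3 right{};
    crossAndAssign(getForward(), up, right);
    return g_forwardCalls;
}
```

With the macro, the argument text appears six times in the expansion, so getForward() is called six times; the function version calls it once.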

Necrolis
+7  A: 

NOTE: After posting this answer, the original question was edited to remove this problem. I'll leave the answer as it is instructive on several levels.

The loops differ in what they do!

If we manually expand the macro, we get:

for (long l=0; l<100000000; l++) 
    right.m_tX = forward.m_tY * up.m_tZ - forward.m_tZ * up.m_tY;
    right.m_tY = forward.m_tZ * up.m_tX - forward.m_tX * up.m_tZ;
    right.m_tZ = forward.m_tX * up.m_tY - forward.m_tY * up.m_tX;

Note the absence of curly brackets. So the compiler sees this as:

for (long l=0; l<100000000; l++)
{
    right.m_tX = forward.m_tY * up.m_tZ - forward.m_tZ * up.m_tY;
}
right.m_tY = forward.m_tZ * up.m_tX - forward.m_tX * up.m_tZ;
right.m_tZ = forward.m_tX * up.m_tY - forward.m_tY * up.m_tX;

Which makes it obvious why the second loop is so much faster.

Update: This is also a good example of why macros are evil :)

Sjoerd
Oh, **thank you** for giving this **perfect** example why one should **always** use braces, even for one-liner bodies. Absolutely +1, eagle eye!
DevSolar
I wouldn't say that macros are evil *per se*, though. They bite you when you're careless (like, not wrapping a multiple-line macro in `do { ... } while (0)`).
DevSolar
@DevSolar: why wrap a macro in `do { ... } while(0)` when `{ ... }` works perfectly? Is it that important to force the user to put a semicolon after it?
Matthieu M.
@Matthieu M.: Yes it is. 1) Omitting the semicolon results in a compiler error, forcing the macro call to mimic a proper function call. (Makes it easier when you want to change the macro into a function later on.) But more important, 2) try using your `{ ... }` macro in the `if` part of an `if ... else`. Suddenly you *must not* put the semicolon... Also see http://c-faq.com/cpp/multistmt.html
DevSolar
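The `if ... else` point from the comment above can be seen in a short sketch. `SWAP_INTS` and `ensureOrdered` are hypothetical names made up for this illustration.

```cpp
// Multi-statement macro wrapped in do { ... } while (0): the expansion is a
// single statement that still needs a trailing semicolon, like a function call.
#define SWAP_INTS(a, b)      \
    do {                     \
        int tmp_ = (a);      \
        (a) = (b);           \
        (b) = tmp_;          \
    } while (0)

// Returns true if lo <= hi already held; otherwise swaps the two values
// and returns false.
bool ensureOrdered(int& lo, int& hi) {
    if (lo > hi)
        SWAP_INTS(lo, hi);  // the `;` completes the statement; `else` still pairs with `if`
    else
        return true;
    return false;
}
```

If `SWAP_INTS` were defined with plain braces instead, the `;` after the call would become an empty statement after the closing `}`, the `if` would end there, and the `else` would fail to compile.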
Submitter response: Ack, in optimizing my posted code for readability I removed what seemed like extraneous braces. In my real code, the braces are there, and the loop does do exactly what you would expect it should do. I've updated the sample. So unfortunately, this is not the answer.
So the posted code is not the real code? How do you expect us to find problems with non-posted code?
Sjoerd