views: 395

answers: 4
I'm implementing some math types and I want to optimize the operators to minimize the number of objects created, destroyed, and copied. To demonstrate, I'll show you part of my Quaternion implementation.

class Quaternion
{
public:
    double w,x,y,z;

    ...

    Quaternion operator+(const Quaternion &other) const;
};

I want to know how the two following implementations differ from each other. I do have a += implementation that operates in place so that no new object is created, but for some higher-level operations using quaternions it's useful to have + and not just +=.

__forceinline Quaternion Quaternion::operator+( const Quaternion &other ) const
{
    return Quaternion(w+other.w,x+other.x,y+other.y,z+other.z);
}

and

__forceinline Quaternion Quaternion::operator+( const Quaternion &other ) const
{
    Quaternion q(w+other.w,x+other.x,y+other.y,z+other.z);
    return q;
}

My C++ is completely self-taught, so when it comes to some optimizations I'm unsure what to do, because I do not know exactly how the compiler handles these things. Also, how do these mechanics translate to non-inline implementations?

Any other criticisms of my code are welcome.

+5  A: 

Between the two implementations you presented, there really is no difference. Any compiler doing any sort of optimizations whatsoever will optimize your local variable out.

As for the += operator, a slightly more involved discussion about whether or not you want your Quaternions to be immutable objects is probably required... I would always lean towards creating objects like this as immutable objects. (but then again, I'm more of a managed coder as well)

LorenVS
And I'd always lean to making them mutable, on the basis that in C++, user-defined numeric types should (as far as possible) mimic int. In e.g. Java it's a different story, all your variables are references and immutable types are more appropriate.
Steve Jessop
+2  A: 

If these two implementations do not generate exactly the same assembly code when optimization is turned on, you should consider using a different compiler. :) And I don't think it matters whether or not the function is inlined.

By the way, be aware that __forceinline is very non-portable. I would just use plain old standard inline and let the compiler decide.

Dima
Well atm I don't have to worry about portability because I'm the only one dealing with this code. I should probably use a macro though instead, but I'm not too worried about that right now.
Mark
First, writing portable code whenever possible is generally a good habit to have. Second, even if nobody else ever looks at your code, you may want to use it on a different platform or with a different compiler someday. Forcing a function to be inlined, when most compilers do a pretty good job of it on their own doesn't seem like a good reason to use a non-standard feature. Just my 2 cents. :)
Dima
I completely agree with your two cents, I was just too lazy to do that there. I'll change it now if it makes you happy =]
Mark
@Dima: There's also the fact that you may want to look at your code several years in the future, when you aren't using the same compiler and don't quite remember what all the compiler-dependent things are.
David Thornley
the new and different compiler will surely help you remember ;-)
pgast
+9  A: 

Your first example allows the compiler to potentially use something called "Return Value Optimization" (RVO).

The second example allows the compiler to potentially use something called "Named Return Value Optimization" (NRVO). These two optimizations are closely related.

Some details of Microsoft's implementation of NRVO can be found here:

Note that the article indicates that NRVO support started with VS 2005 (MSVC 8.0). It doesn't specifically say whether the same applies to RVO or not, but I believe that MSVC used RVO optimizations before version 8.0.

This article about Move Constructors by Andrei Alexandrescu has good information about how RVO works (and when and why compilers might not use it).

Including this bit:

you'll be disappointed to hear that each compiler, and often each compiler version, has its own rules for detecting and applying RVO. Some apply RVO only to functions returning unnamed temporaries (the simplest form of RVO). The more sophisticated ones also apply RVO when there's a named result that the function returns (the so-called Named RVO, or NRVO).

In essence, when writing code, you can count on RVO being portably applied to your code depending on how you exactly write the code (under a very fluid definition of "exactly"), the phase of the moon, and the size of your shoes.

The article was written in 2003 and compilers should be much improved by now; hopefully, the phase of the moon is less important to when the compiler might use RVO/NRVO (maybe it's down to day-of-the-week). As noted above it appears that MS didn't implement NRVO until 2005. Maybe that's when someone working on the compiler at Microsoft got a new pair of more comfortable shoes a half-size larger than before.

Your examples are simple enough that I'd expect both to generate equivalent code with more recent compiler versions.

Michael Burr
With current compilers, both snippets have about the same chances of having the optimization applied. Nowadays, what is really hard for compilers is applying RVO when the code has more than one return statement (that is, returning different objects), as in that case it is harder for the compiler to determine which of the different instances should be constructed over the return memory space.
David Rodríguez - dribeas
There are also other small changes that can help the compiler optimize and never incur extra cost, such as using a free-function version of operator+ that takes the first argument by value. In that case, the compiler can elide the copy construction when the function is called with a temporary.
David Rodríguez - dribeas
+2  A: 

The current consensus is that you should first implement all your ?= operators, which modify in place and do not create new objects. Depending on whether exception safety is a concern (in your case it probably is not) or a goal, the definition of the ?= operator can differ. After that, you implement operator? as a free function in terms of the ?= operator, using pass-by-value semantics for the first argument.

// exception safety is not a concern here
class Q
{
   double w,x,y,z;
public:
   // constructors, other operators, other methods... omitted
   Q& operator+=( Q const & rhs ) {
      w += rhs.w;
      x += rhs.x;
      y += rhs.y;
      z += rhs.z;
      return *this;
   }
};
Q operator+( Q lhs, Q const & rhs ) {
   lhs += rhs;
   return lhs;
}

This has the following advantages:

  • Only one implementation of the logic. If the class changes you only need to reimplement operator?= and operator? will adapt automatically.
  • The free function operator is symmetric with respect to implicit compiler conversions
  • It is the most efficient implementation of operator? you can find with respect to copies

Efficiency of operator?

When you call operator? on two elements, a third object must be created and returned. Using the approach above, the copy is performed in the method call. As it is, the compiler is able to elide the copy when you are passing a temporary object. Note that this should be read as 'the compiler knows that it can elide the copy', not as 'the compiler will elide the copy'. Mileage will vary with different compilers, and even the same compiler can yield different results in different compilation runs (due to different parameters or resources available to the optimizer).

In the following code, a temporary will be created with the sum of a and b, and that temporary must be passed again to operator+ together with c to create a second temporary with the final result:

Q a, b, c;
// initialize values
Q d = a + b + c;

If operator+ has pass-by-value semantics, the compiler can elide the pass-by-value copy: the compiler knows that the temporary will get destructed right after the second operator+ call, and does not need to create a separate copy to pass in.

Even though operator? could be written as a one-line function (Q operator+( Q lhs, Q const & rhs ) { return lhs+=rhs; }), it should not be. The reason is that the compiler cannot know whether the reference returned by operator?= actually refers to the lhs object. By making the return statement explicitly return the lhs object, the compiler knows that the return copy can be elided.

Symmetry with respect to types

If there is an implicit conversion from type T to type Q, and you have two instances t and q respectively of each type, then you expect both (t+q) and (q+t) to be callable. If you implement operator+ as a member function of Q, the compiler cannot convert the t object into a temporary Q and then call (Q(t)+q), because it cannot perform type conversions on the left-hand side in order to call a member function. Thus, with a member-function implementation, t+q will not compile.

Note that this also applies to operators that are not symmetric in arithmetic terms; the symmetry here is about types. If you can subtract a T from a Q by promoting the T to a Q, then there is no reason not to be able to subtract a Q from a T with the same automatic promotion.

David Rodríguez - dribeas