views:

293

answers:

8

I suppose the naive implementation of a + operator for matrices (2D for instance) in C++ would be:

class Matrix {

  Matrix operator+ (const Matrix & other) const {
      Matrix result;
      // fill result with *this.data plus other.data
      return result;
  }
}

so we could use it like

Matrix a;
Matrix b;
Matrix c;

c = a + b;

Right?

But if matrices are big this is not efficient as we are doing one not-necessary copy (return result).

Therefore, If we wan't to be efficient we have to forget the clean call:

c = a + b;

Right?

What would you suggest / prefer ? Thanks.

+12  A: 

The C++ standard gives permission for the compiler to elide the unnecessary copy in this case (it's called the "named return value optimization", usually abbreviated to NRVO). There's a matching "RVO" for when you return a temporary instead of a named variable.

Nearly all reasonably recent C++ compilers implement both NRVO and RVO, so generally speaking you can ignore the fact that this construct wouldn't otherwise be particularly efficient.

Edit: I was, of course, talking about the copy involved in returning the new matrix holding the result of the addition. You probably do want to either pass the input by const reference:

Matrix operator+(Matrix const &other) const { 
    Matrix result;
    // ...
    return result;
}

...or else, pass by value, but return the passed value:

Matrix operator+(Matrix other) const { 
    other += *this;
    return other;
}

Note that this depends on commutativity though (i.e., it's really doing b+a instead of a+b) so while it's fine for addition, it won't work for some other operations.

Jerry Coffin
Dave Abrahams has a great blog on the pass-by-value approach here: http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/
Void
Wouldn't the second code example be rather called operator += ? That matches what the implementation actually does and then it may return a reference, not value.
Laurynas Biveinis
@Laurynas: Why should the 2nd code example be called `operator+=`? `this` is not modified there.
KennyTM
@Laurynas:It uses +=, but it doesn't modify `*this`. Instead what it modifies is the parameter, which is copy of the right operand.
Jerry Coffin
Oh I managed to miss that it is passed by value. Now I see how it works. What I had in mind about += was just a standart += implementation:Matrix }
Laurynas Biveinis
+2  A: 

You can return a value without triggering a copy construction. Its called R-Value references Explained in a bit of detail here http://www.artima.com/cppsource/rvalue.html

TP
rvalue references are a new feature for C++0x, so not all current compilers include them. They're also unnecessary in this case.
Jerry Coffin
A: 

Two ways to solve it.

1) Use references:

Matrix& operator+ (Matrix& other) const {

2) Use shallow copy in copy constructor. Yes, new Matrix object will be created but the real matrix will not be created again

Andrey
Your first version "Matrix", b will be modified. So you should limit yourself to "b + c ;" to have correct results...
paercebal
hmmm. how will it modify b? it will just invoke method operator+ for b, pass reference of c and return reference.
Andrey
Bottom line: in this case, returning a reference won't buy you anything. The sum of the two inputs will (normally) be different from those two inputs. so one way or another, you need to create a new matrix to hold that result.
Jerry Coffin
@Andrey: return a reference to what?
jpalecek
@jpalecek - good question - quite possibly some local variable that was valid while the method was executing, but which was destructed when the method exited.
Steve314
A: 

I would try to construct the Matrix at the return statement:

Matrix Matrix::operator + (const Matrix& M)
{
    return Matrix(
        // 16 parameters defining the 4x4 matrix here
        // e.g. m00 + M.m00, m01 + M.m01, ...
    );
}

That way you won't be using any temporaries.

Cthutu
Generally not feasible for arbitrary dimensions. Also, it could result in bad performance, since the matrix results are first pushed onto the call stack (functions generally can't put 16 parameters in registers when optimizing their calling ABI), then the constructor copies them into the object.
Mike DeSimone
If this is a fixed 4x4 matrix class, then that works because most compilers can apply the return value optimization (RVO) to elide the implicit temporary. If it's a general matrix class with arbitrary numbers of rows and columns, you're going to have to make a temporary. As an earlier answer shows, that temporary may also be optimized away by the named return value optimization (NRVO).
Adrian McCarthy
+1  A: 

As others have pointed out, your implementation isn't as expensive as you think. However, it may make some sense to define additional methods that modify the object in-place for use in critical inner loops.

EDIT - fixed following paragraph

The point here is that even with return value optimisations, you still end up constructing a local variable and then assigning that to the result variable after operator+ exits. And destructing that extra object, of course. There is still an extra object used to temporarily hold the result. It's possible to do reference-counting with copy-on-write, but that adds a dereferencing overhead to each use of a matrix.

These are non-issues for 99% of cases, but once in a while you get a critical case. If you were dealing with large matrices, reference-counting overheads would be insignificant - but for 2D up to 4D there are times when you may care a great deal about those few extra cycles - or more to the point, about not putting the matrix on the heap when you want it on the stack or embedded within some struct/class/array.

That said - in those cases, you probably won't be writing your own matrix code - you'll just use the matrix ops from DirectX or OpenGL or whatever.

Steve314
@Steve314: the whole point of [N]RVO is that you *don't* copy constructing the return value. Instead, it simply uses the place the return will go *as* the "local" object.
Jerry Coffin
@Jerry - I should have said "assignment". In the line "c = a + b", you will get an assignment "c = <d>". The <d> is separate from a, b and c, and is an overhead. That overhead is guaranteed to occur because otherwise evaluating the expression wouldn't involve both operator+ and operator=. You need a temporary there to give the assignment something (which is neither a nor b) to copy from. [N]RVO optimises away the copying from inside operator+ to outside, but an intermediate object is still there and an op that in-place modifies c instead would avoid that overhead.
Steve314
@Jerry - that is the "copy constructor" I was talking about was because I somehow was thinking "Matrix c (a + b);" rather than "c = a + b;". That copy constructor has nothing to do with how operator+ is implemented or optimised - it simply uses the (temporary object) result.
Steve314
A: 
class Matrix { 

  Matrix & operator+=(const Matrix & other) {
      // fill result with *this.data plus other.data 
      // elements += other elements 
      return *this;
  }
  Matrix operator+ (const Matrix & other) {
      Matrix result = *this; 
      return result += other; 
  } 
} 
Alexey Malistov
+1  A: 

If you're really concerned about performance (have you profiled?) you probably shouldn't implement operator+ at all since you can't control if it will result in a non-optimal temporary being created. Just implement operator+= and/or member function add(Matrix& result, const Matrix& in1, const Matrix& in2) and let your clients create the correct temporaries.

If you do want operator+ either of Jerry Coffin's will work fine.

Mark B
+2  A: 

Note that your first naive implementation is very native, as nothing is passed by reference. I'll assume this was a really naive example and that there is no need to remind readers of the benefits of passing by-reference instead of by-value.

Note, too, that I have used the non-member-function operators, instead of the member-functions, but in the end, the results are (almost) the same.

If you want to be sure no necessary copy will be created, you should try a non-operator version:

void add(Matrix & result, const Matrix & lhs, const Matrix & rhs) ;

If you want to do it the operator way (which is my prefered solution), then you should assume operator + will create a temporary. You should then define both operator + and operator += :

Matrix & operator += (Matrix & result, const Matrix & rhs) ;
{
   // add rhs to result, and return result
   return result ;
}

Matrix operator + (const Matrix & lhs, const Matrix & rhs) ;
{
   Matrix result(lhs) ;
   result += rhs ;
   return result ;
}

Now, you could try to "leverage" compiler optimisations and write it as:

Matrix & operator += (Matrix & result, const Matrix & rhs) ;
{
   // add rhs to result, and return result
   return result ;
}

Matrix operator + (Matrix lhs, const Matrix & rhs)
{
   return lhs += rhs ;
}

As proposed by Herb Sutter in C++ Coding Standards, 27. Prefer the canonical forms of arithmetic and assignment operators, p48-49:

A variation is to have operator @ [@ being +, -, whatever] accept its first parameter by value. This way, you arrange for the compiler itself to perform the copy for you implicitely, and this can give the compiler more leeway in applying optimizations.

paercebal
I can't believe I used the buzzword "leverage". I could downmod myself for that... :-D ...
paercebal
A 2x2 matrix is pretty small. The overhead of copying it into the parameters may be smaller than the overhead from call-by-reference. Assuming, of course, that 2D means 2*2 - in my experience, matrices are always 2D but "2D" gets used a lot to mean 2*2, "3D" to mean 3*3 etc.
Steve314
@Steve314: When I worked on physics, or on image processing, the matrices we used were 4x4 2D matrices with real values, which mean that even with an optimized array of double[16], the cost of copying the object is greater than the cost of copying its address. Even with 2x2 2D "short" matrices, we would have a short[4], which is 64 bits, which is again, either greater than an address on 32-bit, or strictly equal to an address on 64-bits... [TO BE CONTINUED]
paercebal
@Steve 314: [CONTINUING] ... So strictly put, by-reference passing is faster than by-value passing. Now, if we take into account the compiler optimization on copy (as expressed by Sutter's quote), we could have surprises. But we should have to measure the difference by real-life tests and/or profiling and/or assembly code reading.
paercebal