ansaurus

Question

c++ operator overloading memory question

Answer 1

A:

Data and code are orthogonal concepts. What difference does it make whether code works on an object from the heap or on one residing on the stack (provided you are respecting object scope in both cases)?

jldupont 2009-09-19 21:25:28

I agree, but it does not answer the question.

Ed Swangren 2009-09-19 21:55:16

Answer 2

+9 A:

If you're talking about, for example, operator+, where the object returned is neither of the inputs, then the answer is that you instantiate it on the stack and return it by value:

struct SomeClass {
    int value;
};

SomeClass operator+(const SomeClass &lhs, const SomeClass &rhs) {
    SomeClass retval;
    retval.value = lhs.value + rhs.value;
    return retval;
}

or

class SomeClass {
    int value;
public:
    SomeClass operator+(const SomeClass &rhs) const {
        SomeClass retval;
        retval.value = this->value + rhs.value;
        return retval;
    }
};

or even:

class SomeClass {
    int value;
public:
    SomeClass(int v) : value(v) {}
    friend SomeClass operator+(const SomeClass &lhs, const SomeClass &rhs) {
        return SomeClass(lhs.value + rhs.value);
    }
};

The compiler then worries about where (on the stack) the return value is actually stored.

It will for example apply return-value optimizations if it can, but in principle what's happening is "as-if" the work you do constructs some value on the stack of your operator overload, and then at return this is copied to wherever it needs to be next. If the caller assigns the return value, it's copied there. If the caller passes it by value to some other function, it's copied wherever the calling convention says it needs to be in order to be that function parameter. If the caller takes a const reference, then it's copied to a temporary hidden away on the stack.

Steve Jessop 2009-09-19 21:30:23

Thank you, this is exactly what I was wondering about. I was unaware that it simply copied the entire instance over.

Ori Cohen 2009-09-19 21:34:19

Yes, a return by value might not actually copy the object (the compiler is allowed to optimize and replace the variable `retval` with the actual location where the caller needs the return value to be, saving the work of the copy). But other than any side effects of the copy itself (for instance, if you had tracing in the copy constructor you might not see it), the end result is the same as copying.

Steve Jessop 2009-09-19 21:37:42

The first two, initializing `retval` using its default ctor and then immediately overwriting that value, seem wrong to me. If you have `+=` (which you very likely should, if you have `+`), you could be more efficient. (See my answer for more details.)

sbi 2009-09-21 08:00:40

It's just a toy example, to illustrate the return. I'm not trying to write the most efficient possible way to add two integers. As it happens, though, in the first two examples the default constructor of SomeClass does absolutely nothing. If that were not the case then yes, it would be pointlessly inefficient to default-construct and then immediately assign.

Steve Jessop 2009-09-21 17:08:52

Answer 3

A:

You are correct that data on a function's stack is unusable once the function returns. However, it is perfectly okay to return copies of data on the stack (which is what you are doing). Just make sure you don't return pointers or references to data on the stack.

Zifre 2009-09-19 21:37:55

Answer 4

A:

Using your code:

Point Point::operator+ (Point a)
{
    Point result(this->x+a.x,this->y+ a.y);
    return result;
}

This will work fine.
Basically, it creates result locally (on the stack). The return statement then copies the result back to the calling point (just like an int), using the Point copy constructor.

int main()
{
    Point  a(1,2);
    Point  b(2,3);

    Point  c(a + b);
}

Here operator+ creates a local on the stack. The return copies it back to the call point (the constructor for c). Then the copy constructor for c is used to copy the content into c.

You might think all that copy construction seems a little costly. Technically, yes. But the compiler is allowed to optimize away the extra copy constructions (and all modern compilers are very good at it).

Returning to your code.

Point Point::operator+ (Point a)
{
    Point *c = new Point(this->x+a.x,this->y+ a.y);
    return *c;
}

Don't do this. Here you allocate the Point dynamically, but then copy the result back to the call point (using the copy constructor, as described above). So by the time control returns to the call point, you have lost the pointer and can't deallocate the memory (thus a memory leak).

The difference between Java and C++ is that when we do return pointers, we use smart pointers to help the caller identify who is responsible for freeing the memory (look up pointer ownership).

Martin York 2009-09-19 21:58:04

Actually, the caller cannot possibly free anything. The Point object is copied before returning, and the pointer to the dynamically allocated copy is lost, so it cannot be deleted. The OP's code results in a memory leak in any case.

Paolo Capriotti 2009-09-19 22:24:28

Reworded so it's less confusing. The point being that you should NOT do it, which I think I was very clear about.

Martin York 2009-09-20 00:26:03

Answer 5

+2 A:

C++ : RAII and Temporaries

You're right about objects on the stack being destroyed once they go out of scope.

But you're overlooking that C++ will create temporary objects as necessary. You must learn when a temporary will be created (and then optimized away) by the compiler for your code to work.

Temporary Objects

Note that in the following, I'm describing a very simplified "pure" viewpoint of what's happening: Compilers can and will do optimizations, and among them, will remove useless temporaries... But the behavior remains the same.

Integers?

Let's start slowly: What is supposed to happen when you play with integers:

int a, b, c, d ;
// etc.
a = b + (c * d) ;

The code above could be written as:

int a, b, c, d ;
// etc.
int cd = c * d ;
int bcd = b + cd ;
a = bcd ;

Parameters by value

When you call a function with a parameter passed "by value", the compiler will make a temporary copy of it (calling the copy constructor). And if you return from a function "by value", the compiler will, again, make a temporary copy of it.

Let's imagine an object of type T. The following code:

T foo(T t)
{
   t *= 2 ;

   return t ;
}

void bar()
{
   T t0, t1 ;

   // etc.

   t1 = foo(t0) ;
}

could be written as the following inlined code:

void bar()
{
   T t0, t1 ;

   // etc.

   T tempA(t0) ;    // INSIDE FOO : foo(t0) ;
   tempA *= 2 ;     // INSIDE FOO : t *= 2 ;
   T tempB(tempA) ; // INSIDE FOO : return t ;

   t1 = tempB ;    // t1 = foo...
}

So, even though you don't write it yourself, calling or returning from a function will (potentially) add a lot of "invisible code", needed to pass data from one level of the stack to the next/previous.

Again, remember that the C++ compiler will optimize away most temporaries, so what could look like an inefficient process is just a conceptual model, nothing more.

About your code

Your code will leak: You "new" an object, and don't delete it.

Despite your misgivings, the right code should be more like:

Point Point::operator+ (Point a)
{
   Point c = Point(this->x+a.x,this->y+ a.y) ;
   return c ;
}

Which, when used from the following code:

void bar()
{
    Point x, y, z ;
    // etc.
    x = y + z ;
}

will produce the following pseudo code:

void bar()
{
    Point x, y, z ;
    // etc.
    Point tempA = z ;  // INSIDE operator + : Point::operator+ (Point a)
    Point c(y.x + tempA.x, y.y + tempA.y) ; // INSIDE operator + : Point c = Point(this->x+a.x,this->y+ a.y) ;
    Point tempB = c ;  // INSIDE operator + : return c ;

    x = tempB ;        // x = y + z ;
}

About your code, version 2

You create too many temporaries. Of course, the compiler will probably remove them, but there's no need to pick up sloppy habits.

You should at the very least write the code as:

inline Point Point::operator+ (const Point & a) const
{
   return Point(this->x+a.x,this->y+ a.y) ;
}

paercebal 2009-09-19 23:25:35

`Point c = Point(this->x+a.x,this->y+ a.y)` seems clumsy to me. What's wrong with `Point c(this->x+a.x,this->y+ a.y)`? (Or did you, for educational purposes, try to stick as close as possible to the OP's code? If so, then disregard this comment.)

sbi 2009-09-21 08:03:17

@sbi : I wanted to detail everything, for educational purposes. The "final" version being at "About your code, version 2" section... ^_^ ...

paercebal 2009-10-03 23:39:44

Answer 6

+1 A:

You've already had a few good answers. Here are a few more points I'd like to add to them:

You should try to avoid copying Point objects. Since they are bigger than built-in types (from your code I assume they consist of two built-ins), copying them is, on most architectures, more expensive than passing them around by reference. That changes your operator to: Point Point::operator+ (Point&) (Note that you still have to copy the result, as there's no place it can be stored persistently so that you could pass around a reference to it.)
However, to make the compiler check that you didn't screw up and accidentally modify the operator's argument, you pass it by const reference: Point Point::operator+ (const Point&).
Since operator+() (unlike, e.g., operator+=()) doesn't change its left argument either, you should make the compiler check that, too. For a binary operator that is a member function, the left argument is what the this pointer points to. To make *this constant in a member function, you have to add a const at the end of the member function's signature. That makes it: Point Point::operator+ (const Point&) const. Now your operator is what's usually called const-correct.
Usually, when you provide operator+() for your type, people will expect operator+=() to be present as well, so you should usually implement both. Since they behave quite similarly, to avoid redundancy you should implement one on top of the other. The easiest and most efficient (and therefore more or less canonical) way to do this is to implement + on top of +=. That makes operator+() quite easy to write and, even more importantly, it basically looks the same for every type you implement it for.

Since operator+() became quite trivial, you would probably want to inline it. This would then be the resulting code so far:

 inline Point Point::operator+ (const Point& rhs) const
 {
    Point result(*this);
    result += rhs;
    return result;
 }

These are a few basic syntactic and semantic peculiarities which (hopefully) all reading this will agree to. Now here comes a rule of thumb that I use for my code and which I find very helpful, but which probably not everyone will agree to:

Binary operators that treat both of their arguments equally (which usually means they don't change either of them) should be implemented as free functions; binary operators that treat their left argument specially (usually: that change it) should be implemented as member functions.

The reason for the latter (take operator+=() as an example) is rather straightforward: in order to change the left argument, they might need access to its innards. And changing a class object's innards is best done through member functions.

The reasons for the former are not as simple. Among other things, Scott Meyers had an excellent article explaining that, contrary to popular belief, using non-member functions often actually increases encapsulation. But there's also the fact that for the this argument of member functions, some rules (implicit conversions, dynamic dispatch, etc.) differ from those for the other arguments. Since you want both arguments to be treated equally, it might be surprising under some circumstances to have different rules apply to the left-hand side.

The code then looks like this:

 inline Point operator+ (const Point& lhs, const Point& rhs)
 {
    Point result(lhs);
    result += rhs;
    return result;
 }

To me, this is the ultimate canonical form of it which I write down in my code without much thinking about it, no matter what type it is.

Implementing operator+=() is left as an exercise to the reader. :)

sbi 2009-09-20 21:28:10

Re "ultimate canonical form" - actually, if you're going to use += then you should consider taking the first parameter by value, then doing `return lhs += rhs;`. You're going to copy it anyway, and doing it in the parameter supposedly gives you a better chance of optimisations, especially in the case where it happens to be a temporary.

Steve Jessop 2009-09-21 11:08:19

I'm not sure this would give the compiler a better chance at optimizing. This is all very basic and simple _inline_ code, after all. OTOH it changes the operator's _interface_, possibly confusing users - for the sake of a might-be optimization. I'm always wary to do that...

sbi 2009-09-21 11:33:27

It will confuse users if (a) they actually look at the code or signature, and (b) you haven't commented or documented it sufficiently, and (c) they haven't seen the trick before. Actually it's more confusing for code maintainers, since it's easy to miss that the param is by copy, and think there should be *another* copy. I can't recall for what compiler(s) it actually helps to do copies in the call rather than in the callee, but I do remember being convinced at the time that there was such a situation. This is why I only say you should consider it...

Steve Jessop 2009-09-21 12:29:28

As for (a): that signature is the feature's interface. I assume people will look at it, although I agree that many might never look at an operator's signature, presuming it will be a default run-of-the-mill one. Your argument (b) would be a way to deal with that, if I didn't disagree with (c) so much. `:)` (It will all be `inline`. No caller, no callee, just all the code nicely in a row waiting for the optimizer to get over-excited over it.)

sbi 2009-09-21 18:31:04

In C++, it is usual (where operators are concerned) to express interfaces in terms of allowable expressions, rather than in terms of a signature. So, OK, the signature might be the interface, but it shouldn't be. I guess you're probably right that enough inlining removes the danger of the compiler missing an opportunity for copy-elision. But if enough people just passed by value as their default (as I think Alexandrescu recommends, and litb just did in another question) it would not be surprising and hence would become optimal. Funny how optimal code depends on who's reading it...

Steve Jessop 2009-09-22 11:19:13

@onebyone: I agree that what seems easy to read depends on what you're used to. (I remember finding code using the STL incredibly hard to read when the STL was new.) Regarding pass-by-copy: I suspect many of these idioms will change and shift a lot once we have some experience with rvalue references.

sbi 2009-09-22 12:06:50
