views:

92

answers:

5

Suppose you have a function, and you call it a lot of times, every time the function return a big object. I've optimized the problem using a functor that return void, and store the returning value in a public member:

#include <vector>
const int N = 100;

std::vector<double> fun(const std::vector<double> & v, const int n)
{
    std::vector<double> output = v;
    output[n] *= output[n];
    return output;
}

class F
{
public:
    F() : output(N) {};
    std::vector<double> output;
    void operator()(const std::vector<double> & v, const int n)
 {
     output = v;
     output[n] *= n;
  }
};


int main()
{
    std::vector<double> start(N,10.);
    std::vector<double> end(N);
    double a;

    // first solution
    for (unsigned long int i = 0; i != 10000000; ++i)
      a = fun(start, 2)[3];

    // second solution
    F f;
    for (unsigned long int i = 0; i != 10000000; ++i)
    {
     f(start, 2);
     a = f.output[3];
 }
}

Yes, I can use inline or optimize in an other way this problem, but here I want to stress on this problem: with the functor I declare and construct the output variable output only one time, using the function I do that every time it is called. The second solution is two time faster than the first with g++ -O1 or g++ -O2. What do you think about it, is it an ugly optimization?

Edit:

to clarify my aim. I have to evaluate the function >10M times, but I need the output only few random times. It's important that the input is not changed, in fact I declared it as a const reference. In this example the input is always the same, but in real world the input change and it is function of the previous output of the function.

+1  A: 

More common scenario is to create object with reserved large enough size outside the function and pass large object to the function by pointer or by reference. You could reuse this object on several calls to your function. Thus you could reduce continual memory allocation.

Kirill V. Lyadvinsky
it's not very different from my code, you propose to create the object outside the function, I create this object inside the class.
wiso
My solution doesn't require additional class.
Kirill V. Lyadvinsky
The benefit of using a functor for this would be the opportunity to change how it works without effecting clients. With the vector exposed publicly though you're not really getting that benefit so the difference becomes little more than a stylistic issue.
Noah Roberts
yes but to declare a class is not so bad, and in my case I hide the output value inside the class, so in the external scope I don't need to declare a variable
wiso
If in your case additional class is useful and simplifies developing process then you should use it.
Kirill V. Lyadvinsky
+1  A: 

In both cases you are allocating new vector many many times.

What you should do is to pass both input and output objects to your class/function:

void fun(const std::vector<double> & in, const int n, std::vector<double> & out)
{
    out[n] *= in[n];
}

this way you separate your logic from the algorithm. You'll have to create a new std::vector once and pass it to the function as many time as you want. Notice that there's unnecessary no copy/allocation made.

p.s. it's been awhile since I did c++. It may not compile right away.

Zepplock
I wouldn't expect this to be a major improvement actually. The assignment operator will likely use memcpy to make the copy, which could be much faster. It might not do any allocations after the first one if the implementation is smart enough (and it would be a basic optimization).
Noah Roberts
@Zepplock: why do you say "In both cases you are allocating new vector many many times"? It's not true: in the second solution I allocate the output vector only in the constructor of `F`.Your solution is good but it's based on the actual example. The real problem is very different, I don't have to modify only one element of the vector.
wiso
@wiso: you are right, I thought he was creating new "F f" every iteration.
Zepplock
+1  A: 

It's not an ugly optimization. It's actually a fairly decent one.

I would, however, hide output and make an operator[] member to access its members. Why? Because you just might be able to perform a lazy evaluation optimization by moving all the math to that function, thus only doing that math when the client requests that value. Until the user asks for it, why do it if you don't need to?

Edit:

Just checked the standard. Behavior of the assignment operator is based on insert(). Notes for that function state that an allocation occurs if new size exceeds current capacity. Of course this does not seem to explicitly disallow an implementation from reallocating even if otherwise...I'm pretty sure you'll find none that do and I'm sure the standard says something about it somewhere else. Thus you've improved speed by removing allocation calls.

You should still hide the internal vector. You'll have more chance to change implementation if you use encapsulation. You could also return a reference (maybe const) to the vector from the function and retain the original syntax.

Noah Roberts
it's only an example, I don't need this function, I don't need to access to one element. The problem I have to handle is more general, I only need to optimize the return value
wiso
A: 

It seems quite strange(I mean the need for optimization at all) - I think that a decent compiler should perform return value optimization in such cases. Maybe all you need is to enable it.

a1ex07
I have a feeling that's what happens is that because the size of the assigned from vector is never changing that the destination just keeps its current memory block after the first assignment. Then it can just use memcpy since the type is basic and the algorithm would speed up significantly. RVO wouldn't improve it as much since with the other version he's actually creating new vectors that don't have that initial allocation. Certainly implementation defined behavior but not unreasonable to expect.
Noah Roberts
Yeah, that is true for this test example. However, in a real application input data may not be so ideal.
a1ex07
Depends on the algorithm. In some cases this may be a very effective optimization. In cases when they are not the same size every time there's going to be less benefit from it though there may still be some since the vector would tend to grow outward and require less and less allocations as it does.
Noah Roberts
+1  A: 

I played with this a bit, and came up with the code below. I keep thinking there's a better way to do this, but it's escaping me for now.

The key differences:

  • I'm allergic to public member variables, so I made output private, and put getters around it.
  • Having the operator return void isn't necessary for the optimization, so I have it return the value as a const reference so we can preserve return value semantics.
  • I took a stab at generalizing the approach into a templated base class, so you can then define derived classes for a particular return type, and not re-define the plumbing. This assumes the object you want to create takes a one-arg constructor, and the function you want to call takes in one additional argument. I think you'd have to define other templates if this varies.

Enjoy...

#include <vector>

template<typename T, typename ConstructArg, typename FuncArg>
class ReturnT
{
    public:
        ReturnT(ConstructArg arg): output(arg){}
        virtual ~ReturnT() {}
        const T& operator()(const T& in, FuncArg arg) 
        {   
            output = in;
            this->doOp(arg);
            return this->getOutput();
        }
        const T& getOutput() const {return output;}
    protected:
        T& getOutput() {return output;}
    private:
        virtual void doOp(FuncArg arg) = 0;

        T output;
};


class F : public ReturnT<std::vector<double>, std::size_t, const int>
{
    public:
        F(std::size_t size) : ReturnT<std::vector<double>, std::size_t, const int>(size) {}
    private:
        virtual void doOp(const int n)
        {
            this->getOutput()[n] *= n;
        }
};

int main()
{
    const int N = 100;
    std::vector<double> start(N,10.);
    double a;


    // second solution
    F f(N);
    for (unsigned long int i = 0; i != 10000000; ++i)
    {
        a = f(start, 2)[3];
    }
}
JohnMcG