ansaurus

Question

What is more efficient? Using pow to square or just multiply it with itself?

Answer 1

+11 A:

x*x or x*x*x will be faster than pow, since pow must deal with the general case, whereas x*x is specific. Also, you can ellide the function call and suchlike.

However, if you find yourself micro-optimizing like this, you need to get a profiler and do some serious profiling. The overwhelming probability is that you would never notice any difference between the two.

DeadMG 2010-05-30 21:20:09

I was thinking the same thing until I decided to test it. I just tested `x*x*x` vs double `std::pow(double base, int exponent)` in a timed loop and cannot see a statistically meaningful performance difference.

Emile Cormier 2010-05-30 23:39:50

Make sure it's not getting optimized away by the compiler.

Wallacoloo 2010-05-30 23:50:13

I'm going through my code again to make sure.

Emile Cormier 2010-05-30 23:56:02

See my answer for the code and results.

Emile Cormier 2010-05-31 00:22:33

@Emile: Check the code generated by the compiler. Optimizers do some tricky (and inobvious) things sometimes. Also check the performance at various optimization levels: -O0, -O1, -O2 and -O3 for example.

JUST MY correct OPINION 2010-05-31 02:07:25

Answer 2

+18 A:

sbi 2010-05-30 21:20:28

It might be the right kind of question. Maybe he is not thinking about his own practical project, but is merely interested in how the langauge/compiler works...

Andreas Rejbrand 2010-05-30 21:26:14

@Andreas: The answer would still be to measure in the concrete environment he's interested in.

sbi 2010-05-30 21:30:02

Stackoverflow should have a button that inserts a standard disclaimer: "I already know that premature optimization is evil, but I'm asking this optimization question for academic purposes or I've already identified that line/block of code as a bottleneck".

Emile Cormier 2010-05-30 22:53:40

I do not think readability is an issue here. Writing x*x versus pow(x,2) seem both quite clear.

KillianDS 2010-05-30 23:05:34

@Emile: Genuine such questions are rather easy to identify. They tend to go "Profiling has shown that `pow(x,2)` seems to a performance problem in my application, would it help if I wrote `x*x` instead?" And, nevertheless, the answer would _still_ be "you need to __measure__ it". (And if it's asked for academic reasons, the answer would _still_ be the same. There might be a processor providing an intrinsic `pow`, after all.)

sbi 2010-05-30 23:08:40

@KillianDS: If readability isn't an issue, then gnol needs to get on with coding and forget about such micro optimizations until it's time for profiling.

sbi 2010-05-30 23:38:58

+1 For measurement. I just tested the performance between the two methods and was surprised to see there was no noticeable difference. So much for academic "reasoning". :-)

Emile Cormier 2010-05-30 23:44:14

Excess use of bold and italics, not easy on the eyes.

stagas 2010-05-31 01:15:33

@Emile: I'd support such a boilerplate button if there was a similar boilerplate button in the comments for "did you test before asking?".

JUST MY correct OPINION 2010-05-31 02:04:43

Answer 3

A:

I do not know C very much, but I can give some general hints.

If x is a variable, then most likely x*x*x is faster than pow(x, 3), beucase the former is very direct and cannot be simplified. The latter might involve a function call, and is also more complex since the power is allowed to vary.

If, on the other hand, you want to compute (a + 2*b + sin(c)) * (a + 2*b + sin(c)) * (a + 2*b + sin(c)), then the pow method is faster, because if you do not use pow, you will need to compute the base three times.

Andreas Rejbrand 2010-05-30 21:22:15

`x = (a + 2*b + sin(c))`?

Matthew Flaschen 2010-05-30 21:25:59

Common subexpression elimination is a common optimization. And if you are paranoid and don't want to rely on the optimizer, you can always introduce a temporary variable.

R Samuel Klatchko 2010-05-30 21:26:09

@Matthew: That is always possible, of course. But sometimes you want to keep your routine as short and clear as possible, and then using a `pow` function might be a good idea. (Well, repeating oneself is never a good idea.)

Andreas Rejbrand 2010-05-30 21:28:57

Answer 4

+1 A:

If the exponent is constant and small, expand it out, minimizing the number of multiplications. (For example, x^4 is not optimally x*x*x*x, but y*y where y=x*x. And x^5 is y*y*x where y=x*x. And so on.) For constant integer exponents, just write out the optimized form already; with small exponents, this is a standard optimization that should be performed whether the code has been profiled or not. The optimized form will be quicker in so large a percentage of cases that it's basically always worth doing.

(If you use Visual C++, std::pow(float,int) performs the optimization I allude to, whereby the sequence of operations is related to the bit pattern of the exponent. I make no guarantee that the compiler will unroll the loop for you, though, so it's still worth doing it by hand.)

[edit] BTW pow has a (un)surprising tendency to crop up on the profiler results. If you don't absolutely need it (i.e., the exponent is large or not a constant), and you're at all concerned about performance, then best to write out the optimal code and wait for the profiler to tell you it's (surprisingly) wasting time before thinking further. (The alternative is to call pow and have the profiler tell you it's (unsurprisingly) wasting time -- you're cutting out this step by doing it intelligently.)

brone 2010-05-30 23:02:09

Answer 5

+5 A:

I tested the performance difference between x*x*... vs pow(x,i) for small i using this code:

#include <cstdlib>
#include <cmath>
#include <boost/date_time/posix_time/posix_time.hpp>

inline boost::posix_time::ptime now()
{
    return boost::posix_time::microsec_clock::local_time();
}

#define TEST(num, expression) \
double test##num(double b, long loops) \
{ \
    double x = 0.0; \
\
    boost::posix_time::ptime startTime = now(); \
    for (long i=0; i<loops; ++i) \
    { \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
        x += expression; \
    } \
    boost::posix_time::time_duration elapsed = now() - startTime; \
\
    std::cout << elapsed << " "; \
\
    return x; \
}

TEST(1, b)
TEST(2, b*b)
TEST(3, b*b*b)
TEST(4, b*b*b*b)
TEST(5, b*b*b*b*b)

template <int exponent>
double testpow(double base, long loops)
{
    double x = 0.0;

    boost::posix_time::ptime startTime = now();
    for (long i=0; i<loops; ++i)
    {
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
        x += std::pow(base, exponent);
    }
    boost::posix_time::time_duration elapsed = now() - startTime;

    std::cout << elapsed << " ";

    return x;
}

int main()
{
    using std::cout;
    long loops = 100000000l;
    double x = 0.0;
    cout << "1 ";
    x += testpow<1>(rand(), loops);
    x += test1(rand(), loops);

    cout << "\n2 ";
    x += testpow<2>(rand(), loops);
    x += test2(rand(), loops);

    cout << "\n3 ";
    x += testpow<3>(rand(), loops);
    x += test3(rand(), loops);

    cout << "\n4 ";
    x += testpow<4>(rand(), loops);
    x += test4(rand(), loops);

    cout << "\n5 ";
    x += testpow<5>(rand(), loops);
    x += test5(rand(), loops);
    cout << "\n" << x << "\n";
}

Results are:

1 00:00:01.126008 00:00:01.128338 
2 00:00:01.125832 00:00:01.127227 
3 00:00:01.125563 00:00:01.126590 
4 00:00:01.126289 00:00:01.126086 
5 00:00:01.126570 00:00:01.125930 
2.45829e+54

Note that I accumulate the result of every pow calculation to make sure the compiler doesn't optimize it away.

If I use the std::pow(double, double) version, and loops = 1000000l, I get:

1 00:00:00.011339 00:00:00.011262 
2 00:00:00.011259 00:00:00.011254 
3 00:00:00.975658 00:00:00.011254 
4 00:00:00.976427 00:00:00.011254 
5 00:00:00.973029 00:00:00.011254 
2.45829e+52

This is on an Intel Core Duo running Ubuntu 9.10 64bit. Compiled using gcc 4.4.1 with -o2 optimization.

So in C, yes x*x*x will be faster than pow(x, 3), because there is no pow(double, int) overload. In C++, it will be the roughly same. (Assuming the methodology in my testing is correct.)

Emile Cormier 2010-05-31 00:13:21

ansaurus

tags:

views:

answers:

What is more efficient? Using pow to square or just multiply it with itself?

related questions