tags:

views:

554

answers:

13

What is the quickest way to find the maximum of two floats:

a)

y = std::max(x1, x2);

b)

if (x1 > x2)
    y = x1;
else
    y = x2;

c)

y = x1 > x2 ? x1 : x2;

Thanks

+2  A: 

B and C would compile the same at least in theory. I pick those because unless std::max is not a function call (e.g. a macro), those would be the fastest.

Edit Apparently, std::max is a template function call like your form C. http://www.cplusplus.com/reference/algorithm/max/

Daniel A. White
std::max can't be a macro, but I'd expect it to be inlined unless there's anything obstructing that (like compiler options).
Steve Jessop
+25  A: 

Here's a different question

Why do you think such a small optimization will matter in the greater context of your program?

I find it highly unlikely that a micro-optimization such as this will have an perceivable impact on your program. You should never micro-optimize like this unless a profiler has specifically shown this to be a problem.

EDIT Adding some clarification buried in the comments

The reason there is no great answer to this question is that the performance of that code is highly dependent upon ...

  1. The way in which it's used in your program
  2. The particular compiler you're using
  3. The optimization flags passed to your compiler
  4. The particular architecture you are running the code on
  5. Many other very tiny things that weren't included in the question

Even if all of this information was included, our answers would be guesses at best. The only way to answer this question is to whip out a profiler and find out which is faster.

However this is almost certainly not worth the effort. Micro-optimizing such a small piece of your program will almost certainly not add any noticable preformance benefits to your code. In general it's a really bad idea to optimize code like this unless the profiler specifically tells you it's a problem. Otherwise you'll spend a lot of time optimizing something for no percievable benefit.

Yes there are cases where such an optimization could be important. But that would be only in a very special circumstances where the code was a part of a very tight highly called loop. However the only way to identify such code is to use a profiler.

JaredPar
as Donald Knuth said: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
Emile Vrijdags
Maybe this is a uservoice item, but it seems that there's a lot of clutter on SO in that the top N answers to nearly every optimization question are variants of this. Perhaps it would be good to either humor people and just answer their question, or automatically point them to a page with this type of advice on it and ask "are you *sure* you want to ask this?", or something similar.For the record, I think your advice is great and not heeded often enough, but it seems nearly impossible to get useful answers to such questions when there is a valid reason for it, or the asker is just curious.
Matt J
Note: the above comment is food for discussion only; no offense intended. As I mentioned earlier, this type of answer is ubiquitous.
Matt J
@MattJ, I agree that a lot of answers like mine make it to the top but they do so for a reason. They're correct. You shouldn't say the wrong answer simply because the right answer has been said too many times. This is possibly the first time the OP has asked this question and or read this type of answer. Also as Eric said a day or two ago, there is no way to answer this question. The performance of such a micro-optimization will be highly tied to both the compiler, program and the particular architecture the OP is using.
JaredPar
@MattJ, no offense taken on the comment. This is a place for discussion :)
JaredPar
The question was direct, the answer should be direct. It doesn't matter if it's a trivial optimization or not. There are programs where this kind of optimizations DO MATTER and a micro optimization such as this could have great impact these specific programs.There's also something everyone should remember: C++ is about absctraction and PERFORMANCE, so if there's a way to do things faster, without compromising the integrity of the program, it sould be taken. Everything adds up in the end, when compiling the release. So I think this is a valid question.
@chila, My answer is direct, the OP is asking the wrong question. I do not think the question is invalid I think it merely shows the OP is looking in the wrong place as my answer indicates. I disagree that C++ is all about performance. C++ is about flexibility and only paying for what you use. The side effect is that C++ can be an extremely performant language. And if this does have an effect on the final program the profiler will show that effect. That's why you should **always** profile and base your optimizations off of that, not items like "which way of max is faster".
JaredPar
You made another question and then you made two statements that did not answer the question.
Ooh, and I already saw a Joda Time answer today. I've almost got a bingo!
Michael Myers
Of course if this comparison is done a few hundred times per second, it does not even worth bothering about it.However I once worked on a real-time digital signal processing unit where every small optimizations like (possibly) this one saved a few tenth of percent. At the end of the optimization phase, the maximum number of simultaneous channels was about 40% greater than the original version.In such cases, you definitely want to hunt down every detail that could be 2 to 10 times faster with the best implementation. (And use SIMD instructions ;-) )
Jem
@chad, @jem: The problem is this question can not be answered in the general sense of the term. If the question had been what is the fastest way when using processor X, OS Y, Compiler Z at optimization level A then maybe an answer would have been meaningful. In all other situations the answer is not meaningful as there are just too many correct answers all of which are wrong in different situations.
Martin York
@Martin, any answer for a subset such as "x86, Windows, MSVC++ 9: the ternary operator results in assembly with fewer clocks than the if / else" would be a more helpful answer than "it is unlikely that it will make a difference, and if in fact you happen to have the rare case where it would make a difference, then it sucks to be you."
Thomas L Holaday
Jared, chila is right. You a giving a non-answer to a whole class of problems - it could be applied to thousands of questions here, without changing a single word. You are solving the general case, but without asking whether it applies to the original question.
peterchen
@peterchen, the OP's question is simply not answerable with a direct answer. I've listed the criteria needed to provide an acceptable answer and the dangers of such an answer.
JaredPar
@Jared, the OP's question can be answered by a compiler implementer, for that compiler. Someone who is knowledgeable of how the EDG front end represents the different syntaxes in EDG intermediate language would be in a position to state with 100% certainty whether there is any difference and thus be in a position to provide an answer applicable to the many compilers that use EDG as the front end.
Thomas L Holaday
"In general it's a really bad idea to optimize code like this unless the profiler specifically tells you it's a problem. "The question is of couse for the case WHEN IT DOES!!!
@chila the OP did not mention a profiler or provide any details as to why this was a problem in a profile. Without that information it's not possible to tell why it was an issue so I do not agree with your statement. Also SHOUTING does not make me read your comments any closer.
JaredPar
+4  A: 

It is compiler specific, but I would suspect that the result is the same. Have a look at the assembly output of the compiler.

stefaanv
+2  A: 

Same observation as usual, when it comes to "quickest". Did you measure the execution time of you maximum calculation, reported to the rest of execution time of the process? Does this "optimization" decision have a significant impact on the execution time of your application?

I'm 99% sure that the difference between your proposals doesn't worth considering.

Cătălin Pitiș
+4  A: 

You can check it yourself on your system.

I did it for you on gcc - redhat. The results on my system, for 100,000 executions with x1 = 432943.5 and x2 = 434232.9

a) ~1200 uSec b) ~600 uSec c) ~600 uSec

EDIT: With -O2 optimization I got the same results in all 3 cases: ~110 uSec.

Of course, the actual result depends on many factors in your specific problem and system.

Igor Oks
You don't ever want to time a single statement like that. There's too much variation in system state to get accurate timings on a single statement. You want to time a loop of a large number of iterations.
17 of 26
I've rerun it, for 100,000 executions
Igor Oks
Now add cache-miss situations (that'll be more realistic) and you'll have quite different numbers.
n0rd
what flags did you compile with? I would guess that a is slower there because std::max didn't get inlined.
Evan Teran
No special flags. Basically, all is default
Igor Oks
How about you turn on optimizations like -O2 and try it, a much better test since it'll allow the compiler to inline things.
Evan Teran
if you compile without any flags, then the measurement is for naught. GCC is in really-dumb mode then, not caring about anything regarding performance. Try -O2
Johannes Schaub - litb
Recalculated with -O2. Got same results (110 uSec) in all 3 cases.
Igor Oks
+2  A: 

The only way to really know is to measure them. They might vary from compiler to compiler or platform to platform.

Run a loop of each one for 100,000 or 500,000 iterations and compare the total execution times.

17 of 26
+2  A: 

Benchmark them and find out.

Visage
A: 

It is common to #define either b or c as a macro, and use that throughout your code. Note that this only works in most cases, and will die if you pass in an x or y argument modified by a non-idempotent operator or function call with side effects. For example:

#define MAX(x,y) (((x) < (y)) ? (y) : (x))
...
MAX(i++, ++j); //won't work properly, the ++'s will get executed twice.
MAX(changeKSomehow(k), changeLSomehow(L)); //won't work, the functions will get called twice.

It turns out that std::max, at least for GNU libstdc++, is implemented nearly identically, and uses the inline hint. The compiler should be able to take the hint when appropriate (when the < operator takes few enough instructions not to cause a huge amount of I$ pressure if inlined).

Matt J
The problem with defining them as macros is what happens if you call max(x++, ++y)?
sharth
good catch! added a disclaimer
Matt J
In C++ using macros like this is such bad practice, this is standard in C but C++ has better alternatives. Save macros for what they were intended for.
Martin York
I agree. I posted this in hope that the voting system would work and the macro solution would be ranked appropriately within the spectrum of possible solutions, not that it is the best (or even a good) solution. If you're sure that you want the operation inlined, and you're sure that your compiler is too stupid to do it for you, macros are a viable option, but for the reasons you imply (hard to debug, unintuitive semantics as pointed out by sharth, inability to scope/namespace effectively, etc.) the benefits of inlining outweighing the downsides should be a rare case.
Matt J
+1  A: 

And this way may be better?

y = x1;
if (y < x2)
    y = x2;

Removing the else condition may be better interpreted by the compiler.

Edit1: If benchmarking don't forget to do half the test with x1 greater then x2 and the other half with x2 bigger. Otherwise the results would not reflect real cases.

Edit2: Microoptimazions are somehow usefull if you work in embedded systems with 1 or 2k of memory. And also, its an intersting problem to think why the timings are different in each case.

RMAAlmeida
actually, that's probably worse because 50% of the time there's 2 assignments!
Brian Postow
But depending on compiler, in this case 50% of the time there is no branch taken. It's (obviously) architecture and context-dependent whether 50% of an assignment is worth paying to save 50% of a branch.
Steve Jessop
+4  A: 

-O3 Dual core Macbook pro 2.4ghz

std::max(x1, x2) Time: 4.19488 RMAAx's : 4.19613 if Time: 4.18775 ? Time: 4.18831

std::max(x1, x2) Time: 4.1836 RMAAx's : 4.18274 if Time: 4.18603 ? Time: 4.18857

std::max(x1, x2) Time: 4.18714 RMAAx's : 4.18759 if Time: 4.19752 ? Time: 4.19797

std::max(x1, x2) Time: 4.1926 RMAAx's : 4.19293 if Time: 4.19334 ? Time: 4.19626

std::max(x1, x2) Time: 4.18963 RMAAx's : 4.19628 if Time: 4.19253 ? Time: 4.19107

#include <iostream>

using namespace std;

int main (int argc, char * const argv[]) {

    uint64_t iterations = 10000000000;
    float x1 = 3455.232;
    float x2 = 7456.856;
    float y = 0;

    for (int count = 0; count < 5; ++count)
    {  
     clock_t begin_time = clock();
     for (uint64_t ii = 0; ii < iterations; ++ii)
     {
      y = std::max(x1, x2);
     }

     std::cout << "std::max(x1, x2) Time: " << float( clock () - begin_time ) /  CLOCKS_PER_SEC << endl;


     begin_time = clock();
     for (uint64_t ii = 0; ii < iterations; ++ii)
     {
      y = x1;
      if (y < x2)
       y = x2;
     }

     std::cout << "RMAAx's : " << float( clock () - begin_time ) /  CLOCKS_PER_SEC << endl;


     begin_time = clock();
     for (uint64_t ii = 0; ii < iterations; ++ii)
     {
      if (x1 > x2)
       y = x1;
      else
       y = x2;
     }

     std::cout << "if Time: " << float( clock () - begin_time ) /  CLOCKS_PER_SEC << endl;


     begin_time = clock();
     for (uint64_t ii = 0; ii < iterations; ++ii)
     {
      y = x1 > x2 ? x1 : x2;
     }

     std::cout << "? Time: " << float( clock () - begin_time ) /  CLOCKS_PER_SEC << endl;
    }

    return 0;
}
Ben Reeves
+2  A: 

First, as others have said, profile your code and make absolute sure that this is something worth optimizing. If it is, read on: you can do it without branching. See Down with fcmp: Conditional Moves For Branchless Math for more details.

Adam Rosenfield
+3  A: 

Intel x86 has instructions (FCOMI/FCOMIP/ FUCOMI/FUCOMIP) that provide rapid comparison of floating point values. Your CPU may have such instructions, too. The trick is figuring out what C++ to write in order to maximize the chances of your compiler using those instructions instead of doing something slower but more generic.

The optimistic suggestion is to use std::max(float, float) in the hopes that someone else ignored those who taunt "microbenchmark" and "premature optimization" and has done the research necessary to provide a specialization for std::max(float, float) that will use the your hardware's specialized instructions.

Thomas L Holaday
+1 for phrasing it so well and something for the oh-so-well-read Knuth 'taunt'-ers to ponder on..
rama-jka toti
A: 

In presence of a decent optimizer, they are equivalent.

peterchen