views: 324

answers: 11

Is there any sort of performance difference between the arithmetic operators in C++, or do they all run equally fast? E.g., is "++" faster than "+=1"? What about "+=10000"? Does it make a significant difference if the numbers are floats instead of integers? Does "*" take appreciably longer than "+"?

I tried performing 1 billion each of "++", "+=1", and "+=10000". The strange thing is that the number of clock ticks (as reported via time.h) is actually counterintuitive. One might expect that if any of them is the fastest, it is "++", followed by "+=1", then "+=10000", but the data shows a slight trend in the opposite direction. The difference is more pronounced over 10 billion operations. This is all for integers.

I am dabbling in scientific computing, so I wanted to test the performance of these operators, to see, for example, whether any of them run in time linear in the size of their inputs.

+3  A: 

The best answer is to time it with your compiler.

Richard Pennington
+10  A: 

About your edit: the language says nothing about the architecture it's running on. Your question is platform-dependent.

That said, typically all fundamental data-type operations have a one-to-one correspondence to assembly.

x86, for example, has an instruction that increments a value by 1, which is what i++ or i += 1 would translate into. Addition and multiplication also have single instructions.
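As a rough illustration (the function names below are made up, not from the answer), both of these typically compile to the same single instruction once optimizations are enabled:

```cpp
// Two ways of adding one. With optimizations enabled, most compilers
// emit the same single instruction (e.g. an inc or add on x86) for both.
int via_increment(int i) { return ++i; }
int via_compound(int i)  { i += 1; return i; }
```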

Hardware-wise, the work needed to add or multiply numbers grows at least linearly with the number of bits involved; but because the hardware operates on a fixed number of bits, each operation is O(1).

Floats have their own processing unit, usually, which also has single instructions for operations.


Does it matter?

Why not write the code that does what you need it to do? If you want to add one, use ++. If you want to add a large number, add a large number. If you need floats, use floats. If you need to multiply two numbers, then multiply them.

The compiler will figure out the best way to do what you want, so instead of trying to be tricky, do what you need and let it do the hard work.

After you've written your working code, and you decide it's too slow, profile it and find out why. You'll find it's not silly things like multiplying versus adding, but rather going about the entire (sub-)problem in the wrong way.

Practically, all of the operators you listed will be done in a single CPU instruction anyway, on desktop platforms.

GMan
OP never states /why/ - just asks the question. It seems a bit like you're assuming the end is to micro-optimize, which may not be the case at all.
Chris
Actually, I'm a little bit tired of this kind of snobbery, starting with "Does it matter?", "Why not...". Why not simply tell him if you know some page with these comparisons, since he is curious, or shut up otherwise?
Viktor Sehr
Because it seriously doesn't matter. It's so platform-dependent, am I supposed to reach through the screen and do the profiling or disassembly for him?
GMan
It's pragmatic realism, not snobbishness. The difference between 'x = x + 1', 'x++', and 'x += 1' is so insignificant in the great scheme of things as to be irrelevant. Memory access, I/O, and the asymptotic complexity of the algorithms used are going to impact performance far more than the insignificant difference we're talking about here. And if performance somehow *is* an issue, there should be a profiler set up, which would tell you exactly which is faster overall. Besides which, the compiler is free to do what it wants when compiling any of those, so it's a compiler issue, not a language issue.
kyoryu
Of course it doesn't matter whether you do `++i` or `i += 1`. We know that. But the OP doesn't. That's why he asked. I know that the only reason I'm able to ignore such performance questions when programming is that I already have the requisite knowledge to determine that "this doesn't really matter". So yes, for the OP it does matter. He needs some answers so that he can stop worrying.
jalf
+3  A: 

No, no, yes*, yes*, respectively.

* but do you really care?

EDIT: To give some kind of idea: with a modern processor, you may be able to do 200 integer additions in the time it takes to make one memory access, but only 50 integer multiplications. If you think about it, you're still going to be bound by the memory accesses most of the time.

Pascal Cuoq
A: 

It depends on the architecture. The built-in operators for integer arithmetic translate directly to assembly (as I understand it); ++, +=1, and +=10000 are probably equally fast. Multiplication would depend on the platform, and overloaded operators would depend on you.

RyanD
+4  A: 

What you are asking is: what basic operations get transformed into which assembly instructions, and what is the performance of those instructions on my specific architecture? And this is also your answer: the code they get translated to depends on your compiler and its knowledge of your architecture, and their performance depends on your architecture.

Mind you: in C++, operators can be overloaded for user-defined types. They can behave differently from built-in types, and the implementation of the overload can be non-trivial (not just one instruction).

Edit: A hint for testing. Most compilers support outputting the generated assembly code. The option for gcc is -S. If you use another compiler, have a look at its documentation.
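For example, a small file like the following (the file and function names are just placeholders) can be compiled with `g++ -S -O2 ops.cpp`, and the resulting `.s` file inspected to see which instructions each operator becomes:

```cpp
// ops.cpp -- e.g. compile with: g++ -S -O2 ops.cpp
// then inspect ops.s to see which instructions each operation maps to.
int add_ints(int a, int b)             { return a + b; } // typically a single add
int mul_ints(int a, int b)             { return a * b; } // typically a single imul
double mul_doubles(double a, double b) { return a * b; } // typically a single SSE multiply
```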

pmr
A: 

The performance problems caused by C++ operators do not come from the operators themselves, nor from their implementation. They come from the syntax, from hidden code being run without your knowing it.

The best example is implementing quicksort on an object that implements operator[] but internally uses a linked list. Instead of O(n log n) [1] you will get O(n^2 log n) [2] (a sketch follows below).

The problem with performance is that you cannot know exactly what your code will eventually be.

[1] I know that quicksort is actually O(n^2) in the worst case, but it rarely gets to that; the average distribution will give you O(n log n).

[2] ... god... I hope I got my math right, so you guys don't downvote me too much
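As a rough sketch of the kind of trap described above (the class and names here are hypothetical, not from the answer): an operator[] that walks a linked list makes every "random access" O(n), so an algorithm written against indexed access silently gains an extra factor of n.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical container: it offers operator[], so it looks random-access,
// but each access walks the underlying linked list from the front, costing O(n).
class ListVector {
public:
    void push_back(int v) { data_.push_back(v); }
    int& operator[](std::size_t i) {
        auto it = data_.begin();
        std::advance(it, i);        // linear-time walk, hidden behind []
        return *it;
    }
    std::size_t size() const { return data_.size(); }
private:
    std::list<int> data_;
};
```

A quicksort written against this interface still performs O(n log n) element accesses, but each access now hides an O(n) walk, which is where the O(n^2 log n) above comes from.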

elcuco
... and what is an average distribution? wise guy..
elcuco
+1  A: 

Donald Knuth: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

Unless you are writing extremely time-critical software, you should probably worry about other things.

Alon
Donald Knuth is the root of all evil! Though it's not exactly his fault, everyone uses this slogan as an excuse for not understanding the nature of things.
Michael Krelin - hacker
Well, one problem with digging too much into "the nature of things" is that having done that, you will only understand what your version of your compiler is doing on your computer. Any optimizations you make there may actually be counterproductive under other compilers or hardware. So unless you can predict exactly the circumstances and environment under which the code will be running, it's often better to leave the low level details to the compiler, and trust that different compilers will handle them in reasonably good ways based on knowledge that they have available to them at that time.
Jeremy Friesner
Knuth has forgotten more than you'll ever know. Me too, for that matter. The bigger point is that too often people will worry about micro-optimizing an inefficient solution, rather than refactoring it to a more efficient solution. Optimized bogosort is still bogosort.
kyoryu
(that is, Knuth has forgotten more than I'll ever know)
kyoryu
Jeremy, actually, digging into the nature of things will give you an understanding of what the computer does. The compiler is not so important. But I have yet to see someone who derides micro-optimizing actually do the macro-optimizing. Neither is a substitute for the other. But this quote is nothing but an excuse.
Michael Krelin - hacker
There is plenty of reason to *understand* how the CPU executes code and what it means for performance. "Understanding" is not "premature optimization". "Premature optimization" is, by definition, when you try to actually *modify* your program, not just understand its behavior, and when you do so before you have the data necessary to make beneficial modifications.
jalf
Before you can do *any* optimization, premature or otherwise, you have to have a basic idea of the performance implications of various code primitives.
jalf
jalf, +2 for each comment ;-)
Michael Krelin - hacker
A: 

Short answer: you should turn optimizations on before measuring.

The long answer: if you turned optimizations on, are performing the operations on integers, and still get different times for ++i; and i += 1;, then it's probably time to get a better compiler: the two statements have exactly the same semantics, and a competent compiler should translate them into the same instruction sequence.

avakar
+1  A: 

Here is a twist on your evaluations: try loop unrolling. Loop unrolling is where you repeat the loop body several times within each iteration, reducing the total number of iterations in the loop.

Most modern processors hate branch instructions. The processor keeps a queue of pre-fetched instructions, which speeds up processing, and a branch (especially a mispredicted one) forces it to clear out that queue and reload it. This takes more time than just processing sequential instructions.

When coding for processing time, try to minimize the number of branches, which can occur in loop constructs and decision constructs.
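As a minimal hand-written sketch of the idea (the factor of 4 is arbitrary, and modern compilers will often do this themselves at higher optimization levels):

```cpp
// Plain loop: one conditional branch per element.
long sum_simple(const int* a, int n) {
    long sum = 0;
    for (int i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

// Unrolled by a factor of 4: one conditional branch per four elements,
// plus a small cleanup loop for the 0-3 leftover elements.
long sum_unrolled(const int* a, int n) {
    long sum = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4)
        sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; ++i)
        sum += a[i];
    return sum;
}
```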

Thomas Matthews
A: 

"Does it make a significant difference if the numbers are floats instead of integers?"

-It depends on what kind of processor you are running on. Integer operations are faster on current x86-compatible CPUs.

About i++ and i+=1: there shouldn't be a difference with any good compiler, while you may expect i+=10000 to be slightly slower on x86 CPUs.

"Does "*" take appreciably longer than "+"?"

-Typically yes.

Note that you may run into all sorts of bottlenecks, in which case the speed difference between the operations doesn't show up: e.g. memory bandwidth, CPU pipeline stalls due to data dependencies, etc.

Cornelius Scarabeus
+2  A: 

Look up the optimization manuals for your CPU. That's the only place you're going to find answers.

Get your compiler to output the generated assembly. Download the manuals for your CPU. Look up the instructions used by the compiler in the manual, and you know how they perform.

Of course, this presumes that you already know the basics of how a pipelined, superscalar out-of-order CPU operates, what branch prediction, instruction and data cache and everything else means. Do your homework.

Performance is a ridiculously complicated subject. Depending on context, floating-point code may be as fast as (or faster than) integer code, or it may be four times slower. Usually branches carry almost no penalty, but in special cases, they can be crippling. Sometimes, recomputing data is more efficient than caching it, and sometimes not.

Understand your programming language. Understand your compiler. Understand your CPU. And then examine exactly what the compiler is doing in your case, by profiling/timing, and when necessary by examining the individual instructions. (And when timing your code, be aware of all the caveats and gotchas that can invalidate your benchmarks: make sure optimizations are enabled, but also that the code you're trying to measure isn't optimized away. Take the cache into account: if the data is already in the CPU cache it'll run much faster, and if it has to be read from physical memory first it'll take extra time. Either can invalidate your measurements if you're not careful. Keep in mind exactly what you want to measure.)
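As a minimal sketch of such a timing loop (assuming a C++11 compiler; the volatile sink is a crude attempt to keep the work from being optimized away, not a guarantee):

```cpp
#include <chrono>
#include <cstdio>

volatile long sink;  // crude attempt to stop the compiler from discarding the result

int main() {
    const long iterations = 1000000000L;
    long x = 0;

    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        x += 1;                       // the operation being measured
    auto stop = std::chrono::steady_clock::now();

    sink = x;                         // use the result so the loop is not trivially dead
    std::chrono::duration<double> elapsed = stop - start;
    std::printf("%.3f s\n", elapsed.count());
    // Note: a good optimizer may still collapse this whole loop into a single
    // assignment; that is exactly the kind of gotcha described above.
}
```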

For your specific examples, why should ++i be faster than i += 1? They do exactly the same thing. Sometimes it may make a difference whether you're adding a constant or a variable, but in this case you're adding the constant one in both cases.

And in general, instructions take a fixed, constant time regardless of their operands. Adding one to something takes just as long as adding -2000 or 1772051912. The same goes for multiplication or division.

But if you care about performance, you need to understand how the entire technology stack works, rather than relying on a few simple rules of thumb like "integer is faster than floating point" or "++ is faster than +=". (Apart from anything else, such simple rules of thumb are almost never true, at least not in every case.)

jalf