
Hello everyone,

This is in reference to C++ and using the Visual Studio compilers.

Is there any difference in speed when reading/writing (to RAM) and when doing mathematical operations on different types of variables such as bool, short int, int, float, and double?

From what I understand so far, mathematical operations with doubles take much longer (I am talking about 32-bit processors; I know little about 64-bit processors) than, say, operations with floats.

How then do operations (reading/writing to RAM and elementary math) with float and int compare? How about int and short int, or even the differences between signed and unsigned versions of each type? Is there any one data type that would be most efficient to work with as a counter for small numbers?

Thanks, -Faken

+3  A: 

This heavily depends on the architecture and on the assembly your C++ code is compiled to. For example, on MIPS, floating-point operations require moving data to and from multiple CPU registers.

This type of micro-optimization shouldn't affect performance very much and should be left to the compiler to handle. You should profile your application for bottlenecks if you're looking to optimize something.

Ben S
But he explicitly said Visual Studio compilers and 32-bit, so I think it is safe to assume x86 here. Your point is valid, though.
jeffamaphone
Err, OK. Unfortunately that doesn't answer my question. I want to know, in general, for a standard x86 processor, the relative speeds of operations on the different variable types. I'm a new programmer, and my programs have loops running mathematical operations numbering in the billions or even trillions of calculations, so I do have a legitimate reason for asking: trade-offs between memory usage, precision, speed, etc.
Faken
The relative speed differences are so small that there will not be a noticeable difference even for many repeated operations on primitives. Use the type that suits the variable. The compiler will take care of moving the bits from memory into the CPU registers.
Ben S
The easy answer is: try it. If you run something a billion times in a loop (unrolled by 10 or 100) and see how many seconds it takes, that number equals the nanoseconds per operation. Generally floating point is slower than the integer types (because it does more work). In general, though, the code has to be really aggressively tuned before this is what matters.
Mike Dunlavey
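As a rough illustration of the "just try it" approach Mike describes, here is a minimal timing sketch (my own, not from the thread). The loop body, iteration count, and the `volatile` sink are arbitrary choices; the `volatile` is only a crude way to keep the compiler from optimizing the loop away, so treat the numbers as ballpark figures rather than a proper benchmark.

```cpp
// Minimal timing sketch: run an operation many times and divide the elapsed
// seconds by the iteration count to get nanoseconds per operation.
#include <cstdio>
#include <ctime>

int main()
{
    const long long iterations = 1000000000LL;  // one billion
    volatile double sink = 0.0;                 // keeps the loop from being optimized away

    std::clock_t start = std::clock();
    for (long long i = 0; i < iterations; ++i)
        sink = sink + 1.0;                      // the operation under test
    std::clock_t stop = std::clock();

    double seconds = double(stop - start) / CLOCKS_PER_SEC;
    std::printf("%.2f s total, ~%.2f ns per operation\n",
                seconds, seconds * 1e9 / iterations);
    return 0;
}
```

Swapping the type of `sink` (int, short, float, double) and the operation gives a crude side-by-side comparison, but a profiler on the real program is still the better guide.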
+4  A: 

From what I understand so far, mathematical operations with doubles take much longer (we are talking about 32-bit processors; I know little about 64-bit processors) than, say, operations with floats.

I don't know (I've virtually never programmed floating-point arithmetic), but I doubt that, because double is the native precision (supported by hardware), whereas I don't know about float.

How then do operations (reading/writing to ram and elementary math) with float and int compare?

Float is slower than int.

how about int and short int, or even differences between signed and unsigned versions of each of the variables?

Short may be slower than int, because the CPU works with int natively and needs to truncate results to make them short (a small example follows below). Shorts would only be faster if you have so many of them stored contiguously that they fit better into the CPU cache.

differences between signed and unsigned versions of each of the variables?

No I don't think so.

Is there any one data type that would be most efficient to work with as low number counters?

int.

ChrisW
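A small, hedged illustration of the truncation point in the answer above (my own example, not ChrisW's): in C++, arithmetic on short operands is done at int width anyway, because of the integer promotions; the narrowing back to short only happens on assignment.

```cpp
// Tiny illustration: short operands are promoted to int for arithmetic,
// and the result is converted back to 16-bit short on assignment.
#include <cstdio>

int main()
{
    short a = 30000;
    short b = 30000;

    int   wide   = a + b;          // promoted to int, no overflow: 60000
    short narrow = (short)(a + b); // int result narrowed back to short
                                   // (wraps to -5536 on typical x86/VC++)

    std::printf("as int:   %d\n", wide);
    std::printf("as short: %d\n", narrow); // implementation-defined pre-C++20
    return 0;
}
```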
Yeah, when in doubt, use int.
jeffamaphone
Where is this information from? For what platform?
Ben S
"Native" floating-point precision on x86 is actually `long double` (80-bit), since that's the size of FPU registers. In any case, it doesn't affect perf of either `float` or `double`.
Pavel Minaev
Is it worth pointing out that SSE floating-point ops are generally faster with narrower types? (That is, more values processed per instruction with narrower types.)
greyfade
@greyfade - Are floats, as the OP said, "much" faster than doubles? My guess was that they're not very different because they're both using the hardware FPU.
ChrisW
@ChrisW: I mean in the sense that you can perform more operations on more values simultaneously with `float`s in SSE registers than you can with `double`s. Likewise with `short`s vs. `int`s, etc.
greyfade
I think that the number of simultaneous operations depends on the number of CPUs and/or pipelines within the CPU. Using 16-bit instead of 32-bit integer registers doesn't let you increase the number of concurrent integer operations (except possibly for loads and stores).
ChrisW
More complex operations like sin and exp, let alone the Bessel functions, can be a lot slower for double. The reason is that when they are implemented as iterative approximations, you need more rounds to get more bits.
MSalters
+3  A: 

Most CPUs work fastest on data types that match their natural word size, so, depending on the architecture, 4- or 8-byte types. int is often defined as the natural word size, so any operation on an int should be fast.

In practice, however, you're going to pay so much more for cache misses, soft and hard page faults, and memory access than for any sort of arithmetic operation that optimizing your data types is probably going to be a waste of time.

Profile your code, then optimize the hot spots.

Kevin Montrose
In C++, `int` is _always_ defined as the natural integer type of the underlying platform. As for the rest, especially regarding the peculiarities of caching etc., I wish I had more votes to vote this one up. For the advice to profile first, I'd give you ten votes. `:)`
sbi
Always _defined_ as such, but only _sometimes_ implemented as such. The most famous counterexample is the x86-64 CPU on many platforms, which still has 32-bit ints due to the lack of a "short short int".
MSalters
@MSalters: I believe the reason for the abomination of a 32-bit `int` on a 64-bit platform is that they didn't want to disappoint the "programmers" who wrote their code assuming `int` would be 32-bit. I hate the idea. `:(`
sbi
+1  A: 

I don't know what the difference is between how doubles and floats are processed, but I'm pretty sure it's done by the Floating Point Unit. This compares with register operations for ints and longs. (Is there a separate integer calculation unit in modern CPUs?)

The question about writing to RAM is very tricky to answer nowadays because of the levels of caching and the very high CPU clock speeds compared to RAM write speeds.

As far as performance is concerned - Measure first!

Write performance tests that accurately reflect your target environment.

quamrana
+6  A: 

There are two different questions here: speed when reading/writing, and arithmetic performance. They are orthogonal. When reading or writing a large array, the time of course scales with the number of bytes read, O(N), so using short instead of int (on VC++, where short is half the size) would cut the time roughly in half.

For arithmetic, once the operands are in registers, the size of the type doesn't matter much. IIRC, between types in the same category it is actually the same (so short isn't any faster or slower than int). Using 64-bit integer types on a 32-bit platform will naturally carry a penalty, since there's no single instruction to handle them. Floating-point types, on the other hand, are simply slower than all the integral types, even though sizeof(float)==sizeof(int) on VC++. But, again, operations on float aren't any faster than operations on double; this assumes the default FPU settings, which promote all operands to 80-bit extended floats. That promotion can be disabled to squeeze a bit more out of using floats, IIRC (a sketch of this follows below).

The above is specific to VC++ and x86, as requested by the question. Other platforms, and especially other architectures, can differ radically.

The one data type that is most efficient to work with as a counter (low-valued or not) is int, usually regardless of the architecture and implementation (as the Standard recommends it to be the preferred word size of the platform).

Pavel Minaev
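As a side note on the last point about FPU settings: here is a hedged sketch (my own illustration, not part of the answer) of what lowering the x87 precision can look like on MSVC, using `_controlfp_s` from `<float.h>`. It only matters when the compiler emits x87 code; with SSE code generation the setting has no effect, and whether it buys anything in practice needs to be measured.

```cpp
// Illustrative sketch only (MSVC-specific): lower the x87 FPU precision control
// from 80-bit extended to 24-bit single precision, then restore the default.
#include <float.h>
#include <cstdio>

int main()
{
    unsigned int control = 0;

    // _PC_24 = round the significand to 24 bits; _MCW_PC = precision-control mask.
    if (_controlfp_s(&control, _PC_24, _MCW_PC) != 0)
        std::printf("failed to change FPU precision\n");

    float x = 1.0f / 3.0f;   // in x87 code, intermediate results now use 24-bit precision
    std::printf("%.9f\n", x);

    // Restore the default precision-control bits.
    _controlfp_s(&control, _CW_DEFAULT, _MCW_PC);
    return 0;
}
```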
+1  A: 

Main memory access affects your performance in much the same way as the CPU cache. Modern (SDRAM, DDRx) DRAM access time improves considerably with locality of reference, which means you want your data to be contiguous. This often contradicts object orientation: OOD would have you put all of the elements of one object together. If your algorithm performs operations against the various elements of one object, your locality of reference is good. If it performs operations against the same element of multiple objects, your locality is bad.

If you have "loops running mathematical operations numbering in the billions or even trillions of calculations", it is most likely that you have the latter situation. If your operations pick individual elements out of objects, your CPU cache and RAM prefetches can be nearly ineffective or even detrimental. In that case you can significantly increase performance by breaking your arrays of objects into parallel arrays of like elements. It's ugly, but it can be much faster; a sketch of the two layouts follows below.

David McCracken
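To make that concrete, here is a small sketch (type and field names are mine, purely illustrative) contrasting the "object oriented" layout with parallel arrays of like elements. When a loop touches only one field, the second layout keeps exactly the data it needs contiguous in memory.

```cpp
// Sketch of the two layouts described above. The struct and field names are
// made up; only the memory layout matters.
#include <cstddef>
#include <vector>

// "Array of objects": each particle's fields sit together, so a loop over just
// `x` strides past the unused fields and wastes cache space.
struct Particle {
    double x, y, z;
    double mass;
    int    id;
};

double sumX_ArrayOfObjects(const std::vector<Particle>& particles)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < particles.size(); ++i)
        sum += particles[i].x;   // touches one field per ~40-byte object
    return sum;
}

// Parallel arrays of like elements: all x values are contiguous, so the same
// loop reads memory sequentially and prefetches well.
struct Particles {
    std::vector<double> x, y, z;
    std::vector<double> mass;
    std::vector<int>    id;
};

double sumX_ParallelArrays(const Particles& p)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < p.x.size(); ++i)
        sum += p.x[i];           // contiguous reads
    return sum;
}
```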
All the data is going to be very neatly arranged in a giant heap-allocated array that gets crunched sequentially. For example, say a number of calculations are done on three x-y coordinate pairs. The values in the array will be arranged like this: x1, y1, x2, y2, x3, y3, and the pattern repeats for the next set. Locality should be pretty much perfect.
Faken
A: 

You can get a dramatic increase in performance with smaller types (floats, 8-bit ints, etc.) if you can use SIMD instructions. A single SIMD instruction can operate on several values at once (for example, four floats or two doubles per 128-bit SSE register, and even more of the narrower integer types), parallelized in hardware. These instructions are, however, only supported on CPUs from the Pentium III onwards (SSE).

Inverse
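For what it's worth, here is a minimal sketch of that width difference using SSE intrinsics (my own example, assuming an SSE2-capable CPU and compiler such as MSVC or GCC on x86): one 128-bit register holds four floats but only two doubles, so each packed instruction does twice as many single-precision operations.

```cpp
// Minimal SSE sketch: one packed instruction adds four floats, but only two doubles.
#include <emmintrin.h>  // SSE2 intrinsics (__m128, __m128d)
#include <cstdio>

int main()
{
    // Four single-precision adds in one instruction.
    __m128 fa = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    __m128 fb = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
    __m128 fsum = _mm_add_ps(fa, fb);

    // Only two double-precision adds in one instruction.
    __m128d da = _mm_set_pd(2.0, 1.0);
    __m128d db = _mm_set_pd(20.0, 10.0);
    __m128d dsum = _mm_add_pd(da, db);

    float  fout[4];
    double dout[2];
    _mm_storeu_ps(fout, fsum);
    _mm_storeu_pd(dout, dsum);

    std::printf("floats:  %g %g %g %g\n", fout[0], fout[1], fout[2], fout[3]);
    std::printf("doubles: %g %g\n", dout[0], dout[1]);
    return 0;
}
```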