Double values store higher precision and are double the size of a float, but are Intel CPUs optimized for floats?

That is, are double operations just as fast or faster than float operations for +, -, *, and /?

Does the answer change for 64-bit architectures?

+2  A: 

The only really useful answer is: only you can tell. You need to benchmark your scenarios. Small changes in instruction and memory patterns could have a significant impact.

It will certainly matter whether you are using the x87 FPU or SSE-type hardware (the former does all its work in 80-bit extended precision, so double will be closer; the latter is natively 32-bit, i.e. float).

Update: s/MMX/SSE/ as noted in another answer.

Richard
+12  A: 

There isn't a single "Intel CPU", especially in terms of which operations are optimized relative to others, but for most of them, at the CPU level (specifically within the FPU), the answer to your question:

are double operations just as fast or faster than float operations for +, -, *, and /?

is "yes" -- within the CPU. However, taking up twice the memory for each number clearly implies heavier load on the cache(s) and more memory bandwidth to fill and spill those cache lines from/to RAM; the time you care about performance of a floating-point operation is when you're doing a lot of such operations, so the memory and cache considerations are crucial.
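To put rough numbers on the cache argument, here is a minimal Python sketch. The 64-byte cache line and the 32 KiB cache size are assumptions chosen for illustration (check your own CPU), and the helper names `values_per_line` and `fits_in_cache` are hypothetical.

```python
import struct

# Sizes of IEEE 754 single and double precision values, in bytes.
FLOAT_BYTES = struct.calcsize("=f")   # 4
DOUBLE_BYTES = struct.calcsize("=d")  # 8

def values_per_line(value_bytes, line_bytes=64):
    """How many values fit in one cache line (64 bytes is an assumed line size)."""
    return line_bytes // value_bytes

def fits_in_cache(count, value_bytes, cache_bytes=32 * 1024):
    """True if `count` contiguous values fit in the cache (32 KiB is an assumed size)."""
    return count * value_bytes <= cache_bytes

n = 6000  # hypothetical working set: 6000 numbers
print(values_per_line(FLOAT_BYTES), values_per_line(DOUBLE_BYTES))    # 16 8
print(fits_in_cache(n, FLOAT_BYTES), fits_in_cache(n, DOUBLE_BYTES))  # True False
```

With these assumed sizes, the same 6000 numbers stay cache-resident as floats but spill out of the cache as doubles, which is the case where the difference shows up most.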

@Richard's answer points out that there are also other ways to perform FP operations (the SSE instructions; good old MMX was integers-only), especially suitable for simple ops on lots of data ("SIMD", single instruction / multiple data), where each register can pack 4 single-precision floats or only 2 double-precision ones, so this effect will be even more marked.

In the end, you do have to benchmark, but my prediction is that for reasonable (i.e., large;-) benchmarks, you'll find advantage to sticking with single precision (assuming of course that you don't need the extra bits of precision!-).

Alex Martelli
This would also depend on the cache block size, correct? If your cache retrieves 64-bit or larger blocks, then a double would be just as efficient as (if not faster than) a float, at least so far as memory reads/writes are concerned.
Razor Storm
@Razor If you work with exactly as many floats as fit in a cache line, then if you used doubles instead the CPU would have to fetch two cache lines. The caching effect I had in mind when reading Alex's answer, however, is this: your set of floats fits in your nth-level cache but the corresponding set of doubles doesn't. In this case you will experience a big boost in performance if you use floats.
Peter G.
@Peter, yeah that makes sense, say you have a 32 bit cacheline, using doubles would have to fetch twice every time.
Razor Storm
@Razor, the problem's not really with fetching/storing just **one** value -- it is, as @Peter's focus correctly indicates, that often you're fetching "several" values to operate on (an array of numbers would be a typical example, and operations on items of such arrays are very common in numerical applications). There are counterexamples (e.g., a pointer-connected tree where each node only has one number and a lot of other stuff: then having that number be 4 or 8 bytes will matter pretty little), which is part of why I say that in the end you have to benchmark, but the idea often applies.
Alex Martelli
@me pity I cannot edit the spelling mistakes of my earlier comment ...
Peter G.
@Alex Martelli, I see. That makes sense.
Razor Storm
A: 

Floating point is normally an extension to one's general-purpose CPU. The speed will therefore depend on the hardware platform used. If the platform has floating-point support, I would be surprised if there were any difference.

doron
+3  A: 

If all floating-point calculations are performed within the FPU, then, no, there is no difference between a double calculation and a float calculation because the floating point operations are actually performed with 80 bits of precision in the FPU stack. Entries of the FPU stack are rounded as appropriate to convert the 80-bit floating point format to the double or float floating-point format. Moving sizeof(double) bytes to/from RAM versus sizeof(float) bytes is the only difference in speed.
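The rounding step can be illustrated in a small way from Python. This is only a sketch of narrowing a double to a float; it does not model the x87 80-bit format, and the helper name `round_to_float32` is hypothetical.

```python
import struct

def round_to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    # Packing as "f" rounds to 32 bits; unpacking widens the result back to a double.
    return struct.unpack("=f", struct.pack("=f", x))[0]

print(round_to_float32(0.5) == 0.5)  # True: 0.5 is exactly representable in both formats
print(round_to_float32(0.1) == 0.1)  # False: the float is a coarser approximation of 0.1
```

The second comparison fails because storing the value as a float discards the low-order bits that the double kept.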

If, however, you have a vectorizable computation, then you can use the SSE extensions to run four float calculations in the same time as two double calculations. Therefore, clever use of the SSE instructions and the XMM registers can allow higher throughput on calculations that only use floats.
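The four-versus-two figure follows directly from the register width. A minimal sketch of the arithmetic, assuming 128-bit XMM registers; the helper names `simd_lanes` and `vector_ops_needed` are hypothetical, and the count ignores everything except lane width.

```python
import struct

def simd_lanes(fmt, register_bits=128):
    """Number of values of struct format `fmt` that fit in one SIMD register."""
    return register_bits // (8 * struct.calcsize(fmt))

def vector_ops_needed(count, fmt, register_bits=128):
    """Vector instructions needed to touch `count` values once (ceiling division)."""
    lanes = simd_lanes(fmt, register_bits)
    return -(-count // lanes)

print(simd_lanes("=f"), simd_lanes("=d"))                            # 4 2
print(vector_ops_needed(1000, "=f"), vector_ops_needed(1000, "=d"))  # 250 500
```

Passing `register_bits=256` gives the same two-to-one ratio for wider registers; whether that translates into a two-to-one speedup still depends on the workload.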

Daniel Trebbien
+1  A: 

Another point to consider is whether you are using a GPU (the graphics card). I work on a project that is numerically intensive, yet we do not need the precision that double offers. We use GPU cards to help further speed up the processing. CUDA GPUs need a special package to support double, and the local RAM on a GPU is quite fast, but quite scarce. As a result, using float also doubles the amount of data we can store on the

Yet another point is memory. Floats take half as much RAM as doubles. If you are dealing with VERY large datasets, this can be a really important factor. If using double means you have to cache to disk instead of keeping everything in RAM, the difference will be huge.
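The factor of two is easy to check with Python's standard `array` module, which stores raw C floats (`'f'`) or doubles (`'d'`). This is a sketch only; the helper name `payload_bytes` and the one-million-element size are assumptions for illustration.

```python
from array import array

def payload_bytes(typecode, count):
    """Bytes of raw numeric data held by an array of `count` zeros of the given type."""
    values = array(typecode, [0.0]) * count
    return len(values) * values.itemsize

n = 1_000_000  # hypothetical dataset size
# On mainstream platforms this prints 4000000 8000000.
print(payload_bytes("f", n), payload_bytes("d", n))
```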

So for the application I am working with, the difference is quite important.

Miley