Hi. I have megabytes of data stored as doubles that need to be sent over a network... I don't need the precision that a double offers, so I want to convert these to floats before sending them over the network. What is the overhead of simply doing:

float myFloat = (float)myDouble;

I'll be doing this operation several million times every few seconds and don't want to slow anything down. Thanks

EDIT: My platform is x64 with Visual Studio 2008.

EDIT 2: I have no control over how they are stored.

+8  A: 

It's going to depend on your C++ implementation and the target hardware. Test it and see.

Patrick
+2  A: 

You don't have any choice but to measure it yourself and see. You could use timers for the measurement. It looks like someone has already implemented a neat C++ timer class.
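For example, a minimal timing sketch using the Windows high-resolution counter (QueryPerformanceCounter is available out of the box in VS2008; the buffer size and names here are hypothetical, and a real benchmark would repeat the run several times):

    #include <windows.h>
    #include <cstdio>
    #include <vector>

    int main()
    {
        const size_t N = 1000000;               // hypothetical sample size
        std::vector<double> src(N, 3.14159);
        std::vector<float>  dst(N);

        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&t0);

        for (size_t i = 0; i < N; ++i)
            dst[i] = (float)src[i];             // the cast being measured

        QueryPerformanceCounter(&t1);
        double seconds = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;

        // print a result so the compiler can't optimize the loop away
        printf("%lu casts in %f s (dst[0]=%f)\n", (unsigned long)N, seconds, dst[0]);
        return 0;
    }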

Ashwin
+3  A: 

Bearing in mind that most compilers deal with doubles a lot more efficiently than floats -- many promote float to double before performing operations on them -- I'd consider taking the block of data, ZIPping/compressing it, then sending the compressed block across. Depending on what your data looks like, you could get 60-90% compression, vs. the 50% you'd get converting 8-byte values to four bytes.
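If you go that route, here is a minimal sketch of the idea, assuming zlib is available (the function name is hypothetical; whether this actually beats a plain cast depends entirely on how compressible the data is, as the comments below discuss):

    #include <zlib.h>
    #include <cstddef>
    #include <vector>

    // One-shot zlib compression of a block of doubles.
    // compressBound() gives the worst-case output size for the input length.
    std::vector<unsigned char> compressDoubles(const double* data, size_t count)
    {
        uLong srcLen = (uLong)(count * sizeof(double));
        uLongf dstLen = compressBound(srcLen);
        std::vector<unsigned char> out(dstLen);

        int rc = compress2(&out[0], &dstLen,
                           (const Bytef*)data, srcLen,
                           Z_BEST_SPEED);       // favor speed over ratio
        if (rc != Z_OK)
            out.clear();                        // signal failure with an empty buffer
        else
            out.resize(dstLen);                 // shrink to the actual compressed size
        return out;
    }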

Bob Kaufman
Compression would for sure be slower than casting double to float, even if the numeric performance of the arithmetic operations were _the same_ (not better, as both will probably be promoted to extended format internally).
fortran
Hmmm, very interesting... I'm not sure if it will have to be compressed (most likely not), but this could be a better solution than converting them. Any compression algorithms off the top of your head? Thanks
Polaris878
@fortran - absolutely; however, how much slower vs. transmission time is still an unknown without direct observation.
Bob Kaufman
@Polaris878 - consider LZW compression. I'm pretty sure it was patented, but you should be able to find a usable implementation. Or at least it will point you to a solution!
Bob Kaufman
What CPUs promote `float` to `double`? Are you by chance confusing it with x86/x64, which promotes both `float` and `double` to `long double` when loading into FPU registers, with default settings?
Pavel Minaev
Do you have reason to believe that 60-90% is reasonable? The figures I pull out of my rear are much lower than that. All but one or two bytes of each value are likely to be nearly random.
Mark Ransom
As peterchen's answer points out, a rough estimate suggests that the time required to transmit the results across the network dominates that of any reasonably simple calculation, so better compression may offer a more significant speedup.
Stephen C. Steel
+1  A: 

It will also depend on the CPU and what floating point support it has. In the bad old days (1980s), processors supported integer operations only. Floating point math had to be emulated in software. A separate chip for floating point (a coprocessor) could be bought separately.

Modern CPUs now have SIMD instructions, so large amounts of floating point data can be processed at once. These instructions include MMX, SSE, 3DNow! and the like. Your compiler may know how to make use of these instructions, but you may need to write your code in a particular way, and turn on the right options.
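For instance, here is a hedged sketch of the double-to-float conversion using SSE2 intrinsics, which every x64 target supports (the function name and the assumption that the count is a multiple of four are mine):

    #include <emmintrin.h>   // SSE2 intrinsics, baseline on any x64 target
    #include <cstddef>

    // Narrow doubles to floats four at a time with CVTPD2PS.
    // Assumes count is a multiple of 4; a real version would finish the
    // remainder with a plain scalar loop.
    void narrowSse2(const double* src, float* dst, size_t count)
    {
        for (size_t i = 0; i < count; i += 4)
        {
            __m128d d0 = _mm_loadu_pd(src + i);        // two doubles
            __m128d d1 = _mm_loadu_pd(src + i + 2);    // two more doubles
            __m128  f0 = _mm_cvtpd_ps(d0);             // -> two floats (low half)
            __m128  f1 = _mm_cvtpd_ps(d1);
            _mm_storeu_ps(dst + i, _mm_movelh_ps(f0, f1)); // pack four floats, store
        }
    }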

Finally, the fastest way to process floating point data is in a video card. A fairly new language called OpenCL lets you send tasks to the video card to be processed there.

It all depends on how much performance you need.

Kevin Panko
Yeah, this is exactly why... all this data will be going to the video card on the other side of the network... so I want to make the data nice and neat before it reaches the client to be rendered.
Polaris878
OpenCL doesn't guarantee support for doubles and since he's not doing any real computation OpenCL might not be a good choice.
Amuck
+6  A: 

Even if it does take time, this will not be the slow point in your application.
Your FPU can do the conversion a lot quicker than the network can carry the result (so the bottleneck here will more than likely be the write to the socket).

But as with all things like this, measure it and see.

Personally I don't think any time spent here will affect the real time spent sending the data.

Martin York
+3  A: 

Assuming that you're talking about a significant number of packets to ship the data (a reasonable assumption if you're sending millions of values), casting the doubles to float will likely reduce the number of network packets by about half (assuming sizeof(double)==8 and sizeof(float)==4).

Almost certainly the savings in network traffic will dominate whatever time is spent performing the conversion. But as everyone says, running a few measurements will be the proof of the pudding.
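Putting it together, a minimal sketch of the conversion step before sending (sendBuffer is a hypothetical stand-in for whatever socket call you actually use):

    #include <cstddef>
    #include <vector>

    // Hypothetical stand-in for whatever socket call actually ships the bytes.
    void sendBuffer(const void* data, size_t bytes);

    // Narrow the payload before sending: 4 bytes per value instead of 8.
    void convertAndSend(const double* values, size_t count)
    {
        std::vector<float> narrowed(count);
        for (size_t i = 0; i < count; ++i)
            narrowed[i] = (float)values[i];    // the cast in question
        sendBuffer(&narrowed[0], count * sizeof(float));
    }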

Michael Burr
yes! this is exactly why I was interested in doing the conversion... some other posters didn't realize this :)
Polaris878
+6  A: 

As Michael Burr said, while the overhead strongly depends on your platform, it is definitely less than the time needed to send the data over the wire.


A rough estimate:

800 Mbit/s of payload on an excellent Gigabit wire carries 25 million 4-byte floats per second (800e6 bits / 32 bits per value).

On a 2 GHz single core, that gives you a whopping 80 clock cycles (2e9 / 25e6) for each value converted just to break even - anything less, and you will save time. That should be more than enough on all architectures :)

A simple load-store cycle (barring all caching delays) should be below 5 cycles per value. With instruction interleaving, SIMD extensions and/or parallelizing on multiple cores, you are likely to do multiple conversions in a single cycle.

Also, the receiver will be happy having to handle only half the data. Remember that memory access time is nonlinear.


The only thing arguing against the conversion would be if the transfer needs to put minimal load on the CPU: a modern architecture can transfer the data from disk/memory to the bus without CPU intervention. However, with the above numbers I'd say that doesn't matter in practice.

[edit]
I checked some numbers: the 387 coprocessor would indeed have taken around 70 cycles for a load-store cycle. On the original Pentium, you are down to 3 cycles without any parallelization.

So, unless you run a gigabit network on a 386...

peterchen
Thanks for explaining it on the low level... this is good stuff. When dealing with lots of data, every clock cycle matters!
Polaris878
Very true! Let no CPU cycle go unpunished!
NTDLS
+2  A: 

I think this cast is a lot cheaper than you think, since it doesn't really involve any kind of calculation. In fact, it's essentially just bit manipulation to drop some of the bits of the exponent and mantissa.
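For illustration, a small sketch that dumps the raw IEEE-754 encodings before and after the cast (the field widths in the comments are the standard layouts):

    #include <cstdio>
    #include <cstring>

    int main()
    {
        double d = 3.14159265358979;
        float  f = (float)d;                 // exponent is rebiased, mantissa rounded

        unsigned long long dbits;
        unsigned int fbits;
        memcpy(&dbits, &d, sizeof dbits);    // inspect the raw encodings
        memcpy(&fbits, &f, sizeof fbits);

        // double: 1 sign bit, 11 exponent bits (bias 1023), 52 mantissa bits
        // float:  1 sign bit,  8 exponent bits (bias 127),  23 mantissa bits
        printf("double bits: %016llx\n", dbits);
        printf("float  bits: %08x\n", fbits);
        return 0;
    }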

ammoQ
The straightforward implementation is a FLD/FSTP pair (i.e. load the double onto the floating point stack, then write it back as a float), so the conversion is likely implemented in hardware. It would be a bit more expensive in software - you need correct rounding and need to check for range overflow.
peterchen
After checking the assembly code, those are the exact instructions it generates. Thanks!
Polaris878