I've been doing some profiling lately and I've encountered one case that is driving me nuts. The following is a piece of unsafe C# code which basically copies a source sample buffer to a target buffer with a different sample rate. As it is now, it takes up ~17% of the total processing time per frame. What I don't get is that if I use floats instead of doubles, the processing time rises to ~38%. Could someone please explain what's going on here?

Fast version (~17%)

double rateIncr = ...
double readOffset = ...
double offsetIncr = ...

float v = ... // volume

// Source and target buffers.
float* src = ...
float* tgt = ...

for( var c = 0; c < chunkCount; ++c)
{
    for( var s = 0; s < chunkSampleSize; ++s )
    {
        // Source sample            
        var iReadOffset = (int)readOffset;

        // Interpolate factor
        var k = (float)readOffset - iReadOffset;

        // Linearly interpolate 2 contiguous samples and write result to target.
        *tgt++ += (src[ iReadOffset ] * (1f - k) + src[ iReadOffset + 1 ] * k) * v;

        // Increment source offset.
        readOffset += offsetIncr;
    }
    // Increment sample rate
    offsetIncr += rateIncr;
}

Slow version (~38%)

float rateIncr = ...
float readOffset = ...
float offsetIncr = ...

float v = ... // volume

// Source and target buffers.
float* src = ...
float* tgt = ...

for( var c = 0; c < chunkCount; ++c)
{
    for( var s = 0; s < chunkSampleSize; ++s )
    {
        var iReadOffset = (int)readOffset;

        // The cast to float is removed
        var k = readOffset - iReadOffset;

        *tgt++ += (src[ iReadOffset ] * (1f - k) + src[ iReadOffset + 1 ] * k) * v;
        readOffset += offsetIncr;
    }
    offsetIncr += rateIncr;
}

Odd version (~22%)

float rateIncr = ...
float readOffset = ...
float offsetIncr = ...

float v = ... // volume

// Source and target buffers.
float* src = ...
float* tgt = ...

for( var c = 0; c < chunkCount; ++c)
{
    for( var s = 0; s < chunkSampleSize; ++s )
    {
        var iReadOffset = (int)readOffset;
        var k = readOffset - iReadOffset;

        // By just placing this test it goes down from 38% to 22%,
        // and the condition is NEVER met.
        if( (k != 0) && Math.Abs( k ) < 1e-38 )
        {
           Console.WriteLine( "Denormalized float?" );
        }

        *tgt++ += (src[ iReadOffset ] * (1f - k) + src[ iReadOffset + 1 ] * k) * v;
        readOffset += offsetIncr;
    }
    offsetIncr += rateIncr;
}

All I know by now is that I know nothing

+3  A: 

Perhaps there's a series of double to float conversions happening somewhere that's taking up the CPU time. Can you look at the output with an IL disassembler and see what it's actually doing?

ConcernedOfTunbridgeWells
If that were the case, wouldn't using doubles be slower than using floats?
Trap
Depends which way around the conversion is happening and what the optimiser actually does to the code. There is also a (small) possibility that the JIT on the runtime system is cocking something up. This is why I suggested disassembling the IL to see what was actually going on.
ConcernedOfTunbridgeWells
+4  A: 

Are you running this on a 64-bit or 32-bit processor? My experience has been that in some edge cases there are optimisations the CPU can make with low-level code like this when the size of your data matches the size of the registers (even though you might assume that two floats would fit neatly into a 64-bit register, you may still lose the optimisation benefit). You may find the situation reversed if you run it on a 32-bit system...

A quick search and the best I can do for a cite on this is a couple of posts on C++ game-development forums (it was during my one year in game dev that I noticed this myself, but then that was the only time I was profiling at this level). This post has some interesting disassembly results from a C++ method that may be applicable at a very low level.


Another thought:

This article from MSDN goes into a lot of the internal specifics of using floats in .NET primarily to address the problematic issue of float comparison. There is one interesting paragraph from it which sums up the CLR spec for handling float values:

This spec clearly had in mind the x87 FPU. The spec is basically saying that a CLR implementation is allowed to use an internal representation (in our case, the x87 80 bit representation) as long as there is no explicit storage to a coerced location (a class or value type field), that forces narrowing. Also, at any point, the IL stream may have conv.r4 and conv.r8 instructions, which will force the narrowing to happen.

So your floats may not actually be floats while operations are being performed on them; instead they could be 80-bit numbers on an x87 FPU, or anything else the compiler thinks is an optimisation or is required for calculation accuracy. Without looking at the IL you won't know for sure, but there could be many costly conversions when you are working with floats that you don't hit when you are using doubles. It's a shame that you can't define the required precision for floating-point operations in C# as you can through the /fp switches in C++, since that would stop the compiler from putting everything into a larger container before operating on it.
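As a hypothetical sketch of what that narrowing means at the language level (the 80-bit x87 intermediates themselves are only visible in the JIT-compiled machine code): a float-typed result is forced down to 24-bit precision, while a double intermediate keeps the extra bits.

```csharp
using System;

class NarrowingSketch
{
    static void Main()
    {
        // 2^24: beyond this, consecutive integers are no longer all
        // exactly representable in a 32-bit float.
        float a = 16777216f;

        // A float-typed sum: 2^24 + 1 is not representable at 24-bit
        // precision, so the result rounds back down to 2^24.
        float narrowed = a + 1f;
        Console.WriteLine(narrowed == a);        // True

        // Widening to double first keeps the extra bit.
        double wide = (double)a + 1f;
        Console.WriteLine(wide == 16777217.0);   // True
    }
}
```

The point is only that the precision the runtime carries through an intermediate decides what survives a store, which is exactly the freedom the quoted spec gives the CLR.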

Martin Harris
I double-checked; it's a 32-bit system.
Trap
A: 

The double-to-float conversion is probably slowing it down here:

(float)readOffset

Try making readOffset a float too.

leppie
It's quite the opposite: when I use floats only I even save a couple of casts, and the performance gets much worse.
Trap
+2  A: 

It is possible that your calculations cause float values to enter the 'denormal' state, which is very inefficient on most x86 processors. Denormal values are nonzero values so small that they fall below the smallest normal float. The same values sit comfortably inside the normal double range, so in that case the calculations stay efficient.

I can't be sure whether this applies to you, but it would certainly explain the behavior you're seeing.

http://en.wikipedia.org/wiki/Denormal
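As a minimal sketch of the threshold test (IsDenormal is a hypothetical helper name): single-precision denormals are nonzero values whose magnitude is below the smallest normal float, about 1.18e-38.

```csharp
using System;

class DenormalCheck
{
    // Nonzero but smaller in magnitude than the smallest normal float
    // (~1.175e-38, i.e. 2^-126) means the value is denormal (subnormal).
    public static bool IsDenormal(float x)
    {
        return x != 0f && Math.Abs(x) < 1.175494351e-38f;
    }

    static void Main()
    {
        Console.WriteLine(IsDenormal(1e-37f));        // False: still a normal float
        Console.WriteLine(IsDenormal(1e-39f));        // True: below the normal range
        Console.WriteLine(IsDenormal(float.Epsilon)); // True: the smallest denormal
    }
}
```

On newer .NET runtimes (Core 3.0+), float.IsSubnormal performs the same test directly.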

Frederik Slijkerman
If it were denormalized floats, wouldn't the sound also be screwed?
Trap
No, when interpreted as audio samples denormalized floats are practically equal to silence. The only problem (but a big one) with denormals is that they are so slow.
Frederik Slijkerman
Why not test for denormals in your inner loop? Just check if abs(x) < 1e-38 to see if x is a denormal (for single precision).
Frederik Slijkerman
Trap
Can't explain that... But you should remove (k>0), abs() will take care of that and also detect negative denormals.
Frederik Slijkerman
... on second thought, you should replace (k>0) by (k!=0), otherwise you'd detect regular zeroes.
Frederik Slijkerman
Oh, and check both src[iReadOffset] and the result of the calculation as well. I noticed you've used 'var' as the type -- I'm not familiar with C#, but you could try replacing that with 'float' explicitly.
Frederik Slijkerman
OK, looked at it again, and the only denormal possibility I see is if either src[iReadOffset]*(1f-k) or src[iReadOffset]*k becomes denormal. So check that instead. :-)
Frederik Slijkerman
I checked all possibilities and I didn't find any denormalized floats.
Trap
+1  A: 

One way to understand what is happening is to break into the debugger at this point in the code and look at the actual x86 instructions being executed. Without knowing how your C# translates into machine code, much of what might be suggested as the cause is just guesswork. Even looking at the IL probably won't tell you very much.

If you do this, you may want to start the program first and then attach the debugger later so that the JIT optimizations aren't disabled. You want to make sure you're looking at the code you're actually going to run, after all.

Curt Hagenlocher
+1  A: 

Considering that the bulk of your code doesn't deal with the three variables you switched between doubles and floats, and that you're talking about rather large changes in performance, I'd say the small changes in types and tests are enough to change your cache footprint and/or register usage.

I did some quick tests on my 32-bit machine here:

// NOTE: runnable - copy and paste into your own project
using System;
using System.Diagnostics;

class Program
{
    static int endVal = 32768;
    static int runCount = 100;

    static void Main(string[] args)
    {
        Stopwatch doublesw = Stopwatch.StartNew();
        for (int i = 0; i < runCount; ++i)
            doubleTest();
        doublesw.Stop();
        Console.WriteLine("Double: " + doublesw.ElapsedMilliseconds);

        Stopwatch floatsw = Stopwatch.StartNew();
        for (int i = 0; i < runCount; ++i)
            floatTest();
        floatsw.Stop();
        Console.WriteLine("Float: " + floatsw.ElapsedMilliseconds);
        Console.ReadLine();
    }

    // Repeatedly add a small double increment until the target is reached.
    static void doubleTest()
    {
        double value = 0;
        double incr = 0.001D;

        while (value < endVal)
        {
            value += incr;
        }
    }

    // Same loop in single precision.
    static void floatTest()
    {
        float value = 0;
        float incr = 0.001f;

        while (value < endVal)
        {
            value += incr;
        }
    }
}

and the results were:

Double: 12897
Float: 10059

Repeated tests showed float having a clear advantage over double. Now, this is a small program, and all those variables fit within the registers.

Unfortunately, there were enough missing parts in the code you supplied that I couldn't get a clean compile and a read of the assembly to see exactly what was going on, but judging from my (quick) testing, this is my answer.

(For me, the giveaway was your case #3 - adding code changes the footprint and your cache patterns - I've seen that kind of strangeness a couple of times in various languages.)

cyberconte
A: 

Just a short question about your profiling: all you're quoting are percentages. What about the absolute time the function needs?

If you use floats within your function and doubles in the surrounding code, extra time is spent converting between them, so the whole process takes longer. The inner function's own processing time stays constant, which means its percentage of the total drops.

Hope this makes sense. In short: if your whole process needs more total time, the percentage for a given function (whose own time hasn't changed) will go down.
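A toy calculation (with made-up numbers) of that effect: the function's own time stays fixed while the total changes, so its percentage moves on its own.

```csharp
using System;

class PercentageSketch
{
    // Percentage of the total taken by one part.
    public static double Share(double partMs, double totalMs)
    {
        return partMs / totalMs * 100.0;
    }

    static void Main()
    {
        // Hypothetical: the resampler itself always costs 2 ms.
        double functionMs = 2.0;

        // In a 10 ms frame it reads as 20%...
        Console.WriteLine(Share(functionMs, 10.0));  // 20

        // ...in a 20 ms frame, the same 2 ms reads as only 10%.
        Console.WriteLine(Share(functionMs, 20.0));  // 10
    }
}
```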

Oliver