In an app I'm profiling, I found that in some scenarios this function can take over 10% of the total execution time.

The MSVC++ 2008 compiler is being used, for reference... I don't know whether modf maps to a single instruction, or whether there is any way to make it faster.

See also here for a similar question about the sqrt function.

Unlike sqrt, I don't really know how modf works. Is there an assembly instruction for it? For instance, you could do:

void my_modf(float input, int &intPart, float &floatPart)
{
    intPart = (int)input;                 // truncate toward zero
    floatPart = input - (float)intPart;   // whatever is left over
}

But I assume this incurs penalties from the casting/conversion, etc. How does a fast implementation work?

+1  A: 

modf should be a very fast function indeed, so the problem might be that it is still a function call (i.e., not being inlined). You could try taking exactly the same code from the library and putting it in an inline static function in a header, to allow the compiler to inline it.
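
For illustration, here is a minimal sketch of what such an inlined header helper might look like (the name and the truncation-based body are assumptions, not the actual CRT source):

// Hypothetical inlined helper -- an assumption, not the real library code.
// Only valid when |x| fits in a long long.
inline double modf_inlined(double x, double &intPart)
{
    intPart = (double)(long long)x;   // truncate toward zero
    return x - intPart;               // fractional part (sign-of-zero cases differ from the real modf)
}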

When the function is inlined, if you only ever use one of the integer/fractional parts, the compiler ought to be clever enough to emit code that calculates only that part, speeding things up further.

If you're still interested in rolling your own, have a look at the Wikipedia article on the floating-point format.
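
If you do go down that road, here is a rough sketch of the kind of bit-level truncation the format allows (the function name is made up, and NaN/infinity are only handled incidentally; it illustrates the bit layout rather than replacing the library call):

#include <string.h>   // memcpy

// Hypothetical sketch: truncate a float toward zero by clearing the
// fractional mantissa bits of its IEEE 754 representation.
// Assumes a 32-bit unsigned int; inf/NaN fall into the e >= 23 case.
inline float trunc_by_bits(float x)
{
    unsigned int bits;
    memcpy(&bits, &x, sizeof bits);                // reinterpret the bit pattern
    int e = (int)((bits >> 23) & 0xFFu) - 127;     // unbiased exponent
    if (e < 0)   return x < 0.0f ? -0.0f : 0.0f;   // |x| < 1: integer part is zero
    if (e >= 23) return x;                         // already integral (or inf/NaN)
    bits &= ~((1u << (23 - e)) - 1u);              // clear bits below the binary point
    memcpy(&x, &bits, sizeof bits);
    return x;
}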

Autopulated
A: 

Note that the library has to provide the fastest possible solution that works in all of the corner cases, which, for something like this, adds quite a bit of complexity.

If your version with the casts works fine for your program (meaning you never have floats outside the range of an int, and you've either confirmed that it handles negative numbers correctly or you don't care about them), then it will probably be a fair bit faster.
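
To make the corner case concrete, here is a small hypothetical snippet (values chosen arbitrarily) showing where the cast-based version breaks down while the library call does not:

#include <math.h>
#include <stdio.h>

int main()
{
    double big = 1e19;                 // far larger than INT_MAX
    double ipart;
    double frac = modf(big, &ipart);   // library modf copes: ipart == 1e19, frac == 0.0
    // int bad = (int)big;             // cast version: undefined behavior, value not representable
    printf("ipart=%f frac=%f\n", ipart, frac);
    return 0;
}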

Brooks Moses
+2  A: 

You're getting good answers here, but where does the other 90% of time go?

Don't look at exclusive % time per routine.

Look at inclusive % time per line of code, and if possible, have it include blocked time, not just CPU time.

That way, you may well find out that for part of the time you don't even need to be in the modf function, or in other functions, at all.

An easy way to get that information is this technique.

Added: As you find optimizations that you can do, expect total execution time to decrease, but don't expect the percentages to necessarily go down. Your % time in modf and/or sqrt may actually go up if you get rid of other stuff, or it may go down if you find, for example, that you can memoize them (and therefore call them less).
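
As a side note on the memoization idea, here is a minimal sketch of caching an expensive call (the use of sqrt and a std::map cache are just illustrative assumptions; whether this actually wins depends on call patterns and cache behavior):

#include <math.h>
#include <map>

// Hypothetical memoized wrapper: reuse previously computed results so the
// expensive call happens less often. sqrt is only a stand-in example.
inline double memo_sqrt(double x)
{
    static std::map<double, double> cache;
    std::map<double, double>::iterator it = cache.find(x);
    if (it != cache.end())
        return it->second;   // seen this input before: reuse the result
    double r = sqrt(x);
    cache[x] = r;            // remember it for next time
    return r;
}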

In this approach to optimization, you can think of the program's execution history as a big call tree, and what you are looking for is whole branches that you can prune off. What's more, since a line of code can appear in multiple branches of the call tree, pruning it in one prunes it in all.

Mike Dunlavey
This point is good. Arguably the easiest solution to spending too much time in one function is to *call it less*. (Of course that might not be possible - but often it is.)
Tom Smith
@Tom Smith: Yes. That's the point exactly. What's more, as you do this, the most profitable point to optimize moves around, so you can keep doing it until you hit diminishing returns. I've seen *big* speedup factors gotten this way.
Mike Dunlavey
Still, keep in mind that caching the results of "easy" operations can be a bad approach due to increased cache misses. Have a look at this: http://blogs.msdn.com/oldnewthing/archive/2004/12/20/327369.aspx.
Matteo Italia
@Matteo Italia: Right. That's an added wrinkle. I've done it for functions like exp() that can take a fair amount of time.
Mike Dunlavey
Just an update: I was looking at the log in more detail. According to Sleepy, modf's exclusive % is 35%. Apparently, it's called from sqrt and, bizarrely, from std::string... does that seem likely?!
John
@John: Bizarre things happen. That's why I use (sorry) manual stack samples. Every such thing has a reason, and if you look at samples (and if you have the source or assembly code) you can see what the reason is just by looking. It doesn't take a lot of samples. For example, 10 samples should show it on about 3 of them.
Mike Dunlavey
Well, isn't that what the _profiler_ does to tell me where things are called from: looking at the stack trace?
John
@John: Don't get me started on that :-) Take a look at http://stackoverflow.com/questions/1777556/alternatives-to-gprof/1779343#1779343
Mike Dunlavey
A: 

I have also seen cases where small routines, when called within a DLL (as is the case with the CRT), incur quite a hit from the glue code for getting in and out of the DLL. In such a case, rolling your own and altering how it is compiled (e.g., inlining it) could give you a performance boost even if the implementation itself is identical. YMMV, POITROAE, etc.

fbrereto
A: 

A good implementation of modf can be quite fast (on the order of 10 cycles on current hardware). A bad implementation can be quite slow (on the order of 100 cycles). A really poorly conceived implementation could conceivably take 1000 cycles. I don't know what the state of Microsoft's implementation is, but there are a number of good implementations in various open source C libraries that you might look at.

Your proposed implementation takes some shortcuts and wouldn't comply with the C standard; in particular, it will misbehave rather severely in the case where input is too large to be successfully converted to integer. It also gets the sign of zero wrong in some cases, but you might not care about that.

Note also that you would be well served to use a compiler / C library that supports the C99 standard, as you could then take advantage of the modff function and avoid the overhead of converting to and from double precision. I know that Intel's math library (which comes with their compilers) has excellent modf and modff implementations. GCC also supports the C99 single-precision variants.
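
For example, with a C99-capable toolchain the single-precision call looks roughly like this (the wrapper name is arbitrary):

#include <math.h>

// Sketch: the C99 single-precision variant avoids the float->double->float round trip.
void split_float(float input, float &intPart, float &fracPart)
{
    fracPart = modff(input, &intPart);   // modff takes and returns float
}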

FWIW, I benchmarked your proposed implementation, and (assuming excellent compiler codegen) it's about 50% faster than the Intel library modff (Intel's implementation, however, delivers the correct result for all inputs). The fastest correct implementation that I've benchmarked is only about 15% slower than your implementation (but again, it gives the correct result for all inputs, and even sets the floating-point status flags correctly to boot).

Stephen Canon
even the poorest implementation would be < 100 cycles...
Autopulated
Thanks for the in-depth coverage. I never figured something as simple as modf had so many edge cases, and in my case I don't think they are a problem.
John
@Autopulated: You'd like to think so, wouldn't you? You'd be wrong, however. I won't name names, but I have seen a platform library: 1) set the rounding mode to round-to-zero 2) round to integer 3) convert back to float 4) subtract 5) restore the original rounding mode. Easily more than 100 cycles on a platform where setting the rounding mode incurs a substantial stall. I guess they let the intern write that one or something.
Stephen Canon
A: 

Your implementation is probably the fastest possible on x86. Keep in mind, though, that you've limited the supported range of the input to the range of int!

You would probably want to set your compiler to use SSE(2) for floating-point math, as this gets rid of (possibly slow) control-word changes for truncation.
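
As an illustration of that route, here is a sketch using the SSE truncating convert (assuming SSE is available; the wrapper name is made up):

#include <xmmintrin.h>   // SSE intrinsics

// Sketch: cvttss2si truncates toward zero regardless of the current rounding
// mode, so no control-word change is needed. Input must still fit in an int.
inline void modf_sse(float input, int &intPart, float &fracPart)
{
    intPart  = _mm_cvtt_ss2si(_mm_set_ss(input));   // truncating float->int convert
    fracPart = input - (float)intPart;
}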

slacker