Hi,

after a lot of profiling I found that this method accounts for most of the calculation time. I don't really see a way to optimize it, since it is a horrible function. (it is...) Maybe someone can show me a nice idea?

public static double perceivedLoudness(double L_G, double L_ETQ, double a0) {
  double t1 = 1d + 1 / 4d * Math.pow(10d, 0.1d * (L_G - a0 - L_ETQ));
  double t2 = Math.pow(t1, 0.25);
  return 0.064d * Math.pow(10, 0.025 * L_ETQ) * (t2 - 1);
}

Here is the improved version:

public static double perceivedLoudness(double L_G, double L_ETQ, double a0) {
  double x = L_G - a0 - L_ETQ;
  double t1 = 0.25 * Math.exp(0.230259 * x) + 1;
  double t2 = Math.sqrt(Math.sqrt(t1));
  return ltqFactors[(int)L_ETQ] * (t2 - 1);
}

The lookup table for ltqFactors is built as follows. etqValues holds 20 points sampled from the given LTQ function; that approximation should be sufficient.

for (int i = 0; i < etqValues.length; ++i) {
  // 0.05756462732485114210 = 0.025 * ln(10), so exp replaces Math.pow(10, 0.025 * x)
  ltqFactors[(int)etqValues[i]] = 0.064d * Math.exp(etqValues[i] * 0.05756462732485114210d);
}

Edit: After more test runs with more files, I see a ~100% speed-up:

  • Old: 6.2% of runtime with 7000000 calls
  • New: 3.2% of runtime with 8000000 calls

Thank you so far!

Edit2: I don't know which answer to accept. :( With some other improvements (mostly lookup tables) the processing time for 9000 sound files went down from 4:30min to 3:28min.

I will keep this question open to see if there are other ideas, but then accept one answer.

Edit3: I'm kind of frustrated now. I use a JFace TreeViewer to let the user browse the results, and it needs more time to update than the calculations themselves. :/

A: 
  1. Try to cache some values (I guess L_G and L_ETQ are not that variable, right?).
  2. Try to implement it in architecture-dependent native code and call it via JNI (a minimal sketch follows).
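A minimal Java-side sketch of such a JNI binding; the library name "loudness" and the native implementation are assumptions, not part of this answer:

public final class NativeLoudness {
  static {
    // Loads libloudness.so / loudness.dll from java.library.path (the name is hypothetical).
    System.loadLibrary("loudness");
  }
  // Implemented in C against the JNI header generated for this class.
  public static native double perceivedLoudness(double L_G, double L_ETQ, double a0);
}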
onof
+3  A: 

The math does not immediately look like it can be reordered to avoid duplicate calculations, so the approach to take depends on how this function is used and how accurate the results need to be.

The best approach would be to avoid recalculating the value for the same set of input values. Can your code save calculation results for the same inputs? If not, you could keep a cache of values, but be careful: doubles can take very many distinct values, so you might want to fold them into a known interval (e.g. fold the interval from 0 to 1 into the integers from 0 to 99).
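A minimal sketch of that folding idea, assuming the inputs stay in a known range; the range, the bucket count, and the expensiveFunction placeholder are assumptions:

static final int BUCKETS = 100;
static final double MIN = 0.0, MAX = 1.0;
static final double[] cache = new double[BUCKETS];
static final boolean[] filled = new boolean[BUCKETS];

static double cachedValue(double x) {
  // Fold the double into one of BUCKETS integer slots; clamp out-of-range inputs.
  int i = (int) ((x - MIN) / (MAX - MIN) * (BUCKETS - 1));
  i = Math.max(0, Math.min(BUCKETS - 1, i));
  if (!filled[i]) {
    cache[i] = expensiveFunction(MIN + i * (MAX - MIN) / (BUCKETS - 1)); // hypothetical
    filled[i] = true;
  }
  return cache[i];
}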

Thorbjørn Ravn Andersen
+22  A: 

Your function seems to be analytic; I would suggest replacing it entirely with an interpolation method. This way, you reduce the expensive calls to Math.pow to a few arithmetic operations.

The best choice in this case should be rational function approximation. Your function is likely to have poles in the complex plane, which usually defeats polynomial interpolation.

Note that you have two variables: L_G - a0 - L_ETQ and L_ETQ. Interpolation should be performed in one variable only.

I'd go for rational function approximation of t2 as a function of L_G - a0 - L_ETQ. Take a look at Numerical Recipes for implementation techniques.
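Rational approximation takes some care to set up; as a simpler stand-in, here is a minimal sketch of tabulating t2 over an assumed range of x = L_G - a0 - L_ETQ with linear interpolation between samples (the range and table size are assumptions):

static final int N = 1024;
static final double X_MIN = -40.0, X_MAX = 40.0;
static final double STEP = (X_MAX - X_MIN) / (N - 1);
static final double[] T2 = new double[N];
static {
  // Precompute t2 = (1 + 0.25 * 10^(0.1 * x))^(1/4) at N sample points.
  for (int i = 0; i < N; ++i) {
    double x = X_MIN + i * STEP;
    T2[i] = Math.pow(1.0 + 0.25 * Math.pow(10.0, 0.1 * x), 0.25);
  }
}

static double t2(double x) {
  // Linear interpolation between the two nearest samples; clamped at the table edges.
  double pos = (x - X_MIN) / STEP;
  int i = Math.max(0, Math.min(N - 2, (int) pos));
  double frac = Math.max(0.0, Math.min(1.0, pos - i));
  return T2[i] + frac * (T2[i + 1] - T2[i]);
}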

Also, for the last part, replace

Math.pow(10, 0.025 * L_ETQ); 

by

Math.exp(L_ETQ * 0.05756462732485114210d)

(which is exp(L_ETQ * 0.025 * log(10))).

So you should be fine with a handful of arithmetical operations and one exponential.

EDIT: See a graph of t2 as a function of L_G - a0 - L_ETQ.

EDIT: Replace

double t1 = 1d + 1 / 4d * Math.pow(10d, 0.1d * (L_G - a0 - L_ETQ)); 
double t2 = Math.pow(t1, 0.25);

by

double x = L_G - a0 - L_ETQ;
double t1 = 0.25 * Math.exp(0.230259 * x) + 1;
double t2 = Math.sqrt(Math.sqrt(t1));

and you should gain some more %. At this point, rational approximation may be overengineering: you have two exp, and two sqrt.

Alexandre C.
@Alexandre This seems to be interesting. Thanks.
InsertNickHere
One caution - many floating point ops are so fast on modern processors that using an interpolation method instead can occasionally turn out slower.
Steve314
@steve: pow is computed with exp and log, and must account for various special cases. Visual inspection shows a perfect candidate for Padé approximation near a "typical" point, and even ~50 arithmetic operations (which allows for a very accurate approximation) will be faster than a pow.
Alexandre C.
@steve: and even if floating point builtins are indeed available, most language libraries don't use them.
Alexandre C.
@Alexandre Added your edit, but not tested yet. It looks obvious that it will be slightly faster, I think. ;-)
InsertNickHere
@Alexandre - for the cost of pow I'll take your word, and if you're implying that embedded platforms may not have hardware float operations I take the point. But on "most language libraries don't use them" even when available - that seems a bit unlikely to me. Can you point to an example?
Steve314
I'd like to see benchmarks of the revised function.
Thorbjørn Ravn Andersen
@Thor Which kind of benchmark? I have written a benchmark with System.nanoTime and 100x72900 iterations for each function. For 72900 calls the old version takes 735830 ns on average, the new one 481269 ns.
InsertNickHere
+3  A: 

I would guess

double t2 = Math.sqrt(Math.sqrt(t1));

is faster than

double t2 = Math.pow(t1, 0.25);
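A minimal sketch of how one might check that with System.nanoTime; the sample array is assumed to hold t1 values (>= 1), and each method should be run several times first so the JIT has compiled both paths. The sink value is returned so the loops cannot be optimized away:

static double benchSqrtSqrt(double[] xs) {
  double sink = 0;
  long t0 = System.nanoTime();
  for (double x : xs) sink += Math.sqrt(Math.sqrt(x));
  System.out.println("sqrt(sqrt): " + (System.nanoTime() - t0) + " ns");
  return sink;
}

static double benchPow(double[] xs) {
  double sink = 0;
  long t0 = System.nanoTime();
  for (double x : xs) sink += Math.pow(x, 0.25);
  System.out.println("pow(x, 0.25): " + (System.nanoTime() - t0) + " ns");
  return sink;
}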
Landei
@Landei I have added that.
InsertNickHere
@Insert: and did you verify that it's actually faster?
Joachim Sauer
@Joachim I'm not that good at microbenchmarks, but if my results are correct, it's much faster (~Fast: 33133194 ns, Slow: 617337165 ns). I have done 2147483 calculations with 100 "iterations" each for testing.
InsertNickHere
+2  A: 

Glancing at the paper you reference, it seems that L_ETQ and a0 are simply functions of the frequency (Bark) of the sound.

So, at the very least you could come up with a table of the results of various calculations for given frequencies. For example, cache the results of:

.064 * Math.pow(10, 0.025 * L_ETQ)

by frequency. [Can also cache (a0 + L_ETQ) * .1]

Also, probably minor effect, if any (javac folds the constant anyway), but I would write the 1/4 as 0.25.
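A minimal sketch of that per-frequency caching, assuming a fixed set of Bark bands; the band count and the etqByBand/a0ByBand arrays are assumptions:

static final int BANDS = 24;
static final double[] factorByBand = new double[BANDS]; // 0.064 * 10^(0.025 * L_ETQ)
static final double[] offsetByBand = new double[BANDS]; // (a0 + L_ETQ) * 0.1

static void initBandTables(double[] etqByBand, double[] a0ByBand) {
  for (int b = 0; b < BANDS; ++b) {
    factorByBand[b] = 0.064 * Math.exp(etqByBand[b] * 0.05756462732485114210);
    offsetByBand[b] = (a0ByBand[b] + etqByBand[b]) * 0.1;
  }
}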

aepryus
@aepryus I have added that.
InsertNickHere
+1  A: 

Pre-generate a lookup table for the range of inputs your program can handle.

It doesn't get any faster than that! :)
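A minimal sketch, assuming x = L_G - a0 - L_ETQ can be quantized to 0.1 dB steps over an assumed [-40, 40] range; combined with the ltqFactors table from the question, this removes all transcendental calls from the hot path:

static final double[] T2_TABLE = new double[801];
static {
  // Tabulate (1 + 0.25 * 10^(0.1 * x))^(1/4) - 1 for x in [-40, 40] at 0.1 steps.
  for (int i = 0; i < T2_TABLE.length; ++i) {
    double x = -40.0 + i * 0.1;
    T2_TABLE[i] = Math.sqrt(Math.sqrt(0.25 * Math.exp(0.230259 * x) + 1.0)) - 1.0;
  }
}

static double perceivedLoudnessLookup(double L_G, double L_ETQ, double a0) {
  int i = (int) Math.round((L_G - a0 - L_ETQ + 40.0) * 10.0);
  i = Math.max(0, Math.min(T2_TABLE.length - 1, i)); // clamp to the table range
  return ltqFactors[(int) L_ETQ] * T2_TABLE[i];
}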

Fredrick Pennachi
+1  A: 

Caching the outputs against the input params may help:

http://en.wikipedia.org/wiki/Memoization
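A minimal memoization sketch keyed on quantized inputs; the 0.001 quantization step for x is an assumption, and the composite key assumes L_ETQ rounds into a small non-negative range:

static final java.util.Map<Long, Double> memo = new java.util.HashMap<Long, Double>();

static double memoizedLoudness(double L_G, double L_ETQ, double a0) {
  double x = L_G - a0 - L_ETQ;                  // the function depends only on x and L_ETQ
  long key = Math.round(x * 1000.0) * 1000L + Math.round(L_ETQ); // quantized composite key
  Double cached = memo.get(key);
  if (cached != null) return cached;
  double v = perceivedLoudness(L_G, L_ETQ, a0);
  memo.put(key, v);
  return v;
}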

Dave
I would be more concerned about the total amount of cache misses. If he has a very wide range of values, the probability of cache hits would be incredibly low. However, caching would avoid recalculating the same value sets. Also, I would be concerned with the cost of the lookup in a large cache set [even with hashing].
monksy
@Dave L_G - a0 - L_ETQ can have a huge variability since the sound pressure level is measured in double precision, and I don't want to lose that. Roughly speaking, I can have >3600000 different values. (Assuming 1000 different pressures, 20 a0 values, and 20 L_ETQ values.)
InsertNickHere
+1  A: 

This hasn't been mentioned yet, so I will.

You may want to consider moving from floating-point math to integer math. The operations are quite a bit faster. Graphics code tends to use integer math rather than floating point because of how floats are added and stored. You'll have to convert to and from, but I'm sure you would get quite a performance boost. The only issue with integer math is that you have to decide how much precision you are willing to live with.
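A minimal fixed-point sketch (Q16.16: values scaled by 2^16, stored in longs). Note this covers only the representation and basic arithmetic; the loudness formula still needs exp, which is the hard part to do in integer math:

static final int FRAC_BITS = 16;
static final long ONE = 1L << FRAC_BITS;

static long toFixed(double v)  { return Math.round(v * ONE); }
static double toDouble(long f) { return (double) f / ONE; }

static long fixedAdd(long a, long b) { return a + b; }
static long fixedMul(long a, long b) { return (a * b) >> FRAC_BITS; }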

monksy
@steven That might be correct, but I want/need the precision.
InsertNickHere
Can you settle for a fixed upper limit on the number of decimal places?
monksy
A: 

I would take some stackshots of it, to eliminate guesswork. That way I could be sure the time isn't being taken elsewhere, like in reading the data and converting it to floating point, or in opening/closing files. Once I was sure this routine took a major % of the time, I'd expect it to spend nearly all of that in calls to the Math functions and in converting L_ETQ to an integer index. It may be possible to memoize those math functions. Then again, as Alexandre said, you might be able to do it all with interpolation.

The thing is, you probably have more than one thing to optimize. Suppose the run takes 180 seconds and this routine is on the stack for 50% of that time. If you cut its time in half, you have reduced the total by 45 seconds, to 135 seconds. However, now this routine is only on the stack for 45/135 of the time, or 1/3. That means other things are using the remaining 2/3, or 90 seconds, and I bet there are things in there you could optimize as well. If you can cut those down to, say, 45 seconds, the total is down to 90, and the % taken by the math routine is back up to 50%, so maybe you can squeeze some more out of it. That's how it goes: each time you cut down one part, the other parts grow in percentage, so you can go after them, over and over, until you've really squeezed it as much as possible.

Mike Dunlavey