Hi,

after a lot of profiling I found that this method accounts for most of the calculation time. I don't really see a way to optimize it, since it is a horrible function. (it is...) Maybe someone can show me a nice idea?

public static double perceivedLoudness(double L_G, double L_ETQ, double a0) {
  double t1 = 1d + 1 / 4d * Math.pow(10d, 0.1d * (L_G - a0 - L_ETQ));
  double t2 = Math.pow(t1, 0.25);
  return 0.064d * Math.pow(10, 0.025 * L_ETQ) * (t2 - 1);
}

Here is the improved version:

public static double perceivedLoudness(double L_G, double L_ETQ, double a0) {
  double x = L_G - a0 - L_ETQ;
  double t1 = 0.25 * Math.exp(0.230259 * x) + 1;
  double t2 = Math.sqrt(Math.sqrt(t1));
  return ltqFactors[(int)L_ETQ] * (t2 - 1);
}

The lookup table for ltqFactors is built as follows. etqValues holds 20 points sampled from the given LTQ function; that approximation should be sufficient.

for (int i = 0; i < etqValues.length; ++i) {
  // 0.05756462732485114210 = 0.025 * ln(10), so exp replaces Math.pow(10, 0.025 * x)
  ltqFactors[(int)etqValues[i]] = 0.064d * Math.exp(etqValues[i] * 0.05756462732485114210d);
}

Edit: After more test runs with more files, I see a ~100% speed-up:

  • Old: 6.2% of runtime with 7000000 calls
  • New: 3.2% of runtime with 8000000 calls

Thank you so far!

Edit2: I don't know which answer to accept. :( With some other improvements (mostly lookup tables) the processing time for 9000 sound files went down from 4:30min to 3:28min.

I will keep this question open to see if there are other ideas, but then accept one answer.

Edit3: I'm kind of frustrated now. I use a JFace TreeViewer to let the user browse the results, and it needs more time to update than the calculations themselves. :/

A: 
  1. Try to cache some values (I guess L_G and L_ETQ are not that variable, right?).
  2. Try to implement it in architecture-dependent native code and call it via JNI (a minimal sketch follows).
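A minimal Java-side sketch of such a JNI binding; the library name "loudness" and the native implementation are assumptions, not part of this answer:

public final class NativeLoudness {
  static {
    // Loads libloudness.so / loudness.dll from java.library.path (the name is hypothetical).
    System.loadLibrary("loudness");
  }
  // Implemented in C against the JNI header generated for this class.
  public static native double perceivedLoudness(double L_G, double L_ETQ, double a0);
}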
onof
+3  A: 

The math does not immediately look like it can be reordered to avoid duplicate calculations, so the approach to take depends on how this function is used and how accurate the results need to be.

The best approach would be to avoid recalculating the value for the same set of input values. Can your code save calculation results for the same inputs? If not, you could keep a cache of values, but be careful: doubles can take very many distinct values, so you might want to fold them into a known interval (e.g. fold the interval from 0 to 1 into the integers from 0 to 99).
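A minimal sketch of that folding idea, assuming the inputs stay in a known range; the range, the bucket count, and the expensiveFunction placeholder are assumptions:

static final int BUCKETS = 100;
static final double MIN = 0.0, MAX = 1.0;
static final double[] cache = new double[BUCKETS];
static final boolean[] filled = new boolean[BUCKETS];

static double cachedValue(double x) {
  // Fold the double into one of BUCKETS integer slots; clamp out-of-range inputs.
  int i = (int) ((x - MIN) / (MAX - MIN) * (BUCKETS - 1));
  i = Math.max(0, Math.min(BUCKETS - 1, i));
  if (!filled[i]) {
    cache[i] = expensiveFunction(MIN + i * (MAX - MIN) / (BUCKETS - 1)); // hypothetical
    filled[i] = true;
  }
  return cache[i];
}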

Thorbjørn Ravn Andersen
+22  A: 

Your function seems to be analytic; I would suggest replacing it entirely with an interpolation method. This way, you reduce the expensive calls to Math.pow to a few arithmetic operations.

The best choice in this case should be rational function approximation. Your function is likely to have poles in the complex plane, which usually defeats polynomial interpolation.

Note that you have two variables: L_G - a0 - L_ETQ and L_ETQ. Interpolation should be performed in one variable only.

I'd go for rational function approximation of t2 as a function of L_G - a0 - L_ETQ. Take a look at Numerical Recipes for implementation techniques.
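Rational approximation takes some care to set up; as a simpler stand-in, here is a minimal sketch of tabulating t2 over an assumed range of x = L_G - a0 - L_ETQ with linear interpolation between samples (the range and table size are assumptions):

static final int N = 1024;
static final double X_MIN = -40.0, X_MAX = 40.0;
static final double STEP = (X_MAX - X_MIN) / (N - 1);
static final double[] T2 = new double[N];
static {
  // Precompute t2 = (1 + 0.25 * 10^(0.1 * x))^(1/4) at N sample points.
  for (int i = 0; i < N; ++i) {
    double x = X_MIN + i * STEP;
    T2[i] = Math.pow(1.0 + 0.25 * Math.pow(10.0, 0.1 * x), 0.25);
  }
}

static double t2(double x) {
  // Linear interpolation between the two nearest samples; clamped at the table edges.
  double pos = (x - X_MIN) / STEP;
  int i = Math.max(0, Math.min(N - 2, (int) pos));
  double frac = Math.max(0.0, Math.min(1.0, pos - i));
  return T2[i] + frac * (T2[i + 1] - T2[i]);
}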

Also, for the last part, replace

Math.pow(10, 0.025 * L_ETQ); 

by

Math.exp(L_ETQ * 0.05756462732485114210d)

(which is exp(L_ETQ * 0.025 * log(10))).

So you should be fine with a handful of arithmetical operations and one exponential.

EDIT: See a graph of t2 as a function of L_G - a0 - L_ETQ.

EDIT: Replace

double t1 = 1d + 1 / 4d * Math.pow(10d, 0.1d * (L_G - a0 - L_ETQ)); 
double t2 = Math.pow(t1, 0.25);

by

double x = L_G - a0 - L_ETQ;
double t1 = 0.25 * Math.exp(0.230259 * x) + 1;
double t2 = Math.sqrt(Math.sqrt(t1));

and you should gain some more %. At this point, rational approximation may be overengineering: you have two exp, and two sqrt.

Alexandre C.
@Alexandre This seems to be interesting. Thanks.
InsertNickHere
One caution - many floating point ops are so fast on modern processors that using an interpolation method instead can occasionally turn out slower.
Steve314
@steve: pow is computed with exp and log, and must account for various special cases. Visual inspection shows a perfect candidate for Padé approximation near a "typical" point, and even ~50 arithmetic operations (which allows for a very accurate approximation) will be faster than a pow.
Alexandre C.
@steve: and even if floating point builtins are indeed available, most language libraries don't use them.
Alexandre C.
@Alexandre Added your edit, but not tested yet. It looks obvious that it will be slightly faster, I think. ;-)
InsertNickHere
@Alexandre - for the cost of pow I'll take your word, and if you're implying that embedded platforms may not have hardware float operations I take the point. But on "most language libraries don't use them" even when available - that seems a bit unlikely to me. Can you point to an example?
Steve314
I'd like to see benchmarks of the revised function.
Thorbjørn Ravn Andersen
@Thor Which kind of benchmark? I have written a benchmark with System.nanoTime and 100x72900 iterations for each function. For 72900 calls the old version takes 735830 ns on average, the new one 481269 ns.
InsertNickHere
+3  A: 

I would guess

double t2 = Math.sqrt(Math.sqrt(t1));

is faster than

double t2 = Math.pow(t1, 0.25);
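A minimal sketch of how one might check that with System.nanoTime; the sample array is assumed to hold t1 values (>= 1), and each method should be run several times first so the JIT has compiled both paths. The sink value is returned so the loops cannot be optimized away:

static double benchSqrtSqrt(double[] xs) {
  double sink = 0;
  long t0 = System.nanoTime();
  for (double x : xs) sink += Math.sqrt(Math.sqrt(x));
  System.out.println("sqrt(sqrt): " + (System.nanoTime() - t0) + " ns");
  return sink;
}

static double benchPow(double[] xs) {
  double sink = 0;
  long t0 = System.nanoTime();
  for (double x : xs) sink += Math.pow(x, 0.25);
  System.out.println("pow(x, 0.25): " + (System.nanoTime() - t0) + " ns");
  return sink;
}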
Landei
@Landei I have added that.
InsertNickHere
@Insert: and did you verify that it's actually faster?
Joachim Sauer
@Joachim I'm not that good at microbenchmarks, but if my results are correct, it's much faster (~Fast: 33133194 ns, Slow: 617337165 ns). I have done 2147483 calculations with 100 "iterations" each for testing.
InsertNickHere
+2  A: 

Glancing at the paper you reference, it seems that L_ETQ and a0 are simply functions of the frequency (Bark) of the sound.

So, at the very least you could come up with a table of the results of various calculations for given frequencies. For example, cache the results of:

.064 * Math.pow(10, 0.025 * L_ETQ)

by frequency. [Can also cache (a0 + L_ETQ) * .1]

Also, probably minor effect, if any (javac folds the constant anyway), but I would write the 1/4 as 0.25.
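A minimal sketch of that per-frequency caching, assuming a fixed set of Bark bands; the band count and the etqByBand/a0ByBand arrays are assumptions:

static final int BANDS = 24;
static final double[] factorByBand = new double[BANDS]; // 0.064 * 10^(0.025 * L_ETQ)
static final double[] offsetByBand = new double[BANDS]; // (a0 + L_ETQ) * 0.1

static void initBandTables(double[] etqByBand, double[] a0ByBand) {
  for (int b = 0; b < BANDS; ++b) {
    factorByBand[b] = 0.064 * Math.exp(etqByBand[b] * 0.05756462732485114210);
    offsetByBand[b] = (a0ByBand[b] + etqByBand[b]) * 0.1;
  }
}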

aepryus
@aepryus I have added that.
InsertNickHere
+1  A: 

Pre-generate a lookup table for the range of inputs your program can handle.

It doesn't get any faster than that! :)
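A minimal sketch, assuming x = L_G - a0 - L_ETQ can be quantized to 0.1 dB steps over an assumed [-40, 40] range; combined with the ltqFactors table from the question, this removes all transcendental calls from the hot path:

static final double[] T2_TABLE = new double[801];
static {
  // Tabulate (1 + 0.25 * 10^(0.1 * x))^(1/4) - 1 for x in [-40, 40] at 0.1 steps.
  for (int i = 0; i < T2_TABLE.length; ++i) {
    double x = -40.0 + i * 0.1;
    T2_TABLE[i] = Math.sqrt(Math.sqrt(0.25 * Math.exp(0.230259 * x) + 1.0)) - 1.0;
  }
}

static double perceivedLoudnessLookup(double L_G, double L_ETQ, double a0) {
  int i = (int) Math.round((L_G - a0 - L_ETQ + 40.0) * 10.0);
  i = Math.max(0, Math.min(T2_TABLE.length - 1, i)); // clamp to the table range
  return ltqFactors[(int) L_ETQ] * T2_TABLE[i];
}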

Fredrick Pennachi
+1  A: 

Caching the outputs against the input params may help:

http://en.wikipedia.org/wiki/Memoization
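A minimal memoization sketch keyed on quantized inputs; the 0.001 quantization step for x is an assumption, and the composite key assumes L_ETQ rounds into a small non-negative range:

static final java.util.Map<Long, Double> memo = new java.util.HashMap<Long, Double>();

static double memoizedLoudness(double L_G, double L_ETQ, double a0) {
  double x = L_G - a0 - L_ETQ;                  // the function depends only on x and L_ETQ
  long key = Math.round(x * 1000.0) * 1000L + Math.round(L_ETQ); // quantized composite key
  Double cached = memo.get(key);
  if (cached != null) return cached;
  double v = perceivedLoudness(L_G, L_ETQ, a0);
  memo.put(key, v);
  return v;
}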

Dave
I would be more concerned about the total amount of cache misses. If he has a very wide range of values, the probability of cache hits would be incredibly low. However, caching would avoid recalculating the same value sets. Also, I would be concerned with the cost of the lookup in a large cache set [even with hashing].
monksy
@Dave L_G - a0 - L_ETQ can have a huge variability since the sound pressure level is measured in double precision, and I don't want to lose that. Roughly speaking, I can have >3600000 different values. (Assuming 1000 different pressures, 20 a0 values, and 20 L_ETQ values.)
InsertNickHere
+1  A: 

This hasn't been mentioned yet, so I will.

You may want to consider moving from floating-point math to integer math. The operations are quite a bit faster. Graphics code tends to use integer math rather than floating point because of how floats are added and stored. You'll have to convert to and from, but I'm sure you would get quite a performance boost. The only issue with integer math is that you have to decide how much precision you are willing to live with.
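A minimal fixed-point sketch (Q16.16: values scaled by 2^16, stored in longs). Note this covers only the representation and basic arithmetic; the loudness formula still needs exp, which is the hard part to do in integer math:

static final int FRAC_BITS = 16;
static final long ONE = 1L << FRAC_BITS;

static long toFixed(double v)  { return Math.round(v * ONE); }
static double toDouble(long f) { return (double) f / ONE; }

static long fixedAdd(long a, long b) { return a + b; }
static long fixedMul(long a, long b) { return (a * b) >> FRAC_BITS; }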

monksy
@steven That might be correct, but I want/need the precision.
InsertNickHere
Can you settle for a fixed upper limit on the number of decimal places?
monksy
A: 

I would take some stackshots of it, to eliminate guesswork. That way I could be sure the time isn't being taken elsewhere, like in reading the data and converting it to floating point, or in opening/closing files. Once I was sure this routine took a major % of the time, I'd expect it to spend nearly all of that in calls to the Math functions and in converting L_ETQ to an integer index. It may be possible to memoize those math functions. Then again, as Alexandre said, you might be able to do it all with interpolation.

The thing is, you probably have more than one thing to optimize. Suppose the run takes 180 seconds and this routine is on the stack for 50% of that time. If you cut its time in half, you have reduced the total by 45 seconds, to 135 seconds. However, now this routine is only on the stack for 45/135 of the time, or 1/3. That means other things are using the remaining 2/3, or 90 seconds, and I bet there are things in there you could optimize as well. If you can cut those down to, say, 45 seconds, the total is down to 90, and the % taken by the math routine is back up to 50%, so maybe you can squeeze some more out of it. That's how it goes: each time you cut down one part, the other parts grow in percentage, so you can go after them, over and over, until you've really squeezed it as much as possible.

Mike Dunlavey