tags:

views:

149

answers:

3

Hi, I am working on a project that does a lot of querying and the some mathematical modeling based on results from these queries, and finally some scoring (read: "execution time too long to test thoroughly").

Recently I have realized a rather new problem/bug in my code; some of the results get NaN values for score! Here's how the scores are calculated:

Note that pfound, psig are doubles that are always positive or 0

Double  score1 = (pfound!=0) ? (Math.log(factorial((int)psig + 1))/pfound) : 0;

score1 = score1 * alpha_coeff[0];
if (score1.isInfinite())
    throw new RuntimeException(p.getName() + " score1 = Inf");
else if(score1.isNaN())
    throw new RuntimeException(p.getName() + " score1 = NaN");

I have checked the possible causes triggering NaN, but I believe it should be safe from most of those:

  1. I am already checking for pfound == 0 (so no divide by zero)

  2. Argument to Math.log() cannot have a negative value

What I suspect is whether or not factorial() (a custom function which return the factorial as a long) returns a long so big that it can't be cast into a double without loss of precision or something like that. I checked Long.doubleValue(), and apparently it generates NaN if it's argument results in NaN.

Any comments? Am I missing something fundamental here?

+2  A: 

You shouldn't compare doubles against zero explicitly - it almost never works. Better do something like this:

double EPS = 0.0000001;
if (Math.abs (pfound) < EPS) { //pfound is null } 

The only place I see which can produce NaN is Math.log. From its documentation:

  1. If the argument is NaN or less than zero, then the result is NaN.
  2. If the argument is positive infinity, then the result is positive infinity.
  3. If the argument is positive zero or negative zero, then the result is negative infinity.

I think pfound contains negative value neer zero and that's why you get NaN. Try to track variable values in a debugger.

Roman
Further to the above, if they (pFound, psig) are always positive or zero, why are you using a Double?
freespace
great call with pfound, and psig, i see no reason why they should be double, must have lost track of things there.. they are now int.s anyhow..
posdef
@Roman, when dealing with Java, it's usually best not to think (or suggest) that "null" and "0" are the same thing. A double cannot be assigned a value of null, but it can be assigned a value of 0.
Mark Peters
It feels almost like a MATLAB trick...
posdef
+2  A: 

NaNs propagate through various arithmetic operations so even if you are checking the conditions here correctly they could be getting introduced from elsewhere.

I'd suspect either alpha_coeff[0] or pfound - try checking these for NaN.

NaN could also be a result from your factorial function, depending on how this is defined. EDIT: just noticed that you specified that this produces a long, so factorial can't produce a NaN, on the other hand it could produce a negative result if it overflows which would cause a NaN from the log().

mikera
p.s. note that 0 != NaN, so if pfound is NaN then the first half of your condition gets executed. This will then produce NaN as the overall result.
mikera
alpha_coeff.s are all positive doubles between 0-1.. so thats safe too.. as for the factorial(), i think you might be onto something, I checked a bit more and it appears that the largest value they can hold is 2^63-1 which translates to something like E18. Gotta check if my factorial goes above that.. ps: does anyone know how to typeset the comments so that gray background shows for the code related stuff like NaN
posdef
If factorial is overflowing you could get a negative value, which would result in NaN from the log() function.....
mikera
OK, it turns out that the value factorial should return is around E32 which is clearly over the max value for Long. If I use `BigInteger` instead for my factorial function, can I send it to `Math.log()` just like a long?
posdef
Yep, you should be able to use BigInteger, you'll probably need to apply BigInteger.doubleValue() at some point to convert to double before doing the log.
mikera
+2  A: 

If your factorial is doing the naive evaluation of x! = 1*2*3..., I'll bet you're asking for a factorial of a number that can't fit into the reference you're using.

Two pieces of advice:

  1. Try BigDecimal if that's the case
  2. Use gamma function instead of naive implementation.

Recursion for factorial(n) where n > 12 is a very bad, naive idea. You weren't seriously thinking about going forward with something like this, were you?

duffymo
yes it is a very simple recursive function, and yes again, it does most likely cause an overflow as mentioned in my comment to @mikera's post above. It's very interesting that you mention the gamma function, I have not thought of that before :) Could you possibly comment on the performance of an implementation of that instead of the recursive factorial method. Oh, and lastly, I guess regardless what I use, I still need `BigInteger`, right?
posdef
No, BigDecimal. The gamma function is more general than the naive factorial. Gamma's different because it's not recursive: you pass it an argument and the value comes back. There's a nice implementation in "Numerical Recipes". You can also check out Abramowitz and Stegun. Or this: http://www.luschny.de/math/factorial/FastFactorialFunctions.htm
duffymo
A factorial should be a lookup table, not computed at run-time.
Hamish Grubijan
hmm, thanks for the links. turns out an implementation of `logGamma()` is implemented readily in Apache Commons Math package which I have already implemented. Then n! = `Math.exp(logGamma(n+1))` should solve my problems?
posdef
i just realized that I dont need the exp function at all, thanks a lot for the help on the factorial!
posdef
@Hamish - How many values in your lookup table? I agree that it can help, but it's not the whole solution. Especially if the argument can be non-integer. Gamma solves both.
duffymo
@duffy, ok, for non-integers - sure. But for integers, `100 ! = 9.33262154 x 10^157` so pre-computing and caching them starts to make sense for practical uses.
Hamish Grubijan
@Hamish - I think our solutions work together assuming that your lookup is really a cache. The use case would be: request a value for the parameter key; look up in table. If it doesn't appear, calculate using gamma and cache; if it does appear, return and done.
duffymo
Ah, memoization then - ok, we do have a deal. How fast does the implementation of a gamma function compute values?
Hamish Grubijan