tags:

views:

132

answers:

4

I'm a bit perplexed by some code I'm currently writing. I am trying to preform a specific gradient descent (main loop included below) and depending on the initial conditions I will alternatively get good looking results (perhaps 20% of the time) or everything becomes NaN (the other 80% of the time). However it seems to me that none of the operations in my code could produce NaN's when given honest numbers!

My main loop is:

// calculate errors
delta = m1 + m2 - M;
eta = f1 + f2 - F;
for (int i = 0; i < numChildren; i++) {
  epsilon[i] = p[i]*m1+(1-p[i])*m2+q[i]*f1+(1-q[i])*f2-C[i];
}

// use errors in gradient descent
// set aside differences for the p's and q's
float mDiff = m1 - m2;
float fDiff = f1 - f2;
// first update m's and f's
m1 -= rate*delta;
m2 -= rate*delta;
f1 -= rate*eta;
f2 -= rate*eta;
for (int i = 0; i < numChildren; i++) {
  m1 -= rate*epsilon[i]*p[i];
  m2 -= rate*epsilon[i]*(1-p[i]);
  f1 -= rate*epsilon[i]*q[i];
  f2 -= rate*epsilon[i]*(1-q[i]);
}
// now update the p's and q's
for (int i = 0; i < numChildren; i++) {
  p[i] -= rate*epsilon[i]*mDiff;
  q[i] -= rate*epsilon[i]*fDiff;  
}

This behavior can be seen when we have

rate = 0.01;
M = 30;
F = 30;
C = {15, 25, 35, 45};

with the p[i] and q[i] chosen randomly uniformly between 0 and 1, m1 and m2 chosen randomly uniformly to add to M, and f1 and f2 chosen randomly uniformly to add up to F.

Does anyone see anything that could create these NaN's?

+1  A: 

Have you tried sprinkling your code with System.out.println statements to determine exactly where NaNs start occuring?

Vuntic
I had done many, but not everything. I've done everything now and it seems to be that the gradient can push everything off to infinity, so I'll probably have to add some ad-hoc method to push it back.
Brent
+10  A: 

NaN is triggered by the following occurrences:

  • results that are complex values
    • √x where x is negative
    • log(x) where x is negative
    • tan(x) where x mod 180 is 90
    • asin(x) or acos(x) where x is outside [-1..1]
  • 0/0
  • ∞/∞
  • ∞/−∞
  • −∞/∞
  • −∞/−∞
  • 0×∞
  • 0×−∞
  • 1
  • ∞ + (−∞)
  • (−∞) + ∞

Sorry for such a general answer, but I hope that helped.

Delan Azabani
Well, from this list I can clearly deduce it must be one of the ones involving addition or multiplications, so I guess it boils down to hunting for where something overflows.
Brent
For those curious, like me, why is 1^∞ there: http://mathforum.org/library/drmath/view/56006.html
Oak
+2  A: 

Given what I know about gradient descent, you are most probably jumping out to infinity because you do not have adaptive rate (i.e. your rate is too big).

Ha
You are completely correct. I trimmed the rate and now it is well behaved -- color me embarrassed to not think of the obvious thing first.
Brent
+1  A: 
polygenelubricants