ansaurus

Question

How to create a simple Gradient Descent algorithm

Answer 1

+3 A:

The first issue is that running this with only one data point gives you an underdetermined system, which means it may have infinitely many solutions. With three variables, you'd want at least 3 data points, and preferably many more.
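To make the underdetermined point concrete, here is a small sketch. It assumes the single data point is the one that appears later in the comments (target 400, features 1, 2104 and 3); the `residual` helper and the candidate parameter vectors are made up purely for illustration.

```python
# Single training example (assumed from the comment thread): 400 = theta0 + 2104*theta1 + 3*theta2
x = (1.0, 2104.0, 3.0)
y = 400.0

def residual(theta):
    """Return d = y - (theta0*x0 + theta1*x1 + theta2*x2)."""
    return y - sum(t * xi for t, xi in zip(theta, x))

# Three very different parameter vectors (illustrative choices) that all fit the point.
candidates = [
    (400.0, 0.0, 0.0),
    (0.0, 400.0 / 2104.0, 0.0),
    (0.0, 0.0, 400.0 / 3.0),
]

residuals = [residual(theta) for theta in candidates]
for theta, d in zip(candidates, residuals):
    print(theta, "residual:", d)
```

Every candidate drives the error to (numerically) zero, so the data alone cannot tell the optimiser which one to prefer.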

Secondly, gradient descent where the step is a scaled version of the gradient is not guaranteed to converge except in a small neighbourhood of the solution. You can fix that by switching to either a fixed-size step in the direction of the negative gradient (slow) or a line search in the direction of the negative gradient (faster, but slightly more complicated).
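For the line search option, one common variant is a backtracking search: start with a large trial step and halve it until the error drops by enough. The following is only a sketch under assumptions. The data point is the one from the comments, and the names `line_search_step`, `t0`, `shrink` and `c` are mine, not anything from the question.

```python
def error(theta, x, y):
    """Squared error E = d^2 for one data point."""
    d = y - sum(t * xi for t, xi in zip(theta, x))
    return d * d

def gradient(theta, x, y):
    """dE/dtheta_i = 2 * d * (-x_i)."""
    d = y - sum(t * xi for t, xi in zip(theta, x))
    return [-2.0 * d * xi for xi in x]

def line_search_step(theta, x, y, t0=1.0, shrink=0.5, c=1e-4):
    """Take one step along the negative gradient, shrinking the step
    until the sufficient-decrease (Armijo) condition holds."""
    g = gradient(theta, x, y)
    g2 = sum(gi * gi for gi in g)
    e0 = error(theta, x, y)
    t = t0
    while t > 1e-20:  # give up if no acceptable step is found
        trial = [ti - t * gi for ti, gi in zip(theta, g)]
        if error(trial, x, y) <= e0 - c * t * g2:
            return trial
        t *= shrink
    return theta

# Assumed data point from the comment thread.
x = (1.0, 2104.0, 3.0)
y = 400.0
theta = [0.0, 0.0, 0.0]
for _ in range(10):
    theta = line_search_step(theta, x, y)
final_error = error(theta, x, y)
print(theta, final_error)
```

The point of the design is that you no longer have to guess a scale factor by hand: the search finds a usable step length on each iteration.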

So for a fixed step size, instead of

theta0 = theta0 - step * dEdtheta0
theta1 = theta1 - step * dEdtheta1
theta2 = theta2 - step * dEdtheta2

You do this

# largest gradient magnitude; abs() keeps n positive so the step direction is not flipped
n = max(abs(dEdtheta0), abs(dEdtheta1), abs(dEdtheta2))
theta0 = theta0 - step * dEdtheta0 / n
theta1 = theta1 - step * dEdtheta1 / n
theta2 = theta2 - step * dEdtheta2 / n
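Put together, the normalised update might look like the sketch below. The data point is assumed from the comments. The function name `normalised_descent` and the stopping rule (stop as soon as a step no longer reduces the error) are my own choices for illustration, not part of the original code.

```python
def normalised_descent(x, y, step=1e-4, max_iters=100000):
    """Fixed-length steps along the negative gradient, scaled by the
    largest gradient magnitude so no parameter moves by more than `step`."""
    def err(theta):
        d = y - sum(t * xi for t, xi in zip(theta, x))
        return d * d

    theta = [0.0] * len(x)
    for _ in range(max_iters):
        d = y - sum(t * xi for t, xi in zip(theta, x))
        grad = [-2.0 * d * xi for xi in x]
        n = max(abs(g) for g in grad)
        if n == 0.0:
            break  # exactly at a minimum, nothing to normalise
        trial = [t - step * g / n for t, g in zip(theta, grad)]
        if err(trial) >= err(theta):
            break  # a fixed-length step now overshoots; stop here
        theta = trial
    return theta

x = (1.0, 2104.0, 3.0)
y = 400.0
theta = normalised_descent(x, y)
final_residual = y - sum(t * xi for t, xi in zip(theta, x))
print(theta, final_residual)
```

Because the step length never shrinks, this only gets within roughly one step of the minimum, which is part of why it is the slow option.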

It also looks like you may have a sign error in your steps.

I'm also not sure that derror is a good stopping criterion. (But stopping criteria are notoriously hard to get "right".)

My final point is that gradient descent is horribly slow for parameter fitting. You probably want to use conjugate-gradient or Levenberg-Marquardt methods instead. I suspect that both of these methods already exist for Python in the numpy or scipy packages (which aren't part of Python by default but are pretty easy to install).

Michael Anderson 2010-10-01 10:35:36

Thank you for your great answer! I know that it's not a great approach to the problem; I wanted to try implementing this simple solution first, and then use a variable step and try both "batch gradient descent" and "stochastic gradient descent".

ssaboum 2010-10-01 12:18:45

Just to be sure, what expression do you use for dEdtheta?

ssaboum 2010-10-01 12:19:23

I'd take d = 400 - theta0 - 2104 * theta1 - 3 * theta2, E = d^2, dEdtheta0 = 2 * d * (-1), dEdtheta1 = 2 * d * (-2104), dEdtheta2 = 2 * d * (-3), which would make the sign in your original equations correct. But if you look at the size of the gradients, they are huge compared to the 0.0001 scale factor, which means you end up taking steps that are too large from your starting point. Normalising the gradient, or limiting the step size in some other manner, should solve your issue.
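As a rough check of those magnitudes, the sketch below evaluates the expressions at the starting point. It assumes the start is (0, 0, 0), as discussed further down the thread, and uses the 0.0001 scale factor; the variable names are illustrative only.

```python
x = (1.0, 2104.0, 3.0)   # features of the single data point
y = 400.0                # target
scale = 0.0001           # scale factor mentioned above
theta = [0.0, 0.0, 0.0]  # assumed starting point

# d = 400 - theta0 - 2104*theta1 - 3*theta2
d = y - sum(t * xi for t, xi in zip(theta, x))

# dEdtheta_i = 2 * d * (-x_i)
grads = [2.0 * d * (-xi) for xi in x]

# One unnormalised update and the residual that results from it.
theta_next = [t - scale * g for t, g in zip(theta, grads)]
d_next = y - sum(t * xi for t, xi in zip(theta_next, x))

print("gradients:", grads)
print("residual before:", d, "after:", d_next)
```

The theta1 gradient is in the millions, so even after scaling by 0.0001 a single update overshoots and the residual ends up far larger in magnitude than where it started.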

Michael Anderson 2010-10-01 12:29:52

I tried setting the step to 0.00000000001 and now the error is slowly decreasing, but the final answer for the thetas always ends up as (0, 0, 0)...

ssaboum 2010-10-01 12:32:10

That shouldn't be the case, as at (0,0,0) you should have diff = 400, so all the thetas should increase at the end of that step (though it may take a ridiculously long time: with your step size of 1e-9, you'll only be moving by 1e-6 or so, which is why I suggest you normalise the step size in some way).

Michael Anderson 2010-10-01 12:51:29
