views: 54
answers: 2

I am writing a neural network class and have come across two concepts I don't understand. Would anyone be able to tell me what bias and momentum are and what they do?

+3  A: 

Bias is a constant input given to neurons. For example, in a normal feed-forward network you might have 2 input units, 2 hidden units and 1 output unit. A constant bias value (let's say 1) goes into the hidden and output units in addition to the input from the input units.
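A minimal sketch of that 2-2-1 arrangement in Python/NumPy, with the constant bias input of 1 feeding both the hidden and output units; the weight values here are made up purely for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up weights for a 2-2-1 feed-forward net.
# Each row carries one extra weight for the constant bias input (fixed at 1).
W_hidden = np.array([[0.5, -0.3, 0.1],    # hidden unit 1: w1, w2, bias weight
                     [0.2,  0.8, -0.4]])  # hidden unit 2
W_output = np.array([[1.0, -1.5, 0.3]])   # output unit: weights to both hidden units + bias weight

def forward(x1, x2):
    inputs = np.array([x1, x2, 1.0])           # append the constant bias input
    hidden = sigmoid(W_hidden @ inputs)
    hidden_with_bias = np.append(hidden, 1.0)  # the bias also feeds the output unit
    return sigmoid(W_output @ hidden_with_bias)[0]

print(forward(0.0, 1.0))
```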

Momentum acts like an additional learning rate used at the beginning of learning to make learning faster. For example, the learning error is usually very large at first, so you start with high momentum and adjust the weights more aggressively. Later on, as the error decreases, the momentum should also decrease so you learn more slowly but are less likely to overshoot the target.
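In code, momentum usually shows up as a fraction of the previous weight change carried into the current one. A minimal sketch of that update rule, with a made-up one-weight quadratic error and illustrative learning-rate and momentum values:

```python
learning_rate = 0.1
momentum = 0.9

weight = 0.0
prev_delta = 0.0

def gradient(w):
    # Illustrative gradient of a simple quadratic error (w - 3)^2.
    return 2.0 * (w - 3.0)

for step in range(200):
    # Current change = gradient step plus a fraction of the previous change.
    delta = -learning_rate * gradient(weight) + momentum * prev_delta
    weight += delta
    prev_delta = delta

print(weight)  # ends up very close to the minimum at w = 3
```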

Charles Ma
Great, thanks! Although I still don't quite understand the benefit of a bias constant.
Louis
I can't remember the exact reasoning, but the bias allows your network to classify data that are not separable at the origin. It acts like a shift in the boundary of separation. Without a bias you would need to train a threshold value for each output unit in addition to training the weights, which would be impractical and would probably break the back-propagation algorithm as well. So in a feed-forward net you should always have a bias input to the hidden units and output units.
Charles Ma
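To make the boundary-shift point above concrete, here is a small illustrative sketch (the data, weights and threshold rule are made up): a single unit without a bias can only place its decision boundary through the origin, so it can never represent a rule such as "label is 1 when x > 0.5"; adding a bias term moves the boundary where it needs to be.

```python
# A single linear unit: classify as 1 if w*x + b > 0.
# Target rule: label is 1 when x > 0.5. With b fixed at 0 the boundary
# w*x = 0 always sits at x = 0, so the rule cannot be represented.
data = [(0.0, 0), (0.25, 0), (0.4, 0), (0.6, 1), (0.75, 1), (1.0, 1)]

def accuracy(w, b):
    return sum(int((w * x + b > 0) == bool(label)) for x, label in data) / len(data)

print(accuracy(w=1.0, b=0.0))   # no bias: boundary stuck at x = 0, misclassifies 0.25 and 0.4
print(accuracy(w=1.0, b=-0.5))  # with bias: boundary at x = 0.5, separates the data perfectly
```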
+1  A: 

The bias allows the neuron to accept a wider range of input values. Momentum can be thought of as the step size during gradient descent.

In a typical node, the bias and all the inputs from the previous layer are weighted, summed, and then squashed to the output value. The squashing function is centered around zero and its sensitivity diminishes dramatically as the weighted sum becomes very positive or very negative. However, sometimes you want the sensitive part of the squashing function to be at some region of the input other than right around zero. The bias input allows the learning algorithm to shift a node's response to accomplish that.
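A quick numeric illustration of that last point (my own sketch, with made-up weight and bias values): the logistic squashing function is steepest around a weighted sum of zero, and a bias simply slides that steep, sensitive region to wherever it is needed along the input axis.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sensitivity(z):
    # Derivative of the logistic function: largest (0.25) at z = 0, tiny far from it.
    s = sigmoid(z)
    return s * (1.0 - s)

w, bias = 1.0, -5.0  # the bias re-centers the sensitive region near x = 5

for x in (0.0, 5.0, 10.0):
    print(x, sensitivity(w * x), sensitivity(w * x + bias))
```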

In addition to what Charles Ma described, momentum can also help carry the learning algorithm across a local minimum to find a better solution.
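To illustrate that "carry across a local minimum" idea, here is a sketch reusing the momentum update from earlier on a made-up one-dimensional error surface: plain gradient descent settles in the shallow minimum nearest the starting point, while the same descent with a momentum term coasts over the hump into the deeper minimum.

```python
def grad(x):
    # Gradient of a made-up error surface f(x) = (x**2 - 1)**2 + 0.3*x,
    # which has a shallow local minimum near x = 0.96 and a deeper one near x = -1.03.
    return 4 * x**3 - 4 * x + 0.3

def descend(momentum, lr=0.01, steps=500):
    x, velocity = 2.0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(x)
        x += velocity
    return x

print(descend(momentum=0.0))  # plain gradient descent: stuck in the shallow minimum (≈ 0.96)
print(descend(momentum=0.9))  # with momentum: carried over the hump to the deeper minimum (≈ -1.03)
```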

Ah I see. As in after the activation function? What is wrong with the sensitive part around 0?
Louis
Well, the bias is typically inserted before the activation function, as part of the summation; its effect is to shift the activation function's y-intercept. Nothing is 'wrong' with having the sensitive part around zero. It is just insufficiently flexible to _only_ have the sensitive part around zero -- that would dramatically reduce the network's ability to approximate the input/output function represented in the training data. In some sense each node estimates a portion of the total mapping, and without the extra degree of freedom from a bias input, that estimate has much more error.