views: 722

answers: 4

Hi, I'm a newbie to the world of ANNs. I'm aware of the Gradient Descent rule and the Backpropagation theorem. What I don't get is: when is using a bias important?

For example, when mapping the AND function with 2 inputs and 1 output, it does not give the correct weights; however, when I use 3 inputs (1 of which is a bias), it gives the correct weights.

+2  A: 

Two different kinds of parameters can be adjusted during the training of an ANN: the weights and the value in the activation functions. This is impractical, and it would be easier if only one kind of parameter had to be adjusted. To cope with this problem, the bias neuron was invented. The bias neuron lies in one layer, is connected to all the neurons in the next layer but to none in the previous layer, and it always emits 1. Since the bias neuron emits 1, the weights connected to it are added directly to the combined sum of the other weights (equation 2.1), just like the t value in the activation functions.1

The reason it's impractical is that you're simultaneously adjusting the weight and the value, so any change to the weight can neutralize a change to the value that was useful for a previous data instance... adding a bias neuron whose emitted value never changes allows you to control the behavior of the layer.
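
For illustration, here's a rough NumPy sketch of that idea (the particular weight values are made up): appending a constant 1 to the input makes the bias weight just another entry in the weight vector.

```python
import numpy as np

x = np.array([1.0, -1.0])      # two ordinary inputs
w = np.array([0.5, 0.5])       # their weights
w_b = -0.8                     # weight on the always-on bias neuron

# The bias neuron always emits 1, so its weight is simply added to the
# weighted sum of the other inputs, playing the role of the threshold t.
x_aug = np.append(x, 1.0)      # [x1, x2, 1]
w_aug = np.append(w, w_b)      # [w1, w2, w_b]

print(np.dot(w, x) + w_b)      # -0.8
print(np.dot(w_aug, x_aug))    # -0.8, the same value
```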

Furthermore, the bias allows you to use a single neural net to represent similar cases. Consider the AND boolean function represented by the following neural network:

[image: ANN]

  • w0 corresponds to b.
  • w1 corresponds to x1.
  • w2 corresponds to x2.

A single perceptron can be used to represent many boolean functions.

For example, if we assume boolean values of 1 (true) and -1 (false), then one way to use a two-input perceptron to implement the AND function is to set the weights w0 = -.8 and w1 = w2 = .5. This perceptron can be made to represent the OR function instead by altering the threshold to w0 = -.3. In fact, AND and OR can be viewed as special cases of m-of-n functions: that is, functions where at least m of the n inputs to the perceptron must be true. The OR function corresponds to m = 1 and the AND function to m = n. Any m-of-n function is easily represented using a perceptron by setting all input weights to the same value (e.g., 0.5) and then setting the threshold w0 accordingly.

Perceptrons can represent all of the primitive boolean functions AND, OR, NAND (¬AND), and NOR (¬OR). (Machine Learning, Tom Mitchell)

The threshold is the bias and w0 is the weight associated with the bias/threshold neuron.
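
To make that concrete, here's a quick Python sketch of that perceptron. One caveat: with the 1/-1 input encoding, the OR threshold weight has to be positive, so the sketch uses w0 = .3 (the quoted -.3 works with a 0/1 encoding).

```python
def perceptron(x1, x2, w0, w1=0.5, w2=0.5):
    """Return 1 (true) or -1 (false) for boolean inputs x1, x2 in {1, -1}."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1

inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

# AND (m = n = 2): fires only when both inputs are true.
print([perceptron(x1, x2, w0=-0.8) for x1, x2 in inputs])  # [-1, -1, -1, 1]

# OR (m = 1): only the threshold weight w0 changes.
print([perceptron(x1, x2, w0=0.3) for x1, x2 in inputs])   # [-1, 1, 1, 1]
```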

Lirik
Ok, so how do I find out where I need to add a bias? I'm basically trying to program a two-layer feed-forward non-linear neural network. Where should I add the bias in this?
Think of it as a general rule of thumb: add bias! Neural networks are *"unpredictable"* to a certain extent, so if you add a bias neuron you're more likely to find solutions faster than if you didn't use a bias. Of course this is not mathematically proven, but it's what I've observed in literature and in general use.
Lirik
A: 

A layer in a neural network without a bias is nothing more than the multiplication of an input vector with a matrix. (The output vector might be passed through a sigmoid function for normalisation and for use in a multi-layered ANN afterwards, but that's not important here.)

This means that you’re using a linear function and thus an input of all zeros will always be mapped to an output of all zeros. This might be a reasonable solution for some systems but in general it is too restrictive.

Using a bias, you’re effectively adding another dimension to your input space, which always takes the value one, so you’re avoiding an input vector of all zeros. You don’t lose any generality by this because your trained weight matrix needs not be surjective, so it still can map to all values previously possible.
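
A quick NumPy illustration of both points (the matrix sizes and random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Without a bias, a layer is just a matrix product, so an all-zero input
# is always mapped to an all-zero output, no matter how W was trained.
W = rng.standard_normal((3, 2))      # a 2-input, 3-output layer
x = np.zeros(2)
print(W @ x)                         # [0. 0. 0.]

# With a bias, the input gains an extra dimension that is always 1,
# so the zero input no longer forces a zero output.
W_aug = rng.standard_normal((3, 3))  # last column acts as the bias
x_aug = np.append(x, 1.0)            # [0, 0, 1]
print(W_aug @ x_aug)                 # equal to the bias column, generally non-zero
```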

2d ANN:

For an ANN mapping two dimensions to one dimension, as in reproducing the AND or the OR (or XOR) function, you can think of the neural network as doing the following:

On the 2d plane, mark all positions of the input vectors. So, for boolean values, you'd mark (-1,-1), (1,1), (-1,1), (1,-1). What your ANN does is draw a straight line on the 2d plane, separating the positive output values from the negative ones.

Without bias, this straight line has to go through zero, whereas with bias, you're free to put it anywhere. So you'll see that without bias you're facing a problem with the AND function, since you can't put both (1,-1) and (-1,1) on the negative side. (They are not allowed to be on the line.) The problem is the same for the OR function. With a bias, however, it's easy to draw the line.

Note that the XOR function in that situation can’t be solved even with bias.
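
A small Python check of that geometric argument for the AND targets (the biased line's weights are picked by hand; the unbiased case is probed by random search purely to illustrate that nothing works):

```python
import numpy as np

points = {(-1, -1): -1, (-1, 1): -1, (1, -1): -1, (1, 1): 1}  # AND targets

def separates(w1, w2, b):
    """True if the line w1*x1 + w2*x2 + b = 0 puts every point on the right side."""
    return all(np.sign(w1 * x1 + w2 * x2 + b) == t for (x1, x2), t in points.items())

# With a bias, the line 0.5*x1 + 0.5*x2 - 0.8 = 0 isolates (1, 1):
print(separates(0.5, 0.5, -0.8))   # True

# Without a bias (b = 0), (1, -1) and (-1, 1) always end up on opposite
# sides of the line (or on it), so no choice of weights works:
rng = np.random.default_rng(0)
print(any(separates(w1, w2, 0.0) for w1, w2 in rng.uniform(-1, 1, (10000, 2))))  # False
```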

Debilski
If you use a sigmoid transfer function, you introduce non-linearity. Stating that this is a linear function is wrong and also somewhat dangerous, as the non-linearity of the sigmoid is key to the solution of several problems. Also, sigmoid(0) = 0.5, and there is no x for which sigmoid(x) = 0.
bayer
Yeah, but it is 0.5 for any input of 0 without a bias, regardless of what the linear function before looks like. And that’s the point. You don’t normally train your sigmoid function, you just live with it. The linearity problem happens well before the sigmoid function.
Debilski
I get your point: the layer is not able to learn a different output for 0 than the one it started out with. That's correct and important. However, the "linear function argument" just does not apply in my opinion. Even with a bias, the function is still linear. The linearity property is misleading here. (Yes, I might be nitpicking.)
bayer
I’d say, that with a bias it’s *affine*. ( http://en.wikipedia.org/wiki/Affine_transformation#Representation )
Debilski
Yes, you're correct. Thanks for pointing out that difference to me. (Why do we call it linear regression then, btw, although it's affine?)
bayer
The regression model itself is linear. It just operates on the augmented input space, where every vector has an element ‘1’ added as the last component. This would probably be different if the bias was fixed, but I’m not sure about that. People might still call it *linear* even then.
Debilski
A: 

When you use ANNs, you rarely know about the internals of the systems you want to learn. Some things cannot be learned without a bias. E.g., have a look at the following data: (0, 1), (1, 1), (2, 1), basically a function that maps any x to 1.

If you have a one-layered network (or a linear mapping), you cannot find a solution. However, if you have a bias, it's trivial!

In an ideal setting, a bias could also map all points to the mean of the target points and let the hidden neurons model the differences from that point.
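
As a tiny illustration of that data set, here's a least-squares fit with and without a bias (NumPy's lstsq stands in for an actual trained network):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 1.0, 1.0])   # every x maps to 1

# Without a bias (y = w*x): the best least-squares w still misses the targets.
w, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
print(x * w[0])                  # [0.  0.6 1.2] -- cannot be all ones

# With a bias column of ones (y = w*x + b): the fit is exact, w = 0, b = 1.
X = np.column_stack([x, np.ones_like(x)])
wb, *_ = np.linalg.lstsq(X, y, rcond=None)
print(X @ wb)                    # [1. 1. 1.]
```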

bayer
A: 

Like others have said, I think that biases are almost always helpful. In effect, a bias value allows you to shift the activation function to the left or right, which may be critical for successful learning.

It might help to look at a simple example. Consider this 1-input, 1-output network that has no bias:

[image: simple network]

The output of the network is computed by multiplying the input (x) by the weight (w0) and passing the result through some kind of activation function (e.g., a sigmoid function).

Here is the function that this network computes, for various values of w0:

[image: network output, given different w0 weights]

Changing the weight w0 essentially changes the "steepness" of the sigmoid. That's useful, but what if you wanted the network to output 0 when x is 2? Just changing the steepness of the sigmoid won't really work -- you want to be able to shift the entire curve to the right.

That's exactly what the bias allows you to do. If we add a bias to that network, like so:

[image: simple network with a bias]

...then the output of the network becomes sig(w0*x + w1*1.0). Here is what the output of the network looks like for various values of w1:

[image: network output, given different w1 weights]

Having a weight of -5 for w1 shifts the curve to the right, which allows us to have a network that outputs 0 when x is 2.
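
In code, the difference looks roughly like this (the sigmoid and weight values mirror the figures above and are only illustrative):

```python
import numpy as np

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

# Without a bias, changing w0 only changes the steepness of the curve:
# the output at x = 0 is pinned to sig(0) = 0.5.
for w0 in (0.5, 1.0, 2.0):
    print(w0, sig(w0 * 0.0))          # always 0.5

# With a bias input of 1.0 weighted by w1, the whole curve shifts sideways.
w0, w1 = 1.0, -5.0
print(sig(w0 * 2.0 + w1 * 1.0))       # ~0.047 -- (nearly) 0 at x = 2
```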

Nate Kohl