views:

456

answers:

3

I have been trying to get a simple double XOR neural network to work, and I am having problems getting backpropagation to train a really simple feed-forward neural network.
I have mostly been trying to follow this guide in getting a neural network working, but at best I have made programs that learn at an extremely slow rate.

As I understand neural networks:

  1. Values are computed by taking the result of a sigmoid function applied to the sum of all inputs to that neuron. This is then fed to the next layer using the weight for each neuron (a rough sketch of this is after the list below)
  2. At the end of a run the error is computed for the output neurons; then, using the weights, the error is back-propagated by simply multiplying the values and summing at each neuron
  3. When all of the errors are computed, the weights are adjusted by delta = weight of the connection * derivative of the sigmoid (of the value of the neuron the weight goes to) * value of the neuron that the connection is to * error of the neuron * amount of output error of the neuron it goes to * beta (some constant for the learning rate)
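In code, my understanding of the forward pass (step 1) would look roughly like the sketch below. This is not my actual Net.cpp, just how I think it is supposed to work; the names are made up.

#include <cmath>
#include <cstddef>
#include <vector>

// Squash a weighted sum into (0, 1).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// weights[j][i] is the weight from input neuron i to neuron j in the next layer.
std::vector<double> feedForward(const std::vector<double>& inputs,
                                const std::vector<std::vector<double>>& weights)
{
    std::vector<double> outputs(weights.size(), 0.0);
    for (std::size_t j = 0; j < weights.size(); ++j) {
        double sum = 0.0;
        for (std::size_t i = 0; i < inputs.size(); ++i)
            sum += inputs[i] * weights[j][i];   // sum of all weighted inputs
        outputs[j] = sigmoid(sum);              // sigmoid of the sum becomes the value
    }
    return outputs;
}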

This is my current muck of code that I am trying to get working. I have a lot of other attempts somewhat mixed in, but the main backpropagation function I am trying to fix is on line 293 in Net.cpp

+2  A: 

Have a look at 15 Steps to implement a Neural Network; it should get you started.

Gregory Pakosz
A: 

Sounds to me like you are struggling with backprop. What you describe above doesn't quite match how I understand it to work, and your description is a bit ambiguous.

You calculate the output error term to backpropagate as the difference between the prediction and the actual value, multiplied by the derivative of the transfer function. It is that error value which you then propagate backwards. The derivative of a sigmoid is calculated quite simply as y(1-y), where y is your output value. There are lots of proofs of that available on the web.
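In code, that output error term might look something like this minimal sketch (the names are mine, not anything from your Net.cpp):

// y is the neuron's output *after* the sigmoid, so the derivative is simply y * (1 - y).
double sigmoidDerivative(double y) { return y * (1.0 - y); }

// Error term for one output neuron: (actual - prediction) * derivative at the output.
double outputError(double target, double prediction)
{
    return (target - prediction) * sigmoidDerivative(prediction);
}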

For a node on the inner layer, you multiply that output error by the weight between the two nodes, and sum all of those products as the total error from the outer layer being propagated to the node in the inner layer. The error associated with the inner node is then that total error multiplied by the derivative of the transfer function applied to the node's original output value. Here's some pseudocode:

total_error = sum(output_errors * weights)
node_error = sigmoid_derivative(node_output) * total_error

This error is then propagated backwards in the same manner right back through the input layer weights.
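As a sketch in C++, the error for one inner (hidden) node might look like the following; outgoingWeights[k] is assumed to be the weight from this node to output node k, and the names are placeholders rather than anything from your code:

#include <cstddef>
#include <vector>

double sigmoidDerivative(double y) { return y * (1.0 - y); }   // same y(1-y) as above

// Error term for one hidden node: sum the weighted errors coming back from the
// outer layer, then multiply by the derivative at this node's own output.
double hiddenError(const std::vector<double>& outputErrors,
                   const std::vector<double>& outgoingWeights,
                   double nodeOutput)
{
    double totalError = 0.0;
    for (std::size_t k = 0; k < outputErrors.size(); ++k)
        totalError += outputErrors[k] * outgoingWeights[k];
    return sigmoidDerivative(nodeOutput) * totalError;
}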

The weights are adjusted using these error terms and the output values of the nodes:

weight_change = outer_error * inner_output_value

The learning rate is important because the weight change is calculated for every pattern/row/observation in the input data. You want to moderate the weight change for each row so that the weights don't get unduly changed by any single row, and so that all rows have an effect on the weights. The learning rate gives you that; you adjust the weight change by multiplying by it:

weight_change = outer_error * inner_output_value * learning_rate

It is also normal to remember these changes between epochs (iterations) and to add a fraction of the previous change to the current one. The fraction added is called momentum; it is supposed to speed you up through regions of the error surface where there is not much change and slow you down where there is detail.

weight_change = (outer_error * inner_output_value * learning_rate) + (last_change * momentum)

There are algorithms for adjusting the learning rate and momentum as the training proceeds.

The weight is then updated by adding the change:

new_weight = old_weight + weight_change
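Putting the whole weight update together, a sketch in C++ might look like this (the names are made up, and lastChange would have to be stored per weight between iterations):

// Update a single weight given the error term of the downstream (outer) node and
// the output of the upstream (inner) node that feeds it.
double updateWeight(double oldWeight,
                    double outerError,     // error term of the node the weight points to
                    double innerOutput,    // output value of the node the weight comes from
                    double learningRate,
                    double momentum,
                    double& lastChange)    // weight change applied last time, for momentum
{
    double weightChange = (outerError * innerOutput * learningRate)
                        + (lastChange * momentum);
    lastChange = weightChange;             // remember it for the next momentum term
    return oldWeight + weightChange;
}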

I had a look through your code, but rather than correct it and post that, I thought it was better to describe backprop for you so you can code it up yourself. If you understand it, you'll be able to tune it for your circumstances too.

HTH and good luck.

Simon
I tried to fix my code with your suggestions (I did not get the momentum in yet), but I am still having problems with the backprop system. To me, at least, it looks like you were telling me to do the same thing but just bundle more numbers into the error value. I feel as if I am missing something small but important, and that is causing my backprop not to function.
Matthew
I've tried a couple of times to figure out what your code is doing, but have given up. I think the total_error value you calculate is probably wrong because you do it for a layer and call DSigmoid twice. I suggest you do a single iteration with 2 rows of input data on paper or in Excel so you know how the whole thing works. Then get your network to spit out its weights so you can compare it to your calculations. By that stage you should understand a) what is supposed to happen and b) what's wrong with your code.
Simon
A: 

Check out the book titled:

AI Application Programming

It's a nice one to start with and has code for many AI techniques.

Ashish
That's a bit generic - you might as well post a link to a dictionary...
jon hanson