I have trained an XOR neural network in MATLAB and got these weights:

iw: [-2.162 2.1706; 2.1565 -2.1688]

lw: [-3.9174 -3.9183]

b{1}: [2.001; 2.0033]

b{2}: [3.8093]

Just out of curiosity, I tried to write MATLAB code that computes the output of this network (2 neurons in the hidden layer, 1 in the output layer, TANSIG activation function).

The code I came up with:

l1w = [-2.162 2.1706; 2.1565 -2.1688];
l2w = [-3.9174 -3.9183];
b1w = [2.001 2.0033];
b2w = [3.8093];

input = [1, 0];

out1 = tansig (input(1)*l1w(1,1) + input(2)*l1w(1,2) + b1w(1));
out2 = tansig (input(1)*l1w(2,1) + input(2)*l1w(2,2) + b1w(2));
out3 = tansig (out1*l2w(1) + out2*l2w(2) + b2w(1))

The problem is that when the input is, say, [1,1], the code outputs -0.9989, and for [0,1] it outputs 0.4902, while simulating the network generated with MATLAB gives 0.00055875 and 0.99943 respectively.

What am I doing wrong?

A: 

You usually don't use a sigmoid on your output layer--are you sure you should have the tansig on out3? And are you sure you are looking at the weights of the appropriately trained network? It looks like you've got a network trained to do XOR on [1,1] [1,-1] [-1,1] and [-1,-1], with +1 meaning "xor" and -1 meaning "same".

Rex Kerr
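A quick way to check this claim (a minimal sketch, assuming the weights quoted in the question and the Neural Network Toolbox tansig): feed the four patterns encoded as [-1,1] instead of [0,1] and map the tansig output back to [0,1]:

iw = [-2.162 2.1706; 2.1565 -2.1688];
lw = [-3.9174 -3.9183];
b1 = [2.001; 2.0033];
b2 = 3.8093;

inputs = [0 0 1 1; 0 1 0 1];        %# XOR truth table, one column per pattern
scaledIn = 2*inputs - 1;            %# encode inputs as [-1,1] instead of [0,1]
out = tansig( lw * tansig(iw*scaledIn + repmat(b1,1,4)) + b2 );
scaledOut = (out + 1)/2             %# approximately [0 1 1 0]

With this encoding the quoted weights do reproduce XOR, which matches the scaling explanation in the answer below.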
Then how do you normalize your output if you don't use a sigmoid in the output layer? Furthermore, how do you measure error if your output is not normalized?
Lirik
For a classifier, you pick the output with the highest value (or toggle at the 50% point) to make your decision. You don't need the nonlinearity. In this case it's _okay_ to do it, but it doesn't really add much.
Rex Kerr
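As a minimal sketch of the decision rule described in the comment above (the numbers below are hypothetical raw outputs, not taken from the trained network):

y = [0.10 0.93 0.85 0.04];      %# hypothetical outputs of a single output unit, one per pattern
label = y >= 0.5                %# toggle at the 50% point -> [0 1 1 0]

scores = [0.1 0.8; 0.9 0.2];    %# hypothetical two-class scores, one column per sample
[dummy, label] = max(scores)    %# pick the class with the highest value -> [2 1]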
The problem with using a linear function in the output layer becomes apparent when you want to get posterior probabilities for each class in addition to the classifications.
Amro
@Amro: Fair enough. If you want them to be forced into the range (0,1), then yes, you should use `1/(1+exp(-y))`; you get approximate probabilities either way but you might exceed 1 (or fall below 0) if you just treat it as a function approximation. Whether that is a problem depends on the application.
Rex Kerr
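For reference, a minimal sketch of the squashing mentioned above (y is a hypothetical vector of raw outputs from a linear output layer):

y = [-1.3 0.2 2.4];             %# hypothetical raw linear outputs
p = 1 ./ (1 + exp(-y))          %# squashed into (0,1), usable as approximate class probabilities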
+3  A: 

I wrote a simple example of an XOR network. I used newpr, which defaults to the tansig transfer function for both the hidden and output layers.

input = [0 0 1 1; 0 1 0 1];               %# each column is an input vector
outputActual = [0 1 1 0];

net = newpr(input, outputActual, 2);      %# 1 hidden layer with 2 neurons
net.divideFcn = '';                       %# use the entire input for training

net = init(net);                          %# initialize net
net = train(net, input, outputActual);    %# train
outputPredicted = sim(net, input);        %# predict

Then we check the result by computing the output ourselves. The important thing to remember is that, by default, inputs/outputs are scaled to the [-1,1] range:

scaledIn = (2*input - 1);           %# from [0,1] to [-1,1]
for i=1:size(input,2)
    in = scaledIn(:,i);             %# i-th input vector
    hidden(1) = tansig( net.IW{1}(1,1)*in(1) + net.IW{1}(1,2)*in(2) + net.b{1}(1) );
    hidden(2) = tansig( net.IW{1}(2,1)*in(1) + net.IW{1}(2,2)*in(2) + net.b{1}(2) );
    out(i) = tansig( hidden(1)*net.LW{2,1}(1) + hidden(2)*net.LW{2,1}(2) + net.b{2} );
end
scaledOut = (out+1)/2;              %# from [-1,1] to [0,1]

Or, expressed more efficiently as a matrix product in one line:

scaledIn = (2*input - 1);           %# from [0,1] to [-1,1]
out = tansig( net.LW{2,1} * tansig( net.IW{1}*scaledIn + repmat(net.b{1},1,size(input,2)) ) + repmat(net.b{2},1,size(input,2)) );
scaledOut = (1 + out)/2;            %# from [-1,1] to [0,1]
Amro
This is a really in-depth answer, thanks man!
spacemonkey