I'm working on creating a two-layer neural network with back-propagation. The NN is supposed to get its data from a 20001x17 matrix that holds the following information in each row:

-The first 16 cells hold integers ranging from 0 to 15, which act as variables to help us determine which of the 26 letters of the alphabet we mean to express. For example, the following series of 16 values is meant to represent the letter A: [2 8 4 5 2 7 5 3 1 6 0 8 2 7 2 7].

-The 17th cell holds a number ranging from 1 to 26 representing the letter of the alphabet we want. 1 stands for A, 2 stands for B etc.

The output layer of the NN consists of 26 outputs. Every time the NN is fed an input like the one described above, it's supposed to output a 1x26 vector containing zeros in all but the one cell that corresponds to the letter the input values were meant to represent. For example, the output [1 0 0 ... 0] would be the letter A, whereas [0 0 0 ... 1] would be the letter Z.

Some things that are important before I present the code: I need to use the traingdm training function, and the number of hidden neurons is fixed (for now) at 21.

Trying to implement the above, I wrote the following MATLAB code:

%%%%%%%%
%Start of code%
%%%%%%%%

%
%Initialize the input and target vectors
%
p = zeros(16,20000);
t = zeros(26,20000);

%
%Fill the input and training vectors from the dataset provided
%
for i=2:20001
    for k=1:16
        p(k,i-1) = data(i,k);
    end
    t(data(i,17),i-1) = 1;
end

net = newff(minmax(p),[21 26],{'logsig' 'logsig'},'traingdm');

y1 = sim(net,p);

net.trainParam.epochs = 200;
net.trainParam.show = 1;
net.trainParam.goal = 0.1;
net.trainParam.lr = 0.8;
net.trainParam.mc = 0.2;
net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 0.7;
net.divideParam.testRatio = 0.2;
net.divideParam.valRatio = 0.1;

%[pn,ps] = mapminmax(p);
%[tn,ts] = mapminmax(t);

net = init(net);
[net,tr] = train(net,p,t);

y2 = sim(net,p);

%%%%%%%%
%End of code%
%%%%%%%%

Now to my problem: I want my outputs to be as described above, i.e. each column of y2 should be a one-hot representation of a letter. My code doesn't do that, though. Instead, it produces values spread across the whole range between 0 and 1, anywhere from 0.1 to 0.9.

My question is: is there some conversion I need to be doing that I am not? In other words, do I have to convert my input and/or output data to a form by which I can actually see whether my NN is learning correctly?

Any input would be appreciated.

+2  A: 

This is normal. Your output layer is using a log-sigmoid transfer function, and that will always give you intermediate outputs strictly between 0 and 1.

What you would usually do would be to look for the output with the largest value -- in other words, the most likely character.

This means that, for every column of y2, you're looking for the index of the row that contains the largest value in that column. You can compute this as follows:

[dummy, I] = max(y2);

I is then a vector containing the row index of the largest value in each column.
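A minimal sketch of how you might use those indices, assuming t is the 26-row target matrix built in the question (the variable names here are only illustrative):

```matlab
% Predicted and true class index per column (1 = 'A', ..., 26 = 'Z').
[dummy, I] = max(y2);
[dummy, target] = max(t);

% Map indices back to letters and measure classification accuracy.
predictedLetters = char('A' + I - 1);
accuracy = sum(I == target) / numel(target);
fprintf('Accuracy: %.2f%%\n', 100 * accuracy);
```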

Martin B
Martin, thanks for the response. Using max(y2) I can now at least get some information on how many times the network was right in identifying the letters. What I did, however, before feeding the network the data was scale it down so that 0 <= p(x) <= 1. Seeing as the minimum value of p was 0 and the maximum was 15, I made a new input vector scaledp = p/15.
sp
+1  A: 

You can think of y2 as an output probability distribution for each input being one of the 26 alphabet characters, for example if one column of y2 says:

.2
.5
.15
.15

then there is a 50% probability that this character is B (if we assume only 4 possible outputs).



==REMARK==

The output layer of the NN consists of 26 outputs. Every time the NN is fed an input like the one described above it's supposed to output a 1x26 vector containing zeros in all but the one cell that corresponds to the letter that the input values were meant to represent. for example the output [1 0 0 ... 0] would be letter A, whereas [0 0 0 ... 1] would be the letter Z.

It is preferable to avoid using target values of 0 and 1 to encode the output of the network.
The reason is that the 'logsig' sigmoid transfer function cannot produce these output values given finite weights. If you attempt to train the network to fit target values of exactly 0 and 1, gradient descent will force the weights to grow without bound.
So instead of 0 and 1, try using values of 0.04 and 0.9, for example, so that [0.9, 0.04, ..., 0.04] is the target output vector for the letter A.
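A sketch of that remapping, assuming t is the 26x20000 0/1 target matrix from the question:

```matlab
% Remap hard 0/1 targets to soft targets that logsig can actually
% reach with finite weights: 0 -> 0.04, 1 -> 0.9.
lo = 0.04; hi = 0.9;
tSoft = lo + (hi - lo) * t;
[net, tr] = train(net, p, tSoft);
```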


Reference:
Thomas M. Mitchell, Machine Learning, McGraw-Hill Higher Education, 1997, pp. 114-115

Amro
I don't think that's correct. Each element of the output vector will have a value varying between 0.00 and 1.00 but the sum of any column (or any element in that column for that matter) will never actually represent a percentage.
sp
You can always normalize it yourself: y2Normalized = y2 ./ repmat(sum(y2), 26, 1)
Amro
Alternatively you can use the difference between the highest value in y2 and the second highest value as a measure of confidence of the prediction.
Amro
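A sketch of that confidence measure, assuming y2 is the 26-row output matrix (sort each column and take the gap between the top two activations):

```matlab
% Confidence margin per column: top activation minus runner-up.
% A large margin means the network is fairly sure of its prediction.
sorted = sort(y2, 1, 'descend');
margin = sorted(1,:) - sorted(2,:);
```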
