I'm using this library

http://pastebin.com/raw.php?i=aMtVv4RZ

to implement a learning agent.

I have generated the training cases, but I don't know exactly what the validation and test sets are. The teacher says:

70% should be training cases, 10% will be test cases, and the remaining 20% should be validation cases.
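For reference, one simple way to produce such a split is to shuffle the cases and cut them at 70% and 80%. This helper is purely illustrative (it is not part of the linked library):

```python
import random

def split_cases(cases, seed=0):
    """Shuffle cases, then cut them into 70% train / 10% test / 20% validation."""
    cases = list(cases)
    random.Random(seed).shuffle(cases)  # fixed seed for reproducibility
    n = len(cases)
    n_train = int(n * 0.7)
    n_test = int(n * 0.1)
    train = cases[:n_train]
    test = cases[n_train:n_train + n_test]
    validation = cases[n_train + n_test:]
    return train, test, validation

train, test, validation = split_cases(range(100))
print(len(train), len(test), len(validation))  # 70 10 20
```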

Thanks.

edit

I have this code for training, but I have no idea when to stop training:

  def train(self, train, validation, N=0.3, M=0.1):
    # N: learning rate
    # M: momentum factor
    accuracy = list()
    while True:
        error = 0.0
        for p in train:
            input, target = p
            self.update(input)
            error = error + self.backPropagate(target, N, M)
        print "validation"
        total = 0
        for p in validation:
            input, target = p
            output = self.update(input)
            # sum of absolute differences between target and output
            total += sum([abs(t - o) for t, o in zip(target, output)])

        accuracy.append(total)
        print min(accuracy)
        print sum(accuracy[-5:]) / 5
        #if i % 100 == 0:
        print 'error %-14f' % error
        if ? < ?:  # <- this is what I don't know: the stopping condition
            break
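Since the open question is the stopping condition, here is one common rule (my own hypothetical sketch, not from the linked library): stop when the validation error has not improved on its previous best by some margin for a few consecutive epochs. In Python 3 syntax:

```python
def should_stop(val_errors, patience=5, min_delta=1e-3):
    """Return True when the last `patience` validation errors have not
    improved on the best earlier error by at least `min_delta`."""
    if len(val_errors) <= patience:
        return False  # not enough history yet
    best_before = min(val_errors[:-patience])   # best error before the window
    recent_best = min(val_errors[-patience:])   # best error inside the window
    return recent_best > best_before - min_delta
```

In the loop above, `accuracy.append(total)` already collects the per-epoch validation error, so the `if ? < ?:` line could become something like `if should_stop(accuracy): break`.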

edit

I can get an average error of 0.2 on the validation data after maybe 20 training iterations. Does that correspond to 80% accuracy?

average error = sum of absolute differences between the validation targets and the outputs produced for the validation inputs, divided by the size of the validation set
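The formula above, written out as a standalone function (illustrative only; `targets` and `outputs` are parallel lists of per-case vectors):

```python
def average_error(targets, outputs):
    """Sum of absolute target/output differences, averaged over cases."""
    total = sum(
        sum(abs(t - o) for t, o in zip(target, output))
        for target, output in zip(targets, outputs)
    )
    return total / len(targets)

targets = [[1, 0, 0], [0, 1, 0]]
outputs = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]]
print(average_error(targets, outputs))  # ~0.4
```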

1
        avg error 0.520395
        validation 0.246937882684
2
        avg error 0.272367
        validation 0.228832420879
3
        avg error 0.249578
        validation 0.216253590304
        ...
22
        avg error 0.227753
        validation 0.200239244714
23
        avg error 0.227905
        validation 0.199875013416
A: 

I believe that in training mode you allow the nodes of your network to alter their input or output weights, and you provide positive or negative feedback to drive those changes. In other words, you present an input set, compare the network's output against the known answer (effectively XOR the output with the known true/false and then NOT the result), and give positive feedback when the answers match and negative feedback when they disagree.

I'm not sure what the difference between test and validation cases is, other than that perhaps you know the answers to the validation cases and use them to validate the output, whereas you don't know the answers to the test cases and just accept the answers from your validated neural net...

Zak
+8  A: 

The training and validation sets are used during training.

for each epoch
    for each training data instance
        propagate error through the network
        adjust the weights
    calculate the accuracy over the training data
    for each validation data instance
        calculate the accuracy over the validation data
    if the threshold validation accuracy is met
        exit training
    else
        continue training
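A minimal Python sketch of the loop above, assuming a `net` object with the same `update`/`backPropagate` interface as the asker's code and a hypothetical `accuracy` helper that treats the largest output as the predicted class:

```python
def accuracy(net, data):
    """Fraction of cases where the largest output matches the target class."""
    correct = 0
    for inputs, target in data:
        output = net.update(inputs)
        if output.index(max(output)) == target.index(max(target)):
            correct += 1
    return correct / len(data)

def train_until_valid(net, train_set, validation_set,
                      N=0.3, M=0.1, threshold=0.9, max_epochs=1000):
    for epoch in range(max_epochs):
        for inputs, target in train_set:
            net.update(inputs)                # propagate
            net.backPropagate(target, N, M)   # adjust the weights
        if accuracy(net, validation_set) >= threshold:
            break                             # validation threshold met
    return net
```

`max_epochs` is a safety cap so training terminates even if the threshold is never reached.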

Once you're finished training, then you run against your testing set and verify that the accuracy is sufficient.

Training Set: this data set is used to adjust the weights on the neural network.

Validation Set: this data set is used to minimize overfitting. You're not adjusting the weights of the network with this data set; you're just verifying that any increase in accuracy over the training data set actually yields an increase in accuracy over a data set the network has not been trained on (i.e. the validation data set). If the accuracy over the training data set increases, but the accuracy over the validation data set stays the same or decreases, then you're overfitting your neural network and you should stop training.

Testing Set: this data set is used only for testing the final solution in order to confirm the actual predictive power of the network.

Lirik
can you give a look at my code?
Daniel
@Daniel, what language is that? I'm not familiar with that syntax...
Lirik
it's Python :x I just can't get a stopping criterion.. the values converge, but always with some fluctuation..
Daniel
@Daniel, does the training accuracy fluctuate, or the validation accuracy? It's possible that your validation accuracy fluctuates, but it's less likely that the training accuracy would. When you say "input, target = p", does that mean you're setting both to p?
Lirik
I'm not very good with python, so the code looks a little confusing to me... in general you want to stop training when your validation accuracy meets a certain threshold, say 70% or 90%, whatever makes sense for the domain of your data.
Lirik
p is a list, like [[1, 0, 1, 0, 1], [1, 0, 0]], so input, target = p is equivalent to input = p[0]; target = p[1], i.e. input = [1, 0, 1, 0, 1] and target = [1, 0, 0]
Daniel
posted some data
Daniel