I'm starting to learn about neural networks, currently following mostly D. Kriesel's tutorial. Right from the beginning it introduces at least three (different?) learning rules (Hebbian learning, the delta rule, backpropagation) for supervised learning.
I might be missing something, but if the goal is merely to minimize the error, why not just apply gradient descent over Error(W), where W is the entire set of weights?
Edit: I must admit the answers still confuse me. It would be helpful if someone could point out the actual differences between those methods, and between them and straight gradient descent.
To stress the point: these learning rules seem to take the layered structure of the network into account, whereas finding the minimum of Error(W) over the entire set of weights completely ignores it. How do the two fit together?
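To make concrete what I mean by "straight gradient descent over Error(W)", here is a rough sketch (a toy network and a numerical gradient; all names and sizes are my own, just for illustration): treat all weights as one flat vector and follow the gradient of the total error, without referring to layers at all.

```python
import numpy as np

def predict(W, x):
    # Hypothetical tiny 2-3-1 network; W is just a flat vector of 13 numbers.
    W1 = W[:6].reshape(3, 2)   # first-layer weights
    b1 = W[6:9]                # first-layer biases
    W2 = W[9:12]               # second-layer weights
    b2 = W[12]                 # second-layer bias
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def error(W, X, y):
    # Sum of squared errors over the whole training set.
    return sum((predict(W, x) - t) ** 2 for x, t in zip(X, y))

def numerical_gradient(W, X, y, eps=1e-5):
    # Finite-difference gradient of Error(W), one weight at a time,
    # ignoring the layered structure entirely.
    grad = np.zeros_like(W)
    for i in range(len(W)):
        Wp, Wm = W.copy(), W.copy()
        Wp[i] += eps
        Wm[i] -= eps
        grad[i] = (error(Wp, X, y) - error(Wm, X, y)) / (2 * eps)
    return grad

# Plain gradient descent on the flat weight vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X[:, 0] * X[:, 1]          # some arbitrary target function
W = rng.normal(scale=0.5, size=13)
for step in range(1000):
    W -= 0.01 * numerical_gradient(W, X, y)
```

That is the kind of procedure I have in mind when I ask why the layered learning rules are needed at all.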