I am looking for a method to calculate the number of layers and the number of neurons per layer. As input I only have the size of the input vector, the size of the output vector, and the size of the training set.

Usually the best net is determined by trying different net topologies and selecting the one with the least error. Unfortunately, I cannot do that.

+8  A: 

This is a really hard problem.

The more internal structure a network has, the better that network will be at representing complex solutions. On the other hand, too much internal structure is slower, may cause training to diverge, or lead to overfitting -- which would prevent your network from generalizing well to new data.

People have traditionally approached this problem in several different ways:

  1. Try different configurations and see what works best. You can divide your training set into two pieces -- one for training, one for evaluation -- and then train and evaluate different approaches (there is a short sketch of this after the list). Unfortunately it sounds like in your case this experimental approach isn't available.

  2. Use a rule of thumb. A lot of people have come up with a lot of guesses as to what works best. Concerning the number of neurons in the hidden layer, people have speculated that (for example) it should (a) be between the input and output layer size, (b) set to something near (inputs+outputs) * 2/3, or (c) never larger than twice the size of the input layer.

    The problem with rules of thumb is that they don't always take into account vital pieces of information, like how "difficult" the problem is, what the size of the training and testing sets are, etc. Consequently, these rules are often used as rough starting points for the "let's-try-a-bunch-of-things-and-see-what-works-best" approach.

  3. Use an algorithm that dynamically adjusts the network configuration. Algorithms like Cascade Correlation start with a minimal network, then add hidden nodes during training. This can make your experimental setup a bit simpler, and (in theory) can result in better performance (because you won't accidentally use an inappropriate number of hidden nodes).
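Here is a minimal sketch of the first approach, assuming scikit-learn's MLPClassifier as a stand-in for whatever network implementation you actually have; the data and the candidate hidden-layer sizes are made up purely for illustration:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    rng = np.random.RandomState(0)
    X = rng.rand(200, 10)            # 200 instances, 10-dimensional input vector
    y = rng.randint(0, 3, size=200)  # 3 class labels

    # Divide the training set into two pieces: one for training, one for evaluation.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    best_size, best_score = None, -1.0
    for hidden in (4, 7, 10, 20):    # candidate hidden-layer sizes to try
        net = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=2000, random_state=0)
        net.fit(X_train, y_train)
        score = net.score(X_val, y_val)   # accuracy on the held-out piece
        if score > best_score:
            best_size, best_score = hidden, score

    print("best hidden-layer size:", best_size, "validation accuracy:", best_score)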

There's a lot of research on this subject -- so if you're really interested, there is a lot to read. Check out the citations on this summary.

Nate Kohl
+1 very good answer. Whenever you need to adjust a parameter of the model, you can use a cross-validation technique as in the first approach. This becomes harder when you have multiple parameters to optimize; an example of that is libSVM, which does a grid search over the space of its two parameters using cross-validation. Other approaches have used genetic algorithms to learn the structure of the network as well as the usual weights.
Amro
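A minimal sketch of the grid-search-with-cross-validation idea from the comment above, again assuming scikit-learn; the parameter grid here is illustrative only:

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPClassifier

    rng = np.random.RandomState(0)
    X = rng.rand(200, 10)
    y = rng.randint(0, 3, size=200)

    param_grid = {
        "hidden_layer_sizes": [(4,), (7,), (10,)],
        "alpha": [1e-4, 1e-2],   # L2 penalty: a second parameter tuned jointly
    }
    search = GridSearchCV(MLPClassifier(max_iter=2000, random_state=0), param_grid, cv=5)
    search.fit(X, y)             # 5-fold cross-validation over the whole grid
    print(search.best_params_)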
+2  A: 

Given the facts you have supplied, you can reduce your question to one focused solely on the number and size of the hidden layers. You already have the information needed to determine the size and number of your input and output layers.

In addition, there is a general class of techniques ('pruning' algorithms) that can be applied during network training to substantially optimize network configuration by removing redundant nodes (more on those below).

So, from your question, the parameters you need to determine are:

  • number of layers

  • number of neurons per layer

Your NN must have exactly one input layer and one output layer--no more, no less. What is the size (number of neurons) of each of these layers in your problem? Well, you always know the size of the input layer--it's the size of your input vector (or one larger if you add a bias node).

Likewise, the size of the output layer, while not strictly determined by your data, is determined completely by you--it depends solely on what output you want from the NN. For instance, if you want it to assign one of three class labels to each instance (data point) of your data, then that's the size of your output layer--three neurons.
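As a small sketch of the above, assuming the usual setup where each training instance is a row of a 2-D array and the task is to assign one of several class labels:

    import numpy as np

    X = np.random.rand(500, 12)              # training set: 500 instances, 12 features
    y = np.random.randint(0, 3, size=500)    # each instance gets one of 3 class labels

    n_input = X.shape[1]          # size of the input vector (add 1 if you use a bias node)
    n_output = len(np.unique(y))  # one output neuron per class label
    print(n_input, n_output)      # -> 12 3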

So that leaves the hidden layers. How many hidden layers? Well if your data is linearly separable (which you often know by the time you begin coding a NN) then you don't need any hidden layers at all.
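One rough way to check linear separability, assuming scikit-learn: a single-layer perceptron is a network with no hidden layer, so if it fits the training data perfectly, no hidden layer is needed to separate the classes. A toy sketch:

    import numpy as np
    from sklearn.linear_model import Perceptron

    rng = np.random.RandomState(0)
    X = rng.rand(200, 2)
    X = X[np.abs(X[:, 0] + X[:, 1] - 1.0) > 0.05]  # keep a clear margin around the boundary
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)      # a linearly separable toy labeling

    clf = Perceptron(max_iter=1000, tol=None).fit(X, y)
    print("training accuracy:", clf.score(X, y))   # 1.0 (or very nearly) -> no hidden layer needed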

Beyond that, as you probably know, there's a mountain of commentary on the question of hidden layer configuration in NNs (see the famous NN FAQ for an excellent summary of that commentary). One issue within this subject on which there is a consensus is the performance difference from adding additional hidden layers: the situations in which performance improves with a second (or third, etc.) hidden layer are very few. One hidden layer is sufficient for the large majority of problems.

So what about the size of the hidden layer(s)? There are some empirically derived rules of thumb; of these, the most commonly relied on is 'the size of the hidden layer is between the input and output layer sizes'. Jeff Heaton, author of "Introduction to Neural Networks in Java", offers a few more, which are recited on the page I just linked to.
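Written out as arithmetic, the rules of thumb mentioned in this thread look like this (the layer sizes are example values; these only give rough starting points, not an optimum):

    n_input, n_output = 12, 3   # example input/output layer sizes

    rule_between = (min(n_input, n_output), max(n_input, n_output))  # between input and output size
    rule_two_thirds = round((n_input + n_output) * 2 / 3)            # near (inputs + outputs) * 2/3
    rule_upper = 2 * n_input                                         # never larger than twice the input layer

    print(rule_between, rule_two_thirds, rule_upper)  # -> (3, 12) 10 24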

In your question, you mentioned that, for whatever reason, you cannot find the optimum network architecture by trial-and-error. Another way to tune your NN configuration (without using trial-and-error) is 'pruning'. The gist of this technique is removing nodes from the network during training by identifying those nodes which, if removed, would not noticeably affect network performance (i.e., resolution of the data). Even without using a formal pruning technique, you can get a rough idea of which nodes are not important by looking at your weight matrix after training: look for weights very close to zero--it's the nodes on either end of those weights that are often removed during pruning. Obviously, if you do use a pruning algorithm during training, then begin with a network configuration that is more likely to have excess (i.e., 'prunable') nodes--in other words, when deciding on a network architecture, err on the side of more neurons if you add a pruning step.
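A rough version of that informal weight-matrix check, with made-up weights and an arbitrary threshold chosen purely for illustration:

    import numpy as np

    # Hypothetical trained weights for a 12-7-3 network (input -> hidden -> output).
    rng = np.random.RandomState(0)
    W_in_hidden = rng.randn(12, 7)
    W_hidden_out = rng.randn(7, 3)
    W_in_hidden[:, 2] = 1e-4    # pretend hidden unit 2 learned nothing useful
    W_hidden_out[2, :] = 1e-4

    threshold = 1e-2            # arbitrary cutoff, for illustration only
    prunable = [
        j for j in range(W_in_hidden.shape[1])
        if np.all(np.abs(W_in_hidden[:, j]) < threshold)
        and np.all(np.abs(W_hidden_out[j, :]) < threshold)
    ]
    print("candidate hidden units to prune:", prunable)  # -> [2]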

Put another way, by applying a pruning algorithm to your network during training, you can get much closer to an optimized network configuration than any a priori theory is ever likely to give you.

doug