views:

191

answers:

2

Hi all,

I'm trying to understand how to train a multilayer perceptron; however, I'm having some trouble figuring out how to determine a suitable network architecture, i.e., the number of nodes/neurons in each layer of the network.

For a specific task, I have four input sources that can each take one of three states. I guess that would mean four input neurons firing either 0, 1, or 2, but as far as I've been told, input should be kept binary?

Furthermore, I'm having some issues choosing the number of neurons in the hidden layer. Any comments would be great.

Thanks.

+2  A: 

Determining an acceptable network structure for a multi-layer perceptron is actually straightforward.

  1. Input Layer: How many features/dimensions are in your data, i.e., how many columns in each data row? Add one to this (for the bias node), and that is the number of nodes for the first (input) layer.

  2. Output Layer: Is your MLP running in 'classification' mode or 'regression' mode ('regression' used here in the machine learning rather than the statistical sense)? In other words, does my MLP return a class label or a predicted value? If the latter, then your output layer has a single node. If the former, then your output layer has the same number of nodes as class labels. For instance, if the result you want is to label each instance as either "fraud" or "not fraud", that's two class labels, and therefore two nodes in your output layer.

  3. Hidden Layer(s): In between these two (input and output) are obviously the hidden layers. Always start with a single hidden layer. So how many nodes? It has to be a number equal to or less than the number of nodes in the input layer and equal to or greater than the number in the output layer; any number of nodes that satisfies these constraints is acceptable. If you need to add a second hidden layer, which you probably won't, then its number of nodes will be less than the first hidden layer's but more than the output layer's.

In sum, your initial model will always have three layers; the sizes of the first and last are fixed by your data and by your model design, respectively. These two in turn constrain the size of the hidden layer. Any number of nodes for the hidden layer within those constraints is acceptable (not necessarily ideal, but at least a plausible architecture).
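Those constraints can be expressed as a quick sanity check. A minimal sketch (the function name is hypothetical; plug in the layer sizes fixed by your own data and model design):

```python
def acceptable_hidden_sizes(n_input, n_output):
    """Hidden-layer sizes satisfying the rule of thumb above:
    no larger than the input layer, no smaller than the output layer."""
    return list(range(n_output, n_input + 1))

# For the asker's problem: 5 input nodes (4 features + bias), 3 output classes.
print(acceptable_hidden_sizes(5, 3))  # [3, 4, 5]
```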

So in your case, a suitable network structure to begin would be:

input layer: 5 nodes --> hidden layer: 4 nodes --> output layer: 3 nodes
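As a concrete sketch, a single forward pass through that 5-4-3 structure could look like the following (random untrained weights and sigmoid activations, purely to illustrate the shapes involved; your actual activation functions and weight initialization may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the suggested architecture: 5 -> 4 -> 3.
# The 5th input node is the bias, fixed at 1.
W1 = rng.standard_normal((4, 5))   # hidden weights: 4 hidden nodes x 5 inputs
W2 = rng.standard_normal((3, 4))   # output weights: 3 classes x 4 hidden nodes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([2.0, 0.0, 1.0, 1.0, 1.0])  # 4 raw features + the bias node
hidden = sigmoid(W1 @ x)                  # shape (4,)
output = sigmoid(W2 @ hidden)             # shape (3,), one score per class
print(output.shape)  # (3,)
```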

doug
Thank you, Doug. Your answer was very thorough and helpful. It's fascinating how books and professors can make problems appear so advanced, when the answer really is very (as you put it) straightforward.
Jonas Nielsen
A: 

I disagree with doug's answer above on a few points.

You have 4 discrete (3-way categorical) inputs. You should (unless you have a strong reason not to) represent them as 12 binary inputs, using a 1-of-3 encoding for each of your four conceptual inputs. So if your input is [2, 0, 1, 1], then your network should be given:

0 0 1 1 0 0 0 1 0 0 1 0

If your network implementation requires a manual bias, then you should add another always-on bit for the bias, but most sensible neural net implementations don't require that.
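The 1-of-3 encoding above is easy to write out directly (the function name here is just illustrative):

```python
def one_hot(values, n_states=3):
    """Encode each categorical value as n_states binary inputs,
    concatenated into one flat input vector for the network."""
    encoded = []
    for v in values:
        bits = [0] * n_states
        bits[v] = 1
        encoded.extend(bits)
    return encoded

print(one_hot([2, 0, 1, 1]))  # [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
```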

Try a few different numbers of hidden units. You don't need to restrict yourself to a hidden layer smaller than the input layer, but if you make it larger you should be careful to regularize your weights, perhaps with L2 or L1 weight decay, and maybe also do early stopping during training (stop training when your error on a held-out validation set stops improving).
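The early-stopping rule mentioned above boils down to tracking the held-out validation error each epoch and quitting once it stops improving. A minimal sketch, decoupled from any particular training loop (the function name and patience threshold are assumptions, not part of any library):

```python
def early_stopping(val_errors, patience=3):
    """Return the epoch with the best validation error, halting the
    scan once the error has failed to improve for `patience` epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # validation error has stopped improving
    return best_epoch

# Validation error improves, then plateaus: training should stop at epoch 3.
errors = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.60, 0.61]
print(early_stopping(errors))  # 3
```

In a real training loop you would compute each epoch's validation error as you go, keep a snapshot of the weights at the best epoch, and restore that snapshot when you stop.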