Generally speaking, what do you get out of extending an artificial neural net by adding more nodes to a hidden layer or by adding more hidden layers?

Does it allow for more precision in the mapping, or does it allow for more subtlety in the relationships it can identify, or something else?

+6  A: 

Hello,

This is a very interesting question, but it's not easy to answer. It depends on the problem you're trying to solve and on the type of neural network you're using. There are several types of neural networks.

In general it's not clear that more nodes equal more precision. Research shows that in most cases you need only one hidden layer. The number of nodes should be the minimal number required to solve the problem. If you don't have enough of them, you will not reach a solution.

On the other hand, once you have reached the number of nodes that is sufficient to solve the problem, you can add more and more of them and you will not see any further progress in the results.

That's why there are so many types of neural networks: they try to solve different types of problems. There are NNs for static problems, for time-related problems, and so on. The number of nodes is not as important as their design.
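One hedged way to see this for yourself (a sketch of mine, assuming Python with scikit-learn; the noisy sine-wave task and the widths tried are arbitrary choices): train the same network with increasing hidden-layer widths and watch where the fit stops improving.

    # Toy experiment: vary the hidden-layer width on a noisy sine wave
    # and observe where extra nodes stop reducing the training loss.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.RandomState(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X).ravel() + 0.1 * rng.randn(200)

    for width in (1, 2, 4, 8, 16, 32):
        net = MLPRegressor(hidden_layer_sizes=(width,), solver='lbfgs',
                           max_iter=2000, random_state=0).fit(X, y)
        print(f"{width:3d} hidden nodes -> training loss {net.loss_:.4f}")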

tomaszs
+3  A: 

When you have a hidden layer, you are creating combined features of the input. So, is the problem better tackled by more features of the existing input, or through higher-order features that come from combining existing features? This is the trade-off for a standard feed-forward network.

You have the theoretical reassurance that any function can be represented by a neural network with two hidden layers and a non-linear activation.

Also, consider using additional resources for boosting, instead of adding more nodes, if you're not certain of the appropriate topology.
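As a rough illustration of how a hidden layer builds combined features (a sketch of my own, assuming Python with scikit-learn; XOR is the textbook case, not something specific to this answer): a model with no hidden layer cannot separate XOR, while a single hidden layer can.

    # Illustrative sketch: XOR only becomes separable once a hidden
    # layer combines the two inputs into higher-order features.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 1, 1, 0])                       # XOR of the inputs

    linear = LogisticRegression().fit(X, y)          # no hidden layer
    mlp = MLPClassifier(hidden_layer_sizes=(4,), solver='lbfgs',
                        max_iter=5000, random_state=1).fit(X, y)

    print("no hidden layer:", linear.score(X, y))    # stuck near chance
    print("one hidden layer:", mlp.score(X, y))      # can reach 1.0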

John the Statistician
+3  A: 

Very rough rules of thumb:

  • Generally, use more elements per layer for bigger input vectors.
  • More layers may let you model more non-linear systems.
  • If the kind of network you are using has delays in propagation, more layers may allow modelling of time series. Take care to have time jitter in the delays or it won't work very well. If this is just gobbledegook to you, ignore it.
  • More layers let you insert recurrent features, which can be very useful for discrimination tasks (see the sketch after this list). Your ANN implementation may not permit this.
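A rough NumPy sketch of the recurrent idea mentioned above (my own illustration, with untrained random weights, only meant to show the wiring): the hidden state feeds back into itself, so information can persist across a time series.

    # Elman-style recurrent step: the hidden state is an extra input to
    # itself, letting the network carry context from earlier time steps.
    import numpy as np

    rng = np.random.RandomState(0)
    W_in = rng.randn(8, 1) * 0.1    # input -> hidden weights
    W_rec = rng.randn(8, 8) * 0.1   # hidden -> hidden (recurrent) weights

    h = np.zeros(8)                 # hidden state, carried across steps
    for x in np.sin(np.linspace(0, 6, 30)):        # toy time series
        h = np.tanh(W_in @ np.array([x]) + W_rec @ h)
    print("final hidden state:", np.round(h, 3))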

HTH

Tim Williscroft
+1  A: 

The number of units per hidden layer accounts for the ANN's potential to describe an arbitrarily complex function. Some (complicated) functions may require many hidden nodes, or possibly more than one hidden layer.

When a function can be roughly approximated by a certain number of hidden units, any extra nodes will provide more accuracy, but only if enough training samples are used to justify the addition; otherwise the result is "overconvergence" (better known as overfitting). Overconvergence means that your ANN has lost its generalization abilities because it has overemphasized the particular training samples.

In general it is best to use the fewest hidden units that still give good results. The additional training patterns required to justify more hidden nodes cannot be found easily in most cases, and accuracy is not the NNs' strong point.
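A hedged sketch of this effect (assuming Python with scikit-learn; the sample count and the widths are arbitrary choices of mine): with deliberately few training samples, a wide network can fit the noise and score well on the training data while doing worse on held-out data.

    # Few samples + many hidden units: compare train vs. held-out fit.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.RandomState(0)
    X_train = rng.uniform(-3, 3, size=(15, 1))     # deliberately few
    y_train = np.sin(X_train).ravel() + 0.2 * rng.randn(15)
    X_test = np.linspace(-3, 3, 200).reshape(-1, 1)
    y_test = np.sin(X_test).ravel()

    for width in (2, 200):
        net = MLPRegressor(hidden_layer_sizes=(width,), solver='lbfgs',
                           max_iter=20000, random_state=0).fit(X_train, y_train)
        print(f"{width:3d} units: train R^2 {net.score(X_train, y_train):.2f}, "
              f"test R^2 {net.score(X_test, y_test):.2f}")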

Wartin
+13  A: 

There's a very well-known result in machine learning which states that a single hidden layer is enough to approximate any smooth, bounded function (the paper was called "Multilayer feedforward networks are universal approximators" and it's now almost 20 years old). There are several things to note, however.

  • The single hidden layer may need to be arbitrarily wide.
  • This says nothing about the ease with which an approximation may be found; in general, large networks are hard to train properly and fall victim to overfitting quite frequently (an exception is so-called "convolutional neural networks", which are really only meant for vision problems).
  • This also says nothing about the efficiency of the representation. Some functions require exponential numbers of hidden units if done with one layer but scale much more nicely with more layers (for more discussion of this, read Scaling Learning Algorithms Towards AI).

The problem with deep neural networks is that they're even harder to train. You end up with very small gradients being backpropagated to the earlier hidden layers, so learning does not really go anywhere, especially if the weights are initialized to be small (if you initialize them with larger magnitudes, you frequently get stuck in bad local minima). There are some techniques for "pre-training", like the ones discussed in this Google tech talk by Geoff Hinton, which attempt to get around this.
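The vanishing-gradient effect is easy to see numerically. Here is a small plain-NumPy sketch (my own illustration, not from the talk): backpropagate an error signal through a stack of tanh layers with small random weights and watch the gradient norm shrink toward the earlier layers.

    # Backprop through 10 tanh layers with small weights: the gradient
    # norm shrinks roughly geometrically toward the earlier layers.
    import numpy as np

    rng = np.random.RandomState(0)
    depth, width = 10, 32
    Ws = [rng.randn(width, width) * 0.1 for _ in range(depth)]

    h = rng.randn(width)            # forward pass, keeping activations
    acts = [h]
    for W in Ws:
        h = np.tanh(W @ h)
        acts.append(h)

    grad = np.ones(width)           # arbitrary error signal at the output
    for i in reversed(range(depth)):
        grad = Ws[i].T @ (grad * (1 - acts[i + 1] ** 2))  # tanh derivative
        print(f"layer {i:2d}: gradient norm {np.linalg.norm(grad):.2e}")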

dwf
+1  A: 

You can test this yourself by adding more nodes to a hidden layer or adding more hidden layers.

Try Sharky Neural Network, which gives a live view of neural network classification results.

SharkTime