views:

255

answers:

3

Hi,

I read about neural networks a little while ago and I understand how an ANN (especially a multilayer perceptron that learns via backpropagation) can learn to classify an event as true or false.

I think there are two ways:

1) You have one output neuron. If its value is > 0.5 the event is likely true, if its value is <= 0.5 the event is likely false.

2) You have two output neurons; if the value of the first is greater than the value of the second the event is likely true, and vice versa.

In these cases, the ANN tells you if an event is likely true or likely false. It does not tell you how likely it is.
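
For concreteness, here is roughly what I mean by those two rules (a small Python sketch, just for illustration; `output`, `out_true` and `out_false` would be whatever my network computes for one input):

    # Rule 1: one output neuron, thresholded at 0.5
    def classify_single(output):            # output is a single real value
        return output > 0.5                 # True / False, but no confidence

    # Rule 2: two output neurons, pick the larger one
    def classify_pair(out_true, out_false):
        return out_true > out_false         # again only a yes/no answer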

Is there a way to convert this value to odds, or to directly get odds out of the ANN? I'd like to get an output like "The event has an 84% probability of being true".

+1  A: 

Once a NN has been trained, e.g. using backpropagation as mentioned in the question (whereby the backpropagation logic has "nudged" the weights in ways that minimize the error function), the weights associated with all individual inputs ("outside" inputs or intra-NN inputs) are fixed. The NN can then be used for classification purposes.

While the math (and the "options") during the learning phase can get a bit thick, it is relatively simple and straightforward when operating as a classifier. The main algorithm is to compute an activation value for each neuron, as the sum of input x weight for that neuron. This value is then fed to an activation function whose purpose is to normalize it and convert it to a boolean (in typical cases, as some networks do not have an all-or-nothing rule for some of their layers). The activation function can be more complex than you indicated; in particular it needn't be linear, but whatever its shape, typically sigmoid, it operates in the same fashion: figuring out where the activation fits on the curve and, if applicable, whether it is above or below a threshold. The basic algorithm then processes all neurons in a given layer before proceeding to the next.
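
A minimal sketch of that classification pass, in plain Python (assuming a logistic sigmoid and one weight vector per neuron; not tied to any particular library):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def forward_layer(inputs, weights):
        """One layer of the classification pass: weighted sum per neuron,
        then the activation function."""
        outputs = []
        for neuron_weights in weights:              # one weight vector per neuron
            activation = sum(i * w for i, w in zip(inputs, neuron_weights))
            outputs.append(sigmoid(activation))     # real-valued, in (0, 1)
        return outputs

    def forward_layer_boolean(inputs, weights, threshold=0.5):
        # the all-or-nothing variant: compare each value against a threshold
        return [v > threshold for v in forward_layer(inputs, weights)]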

With this in mind, the question of using the perceptron's ability to qualify its guess (or indeed guesses, plural) with a percentage value finds an easy answer: you bet it can. Its output(s) is real-valued (if anything, in need of normalizing) before we convert it to a discrete value (a boolean, or a category ID in the case of several categories), using the activation functions and the threshold/comparison methods described in the question.

So... how and where do I get "my percentages"? It all depends on the NN implementation, and more importantly, the implementation dictates the type of normalization functions that can be used to bring activation values into the 0-1 range in such a fashion that all the percentages "add up" to 1. In its simplest form, the activation function can be used to normalize the value, and the weights of the inputs to the output layer can be used as factors to ensure the "add up to 1" property (provided that these weights are indeed so normalized themselves).

Et voilà!

Clarification (following Mathieu's note):
One doesn't need to change anything in the way the Neural Network itself works; the only thing needed is to somehow "hook into" the logic of the output neurons to access the [real-valued] activation value they computed, or, possibly better, to access the real-valued output of the activation function, prior to its boolean conversion (which is typically based on a threshold value or on some stochastic function).

In other words, the NN works as previously: neither its training nor its recognition logic is altered, the inputs to the NN stay the same, as do the connections between the various layers, etc. We only get a copy of the real-valued activation of the neurons in the output layer, and we use this to compute a percentage. The actual formula for the percentage calculation depends on the nature of the activation value and its associated function (its scale, its range relative to other neurons' outputs, etc.).
Here are a few simple cases (taken from the question's suggested output rules):

1) If there is a single output neuron: the ratio of the value provided by the activation function relative to the range of that function should do.

2) If there are two (or more) output neurons, as with classifiers for example: if all output neurons have the same activation function, the percentage for a given neuron is its activation function value divided by the sum of all activation function values. If the activation functions vary, it becomes a case-by-case situation, because the distinct activation functions may indicate a purposeful desire to give more weight to some of the neurons, and the percentage should respect this.
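
As a rough sketch of those two cases in Python (assuming an activation function whose range is [0, 1]; the function and parameter names are just for illustration):

    def single_output_percentage(value, out_min=0.0, out_max=1.0):
        # Case 1: one output neuron -- where the value falls within the
        # activation function's range, expressed as a percentage.
        return 100.0 * (value - out_min) / (out_max - out_min)

    def multi_output_percentages(values):
        # Case 2: several output neurons sharing the same activation
        # function -- each neuron's share of the summed outputs.
        total = sum(values)
        return [100.0 * v / total for v in values]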

mjv
Hi, thanks for this answer. I'm not sure I understand correctly. What you are saying is that you just treat the output of the ANN as a probability and simply make sure the probabilities add up to one? How will the ANN learn that what it should output is probabilities? When I train it, I train it with real results, so every output value in the training set is 0 or 1 (or 0.05 or 0.95). However, when I use the ANN for real work, I'd like an output value of 0.7 to mean that the event is true (1) with a probability of 70%. Are you telling me I don't need to do anything special to get this kind of output?
Mathieu Pagé
@Mathieu See "clarification". I hope it effectively clarifies things; I sometimes make matters more confusing in my attempt to do the contrary ;-) I think, in a nutshell, it is only complicated because one needs to know the exact numerical semantics of the real-valued values associated with the output of a neuron in order to normalize these values properly. (I say values, plural, because it can either be the activation value or the [real-valued] output of the activation function.)
mjv
Hi mjv, your clarifications effectively answer my questions. Thanks for this answer, I'll mark it as accepted.
Mathieu Pagé
@Mathieu. Merci! And glad this was a clarification. Unrelated to the question but maybe of interest to you: I'd like to plug the HTM concept developed by Numenta (www.numenta.com), as it provides a very interesting framework for classifiers.
mjv
A: 

I remember seeing an example of a neural network trained with backpropagation to approximate the probability of an outcome in the book Introduction to the Theory of Neural Computation (Hertz, Krogh, Palmer). I think the key to the example was a special learning rule, so that you didn't have to convert the output of a unit to a probability; instead you automatically got the probability as output.
If you have the opportunity, try to check that book.

(By the way, "Boltzmann machines", although less famous, are neural networks designed specifically to learn probability distributions; you may want to check them out as well.)

mic.sca
+1  A: 

What you can do is use a sigmoid transfer function on the output layer nodes (one that accepts inputs in (-inf, inf) and outputs a value in [-1, 1], such as tanh).
Then, by using a 1-of-n output encoding (one node for each class), you can map the range [-1, 1] to [0, 1] and use the result as the probability for each class value (note that this works naturally for more than just two classes).
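
A small Python sketch of that mapping (the function name is just for illustration, and the final rescaling so the per-class values sum to 1 is an optional extra step along the lines of the other answer):

    def outputs_to_probabilities(outputs):
        # Map each output node's value from [-1, 1] to [0, 1]...
        scaled = [(o + 1.0) / 2.0 for o in outputs]
        # ...then (optionally) rescale so the class values sum to 1.
        total = sum(scaled)
        return [s / total for s in scaled]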

Amro