views: 1040
answers: 7

I was having a look at this awesome tutorial on the single layer perceptron. I tried the implementation out and it works like a charm, but I was wondering if there's any practical use for it as is (at such a low degree of complexity).

Any example?
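For concreteness, the kind of single-layer perceptron I mean boils down to roughly this (a minimal numpy sketch of my own, not the tutorial's actual code):

    import numpy as np

    def train_perceptron(X, y, epochs=100, lr=1.0):
        # X: (n, d) array of inputs; y: labels in {-1, +1}
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if yi * (np.dot(w, xi) + b) <= 0:  # misclassified: nudge the boundary
                    w += lr * yi * xi
                    b += lr * yi
        return w, b

    # Learns any linearly separable function, e.g. logical AND:
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([-1, -1, -1, 1])
    w, b = train_perceptron(X, y)
    print(np.sign(X @ w + b))  # [-1. -1. -1.  1.]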

A: 

I think they are used in some sort of spam filters, but darn if I remember the details at this hour...

Treb
You're probably thinking of Bayesian networks.
Steven A. Lowe
No, I'm not. I remember reading something about a combination of several filter methods, one of which was Bayesian. And for another method they used a single perceptron. I just can't remember the source anymore.
Treb
it would be great to find some more info on this
JohnIdol
+1 to encourage you to find the info ;-)
Steven A. Lowe
Thanks Steven, I have already tried and I just cannot find it. Will devote a few hours of the weekend to it (because of the +1 moral boost ;-)
Treb
One such article is here: http://www.paulgraham.com/spam.html. These early techniques used the so-called Naive Bayes algorithm, which is roughly equivalent in power to perceptrons and has almost nothing to do with Bayesian networks, beyond both making reference to Bayes's law.
John the Statistician
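To see the connection John is pointing at: a Naive Bayes spam filter reduces to a thresholded weighted sum of word indicators, which is exactly the functional form a perceptron computes. A toy sketch of my own (not Graham's actual formula or token handling):

    import math
    from collections import Counter

    def train_nb(spam_docs, ham_docs):
        # Per-word log-odds weights plus a class-prior bias; the resulting
        # classifier has the same linear form as a perceptron's decision rule.
        spam = Counter(w for d in spam_docs for w in d.split())
        ham = Counter(w for d in ham_docs for w in d.split())
        ns, nh = sum(spam.values()), sum(ham.values())
        vocab = set(spam) | set(ham)
        weights = {w: math.log((spam[w] + 1) / (ns + len(vocab)))
                    - math.log((ham[w] + 1) / (nh + len(vocab)))
                   for w in vocab}  # Laplace-smoothed log-likelihood ratios
        bias = math.log(len(spam_docs) / len(ham_docs))
        return weights, bias

    def is_spam(doc, weights, bias):
        score = bias + sum(weights.get(w, 0.0) for w in doc.split())
        return score > 0  # threshold at even odds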
A: 

The more categorical the feature set, the better.

Robert Elwell
+1  A: 

You might want to wait for the multi-layer perceptron tutorial; single-layer perceptrons are incredibly limited - see Perceptrons by Minsky and Papert for an authoritative (and highly mathematical) study of what they can and cannot do.

Steven A. Lowe
I know they're limited - just wondering if there's any immediate practical application
JohnIdol
+6  A: 

You can actually do an incredible amount with just a perceptron. For example, many of the theoretical weaknesses of perceptrons can be overcome by moving to a richer feature representation of the data. The most standard way to do this is through kernels. Once you do this, you can then solve many different learning problems through reductions that transform these other problems into binary classification.
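To make the kernel trick concrete (my own sketch, not from any of the sources below): the perceptron touches the data only through dot products, so replacing those with a kernel buys you nonlinear decision boundaries, e.g. XOR, which famously defeats the plain perceptron:

    import numpy as np

    def rbf(a, b, gamma=1.0):
        return np.exp(-gamma * np.sum((a - b) ** 2))

    def train_kernel_perceptron(X, y, kernel=rbf, epochs=50):
        # alpha[i] counts mistakes made on example i; the decision function
        # is f(x) = sum_i alpha[i] * y[i] * K(X[i], x)
        n = len(X)
        alpha = np.zeros(n)
        for _ in range(epochs):
            for j in range(n):
                f = sum(alpha[i] * y[i] * kernel(X[i], X[j]) for i in range(n))
                if y[j] * f <= 0:  # mistake: weight this example more strongly
                    alpha[j] += 1
        return alpha

    # XOR, impossible for a plain perceptron, is easy with an RBF kernel:
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([-1, 1, 1, -1])
    alpha = train_kernel_perceptron(X, y)
    f = lambda x: sum(alpha[i] * y[i] * rbf(X[i], x) for i in range(len(X)))
    print([int(np.sign(f(x))) for x in X])  # [-1, 1, 1, -1]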

One major algorithm is SNoW (Sparse Network of Winnows), which Dan Roth uses heavily in natural language processing.

Lecture notes on the use of kernels in the perceptron algorithm can be found here: http://l2r.cs.uiuc.edu/~danr/Teaching/CS446-08/Lectures/04-LecOnline-P3.pdf

The rest of the notes on perceptron and winnow are handy as well: http://l2r.cs.uiuc.edu/~danr/Teaching/CS446-08/lectures.html

A further discussion of kernel perceptrons can be found in this paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.8200

John Langford (not me, by the way) has done a lot of work in the reductions I mentioned: http://hunch.net/~jl/projects/reductions/reductions.html

John the Statistician
sounds great - can you provide examples or link to some good resources in this direction?
JohnIdol
it did help - thanks. let's move those references to the answer!
JohnIdol
+1  A: 

The paper Neural methods for dynamic branch prediction describes how perceptrons can be used in hardware to predict whether an instruction branch will be taken or not, with an accuracy of over 95%.
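The scheme is simple enough to caricature in software (my sketch; the table size, history length, and threshold below are arbitrary, and real hardware additionally saturates the weights):

    class PerceptronPredictor:
        # Toy version of the perceptron branch predictor: one integer weight
        # vector per table entry, inputs are the recent global branch history.
        def __init__(self, table_size=1024, history_len=16, theta=32):
            self.table = [[0] * (history_len + 1) for _ in range(table_size)]
            self.history = [1] * history_len  # past outcomes, encoded +1/-1
            self.theta = theta                # only train while |y| is small

        def predict_and_update(self, pc, taken):
            w = self.table[pc % len(self.table)]  # index by branch address
            y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))
            prediction = y >= 0
            t = 1 if taken else -1
            if prediction != taken or abs(y) <= self.theta:
                w[0] += t                         # bias weight
                for i, hi in enumerate(self.history):
                    w[i + 1] += t * hi            # reinforce correlated history bits
            self.history = self.history[1:] + [t]
            return prediction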

namin
this is good stuff +1
JohnIdol
+1  A: 

A single layer perceptron is really just a fairly inefficient and inaccurate way of finding a least squares solution to a linear system. More efficient methods might use Singular Value Decomposition (SVD) to find the pseudoinverse, which amounts (I think) to doing the same thing the single layer perceptron does while learning. But that said, finding least squares solutions is a generally useful thing, so in that sense the single layer perceptron is doing something practical!
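A quick numpy check of that correspondence (my sketch; np.linalg.pinv computes the pseudoinverse via SVD, and the loop is the delta-rule style iteration a single linear unit performs):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

    # One shot: least squares via the SVD-based pseudoinverse.
    w_pinv = np.linalg.pinv(X) @ y

    # Many shots: gradient descent on squared error (delta rule),
    # which is what a single-layer linear unit does while learning.
    w = np.zeros(3)
    for _ in range(1000):
        w -= 0.1 * X.T @ (X @ w - y) / len(y)

    print(np.allclose(w, w_pinv))  # True: both arrive at least squares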

Tom Grove
So with a single layer perceptron I can do all the stuff in the article you link under 'typical uses and applications'?
JohnIdol
Almost correct. There are much easier ways to calculate the pseudoinverse than SVD, one of which is simply computing (X^T X)^-1 X^T with your favourite linear algebra package, or else a cleverer iterative method. Simple perceptron learning, in general, sucks, and diverges if the step size is too high.
dwf
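(dwf's normal-equations shortcut in numpy, for reference; it agrees with the SVD route but gets numerically dicey when the columns of X are nearly collinear:)

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 3))
    y = rng.normal(size=50)

    w_normal = np.linalg.solve(X.T @ X, X.T @ y)  # (X^T X)^-1 X^T y
    w_svd = np.linalg.pinv(X) @ y                 # SVD-based pseudoinverse
    print(np.allclose(w_normal, w_svd))           # True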
A: 

Perceptrons are simply a way of determining a thresholded linear combination of the inputs (a weighted sum passed through a threshold). In practice, nearly nobody uses the perceptron procedure for learning, because it's ridiculously inefficient and not guaranteed to ever find a solution if you choose your learning rate badly.

If you're predicting a real-valued quantity, then a perceptron is equivalent to linear regression. If you're interested in binary classification, logistic regression is the tool to use, as it provides you a posterior probability for a given classification under some (fairly reasonable) assumptions, e.g. that your predictive variables are Gaussian distributed with different means but the same covariance.
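To illustrate the difference (my sketch, under exactly the shared-covariance Gaussian setup described above): a perceptron would hand back only sign(w·x), while logistic regression fit to the same data returns a posterior probability:

    import numpy as np

    rng = np.random.default_rng(0)
    # Two Gaussian classes with the same covariance, different means:
    X = np.vstack([rng.normal(-1, 1, size=(100, 2)),
                   rng.normal(+1, 1, size=(100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    Xb = np.hstack([X, np.ones((200, 1))])  # append a bias column

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    # Fit by gradient descent on the log-loss:
    w = np.zeros(3)
    for _ in range(2000):
        w -= 0.1 * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)

    # P(class 1 | x) at the midpoint between the two class means:
    print(sigmoid(np.array([0.0, 0.0, 1.0]) @ w))  # close to 0.5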

dwf