What is the difference between supervised learning and unsupervised learning?

+7 A:

Supervised learning is when the data you feed your algorithm is "tagged" (labeled) with the correct answer, which helps the algorithm make decisions.

Example: Bayesian spam filtering, where you have to flag items as spam to refine the results.
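As a rough illustration of the spam example, here is a minimal naive Bayes sketch using only the standard library. The training messages, the "spam"/"ham" label names, and the helper names `train` and `classify` are all made up for illustration; a real filter would use far more data and better tokenization.

```python
import math
from collections import Counter

# Hypothetical hand-labeled messages: the labels are the "tags".
TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

def train(examples):
    """Count words per label and messages per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total)  # log prior
        n_words = sum(counts.values())
        for word in text.split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, label_counts = train(TRAINING)
print(classify("win a cash prize", word_counts, label_counts))             # spam
print(classify("agenda for the team meeting", word_counts, label_counts))  # ham
```

Every flagged message adds to the counts, which is the "refining" part: the labels you supply are the supervision.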

Unsupervised learning covers algorithms that try to find correlations or structure without any external input other than the raw data.

Example: clustering algorithms used in data mining.
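For comparison, a clustering algorithm sees no labels at all. Below is a minimal k-means sketch in plain Python, given as one possible clustering method among many. The sample points and the choice to seed the centroids with the first k points (for reproducibility) are assumptions made for illustration.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; returns (centroids, assignments)."""
    # Assumption: seed centroids with the first k points so the run is deterministic.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments

# Hypothetical unlabeled data: only raw coordinates, no categories.
POINTS = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (0.5, 1.5), (8.5, 9.0)]
centroids, assignments = kmeans(POINTS, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

The cluster numbers 0 and 1 carry no meaning of their own; the algorithm only reports which points belong together.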

Yann Schwartz 2009-12-02 10:55:39

Supervised if you already know what the category is.

Selinap 2009-12-02 13:57:30

@Selinap. Yep you're right.

Yann Schwartz 2009-12-02 14:07:03

+2 A:

For instance, training a neural network is very often supervised learning: you tell the network which class the feature vector you are feeding it corresponds to.
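A minimal sketch of that idea, where a single perceptron (about the simplest possible "network") stands in for a full neural network. The logical-AND data set, the learning rate, and the function names are assumptions chosen for illustration.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """samples: list of (feature_vector, label) pairs with label in {0, 1}."""
    n_features = len(samples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for features, label in samples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            prediction = 1 if activation > 0 else 0
            error = label - prediction  # the supervision signal: known class vs. guess
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, features):
    """Classify one feature vector with the trained weights."""
    return 1 if sum(w * x for w, x in zip(weights, features)) + bias > 0 else 0

# Each feature vector comes with the class it corresponds to (logical AND).
SAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(SAMPLES)
print([predict(weights, bias, x) for x, _ in SAMPLES])  # [0, 0, 0, 1]
```

Without the `label` in each pair there would be no `error` to compute, and nothing to drive the weight updates.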

Clustering is unsupervised learning: you let the algorithm decide how to group samples into classes that share common properties.

Another example of unsupervised learning is Kohonen's self-organizing maps.

Gregory Pakosz 2009-12-02 10:56:52

+1 A:

I have always found the distinction between unsupervised and supervised learning to be arbitrary and a little confusing. There is no real distinction between the two cases; instead, there is a range of situations in which an algorithm can have more or less 'supervision'. The existence of semi-supervised learning is an obvious example where the line is blurred.

I tend to think of supervision as giving feedback to the algorithm about what solutions should be preferred. For a traditional supervised setting, such as spam detection, you tell the algorithm "don't make any mistakes on the training set"; for a traditional unsupervised setting, such as clustering, you tell the algorithm "points that are close to each other should be in the same cluster". It just so happens that the first form of feedback is a lot more specific than the latter.
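One way to read those two kinds of feedback is as two scoring functions that an algorithm tries to minimise. The sketch below is only an illustration of that reading; the function names, the threshold rule, and the toy data are all hypothetical.

```python
def training_mistakes(classifier, labeled_examples):
    """Supervised feedback: how many labeled examples the classifier gets wrong."""
    return sum(1 for x, label in labeled_examples if classifier(x) != label)

def within_cluster_spread(points, assignments):
    """Unsupervised feedback: total squared distance of 1-D points to their cluster mean."""
    clusters = {}
    for p, a in zip(points, assignments):
        clusters.setdefault(a, []).append(p)
    total = 0.0
    for members in clusters.values():
        mean = sum(members) / len(members)
        total += sum((p - mean) ** 2 for p in members)
    return total

def threshold_rule(score):
    """Hypothetical classifier: anything scoring above 0.5 is spam."""
    return "spam" if score > 0.5 else "ham"

# Supervised: the feedback refers to the given labels.
LABELED = [(0.2, "ham"), (0.4, "ham"), (0.7, "spam"), (0.9, "spam")]
print(training_mistakes(threshold_rule, LABELED))  # 0

# Unsupervised: the feedback refers only to distances between points.
POINTS = [1.0, 2.0, 10.0, 11.0]
print(within_cluster_spread(POINTS, [0, 0, 1, 1]))  # 1.0  (close points together)
print(within_cluster_spread(POINTS, [0, 1, 0, 1]))  # 81.0 (close points split up)
```

The first score says exactly which answers are wrong; the second only expresses a general preference for tight groups, which is why it is the less specific form of feedback.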

In short, when someone says 'supervised', think classification; when they say 'unsupervised', think clustering, and try not to worry too much about it beyond that.

StompChicken 2009-12-03 11:08:01

The distinction is actually well defined and simple. See David Robles's answer.

bayer 2009-12-05 00:43:21

That definition is okay as far as it goes, but it's too narrow. What about semi-supervised learning? It's both supervised and unsupervised. What about conditioning on a prior in Bayesian inference? Surely that's a form of supervision. What about the kind of inference used in machine translation with an (unsupervised) language model and a (sort-of supervised?) set of aligned sentence pairs? 'Supervision' is just another form of inductive bias.

StompChicken 2009-12-05 09:18:55

I see your point, and find it quite interesting. However, I would not worry that much. The classic unsupervised/supervised distinction covers most of the cases.

bayer 2009-12-18 00:49:33

+1 A:

SUPERVISED LEARNING

Applications in which the training data comprises examples of the input vectors along with their corresponding target vectors are known as supervised learning problems.

UNSUPERVISED LEARNING

In other pattern recognition problems, the training data consists of a set of input vectors x without any corresponding target values. The goal in such unsupervised learning problems may be to discover groups of similar examples within the data, where it is called clustering.

Pattern Recognition and Machine Learning (Bishop, 2006)

davidrobles 2009-12-03 13:30:48

+4 A:

Since you ask this very basic question, it looks like it's worth specifying what Machine Learning itself is.

Machine Learning is a class of algorithms that are data-driven, i.e. unlike "normal" algorithms, it is the data that "tells" what the "good answer" is. Example: a hypothetical non-machine-learning algorithm for face recognition in images would try to define what a face is (a round, skin-colored disk with dark areas where you expect the eyes, etc.). A machine learning algorithm would not have such a coded definition, but would "learn by example": you show it several images of faces and non-faces, and a good algorithm will eventually learn to predict whether or not an unseen image is a face.

This particular example of face recognition is supervised, which means that your examples must be labeled, i.e. you explicitly say which ones are faces and which ones aren't.

In an unsupervised algorithm your examples are not labeled, i.e. you don't say anything. Of course, in such a case the algorithm itself cannot "invent" what a face is, but it may be able to cluster the data into different classes, e.g. it could work out that faces are very different from panoramas, which are very different from horses.

Since another answer mentions it (in an incorrect way), there are "intermediate" forms of supervision, i.e. semi-supervised and active learning techniques. Technically, these are supervised methods in which there is some "smart" way to avoid needing a large number of labeled examples. In active learning, the algorithm itself decides which items you should label (e.g. it can be pretty sure about a panorama and a horse, but it might ask you to confirm whether a picture of a gorilla is indeed a face). In one semi-supervised approach, there are two different algorithms, which start with the labeled examples and then "tell" each other what they think about some large number of unlabeled examples. From this "discussion" they learn.
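To make the active learning part concrete, here is a small sketch of uncertainty sampling, one common way for an algorithm to pick what it wants labeled next. The file names and probabilities are invented for illustration, and `most_uncertain` is a hypothetical helper, not part of any library.

```python
def most_uncertain(probabilities):
    """Return the item whose predicted probability of 'face' is closest to 0.5."""
    return min(probabilities, key=lambda item: abs(probabilities[item] - 0.5))

# Hypothetical model outputs on unlabeled images: P(image is a face).
PREDICTED = {
    "panorama.jpg": 0.02,  # confidently not a face
    "horse.jpg": 0.10,     # confidently not a face
    "gorilla.jpg": 0.55,   # the model cannot decide
    "portrait.jpg": 0.97,  # confidently a face
}

# The algorithm asks the human to label the example it is least sure about.
print(most_uncertain(PREDICTED))  # gorilla.jpg
```

After the human answers, the newly labeled image would be added to the training set and the model retrained, so each label is spent where it helps most.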

Davide 2009-12-06 05:24:58
