views: 110

answers: 5

I'm cross-posting this from math.stackexchange.com because I'm not getting any feedback and it's a time-sensitive question for me.


My question pertains to linear separability with hyperplanes in a support vector machine.

According to Wikipedia:

...formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier.

The linear separation of classes by hyperplanes intuitively makes sense to me, and I think I understand linear separability for two-dimensional geometry. However, I'm implementing an SVM with a popular library (libSVM), and when messing around with the numbers I fail to understand how an SVM can create a curve between classes, or enclose the central points of category 1 within a circular boundary when they are surrounded by points of category 2, given that a hyperplane in an n-dimensional space V is a "flat" subset of dimension n − 1 (for two-dimensional space, a 1D line).

Here is what I mean:

[Image: circularly enclosed class separation for a 2D binary SVM]

That's not a hyperplane. That's circular. How does this work? Or are there more dimensions inside the SVM than the two dimensions of the input features?


This example application can be downloaded here.


Edit:

Thanks for your comprehensive answers. So the SVM can separate weird data well by using a kernel function. Would it help to linearize the data before sending it to the SVM? For example, one of my input features (a numeric value) has a turning point (e.g. at zero) where it neatly fits into category 1, but above and below zero it fits into category 2. Now, because I know this, would it help classification to send the absolute value of this feature to the SVM?
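
Concretely, this is the kind of transformation I mean (a hypothetical sketch using scikit-learn's SVC, which wraps libSVM; the data and the choice of a linear kernel are made up for illustration):

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical data: the single feature has a turning point at zero
    # (near zero -> category 1, far from zero -> category 2).
    X = np.array([[-2.0], [-1.5], [-0.1], [0.0], [0.1], [2.5]])
    y = np.array([2, 2, 1, 1, 1, 2])

    # Fold the two category-2 regions (above and below zero) onto one
    # side by taking the absolute value of the feature.
    X_folded = np.abs(X)

    # After folding, a plain linear kernel can separate the classes.
    clf = SVC(kernel='linear')
    clf.fit(X_folded, y)
    print(clf.predict(np.abs(np.array([[-1.8], [0.05]]))))  # [2 1]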

+2  A: 

I am not intimately familiar with SVMs, but from what I recall from my studies they are often used with a "kernel function" - essentially, a replacement for the standard inner product that effectively non-linearizes the space. It's loosely equivalent to applying a nonlinear transformation from your space into some "working space" where the linear classifier is applied, and then pulling the results back into your original space, where the linear subspaces the classifier works with are no longer linear.

The wikipedia article does mention this in the subsection "Non-linear classification", with a link to http://en.wikipedia.org/wiki/Kernel_trick which explains the technique more generally.
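
As a rough sketch of what the kernel trick buys you (my own toy example, not tied to any particular SVM library): for the degree-2 polynomial kernel, evaluating k(x, z) = (x·z)² gives exactly the inner product you would get by explicitly mapping the points into a quadratic feature space first:

    import numpy as np

    def phi(v):
        # Explicit quadratic feature map for 2D input:
        # (a, b) -> (a^2, b^2, sqrt(2)*a*b)
        a, b = v
        return np.array([a * a, b * b, np.sqrt(2) * a * b])

    def poly_kernel(x, z):
        # Degree-2 polynomial kernel: k(x, z) = (x . z)^2
        return np.dot(x, z) ** 2

    x = np.array([1.0, 2.0])
    z = np.array([3.0, 0.5])

    # Both print 16.0: the kernel computes the feature-space inner
    # product without ever constructing phi(x) explicitly.
    print(np.dot(phi(x), phi(z)))
    print(poly_kernel(x, z))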

mokus
As I recall, Numerical Recipes had a pretty good section on implementing SVMs with kernel functions to solve nonlinear classification problems.
mokus
+5  A: 

As mokus explained, practical support vector machines use a non-linear kernel function to map data into a feature space where they are linearly separable:

[Image: SVM mapping one feature space into another]

Lots of different functions are used for various kinds of data. Note that an extra dimension is added by the transformation.

(Illustration from Chris Thornton, U. Sussex.)
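
To connect this to the circular picture in the question (my own illustration, using the common map (x, y) -> (x, y, x² + y²)): the added third coordinate is the squared distance from the origin, so points inside the circle fall below the flat plane z = 1 and points outside fall above it:

    import numpy as np

    def lift(points):
        # Map (x, y) -> (x, y, x^2 + y^2); the new coordinate is the
        # squared distance from the origin.
        x, y = points[:, 0], points[:, 1]
        return np.column_stack([x, y, x * x + y * y])

    inner = np.array([[0.2, 0.1], [-0.3, 0.2]])  # category 1, inside the circle
    outer = np.array([[1.5, 0.0], [0.0, -2.0]])  # category 2, outside

    # In 3D the flat plane z = 1 separates the classes, even though no
    # straight line could separate them in the original 2D space.
    print(lift(inner)[:, 2] < 1.0)  # [ True  True]
    print(lift(outer)[:, 2] > 1.0)  # [ True  True]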

larsmans
How does the SVM know which kernel function will separate the data sensibly? Does it iterate through a bunch of equations and calculate which produces the largest margin? See also the edits to my question.
FreshCode
The kernel is typically set by the user/developer as a parameter. LibSVM, for instance, has linear, polynomial, RBF and sigmoid kernel types. Its authors recommend the RBF kernel for beginners.
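
If you want to automate the choice, a common approach is cross-validated grid search; here is a minimal sketch with scikit-learn (which wraps libSVM; the grid values are just illustrative):

    from sklearn.datasets import make_circles
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Toy data shaped like the question's picture: one class inside a
    # circle, the other class around it.
    X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=0)

    # Try each libSVM kernel type with a few C/gamma values and keep the
    # combination with the best cross-validated accuracy.
    grid = GridSearchCV(
        SVC(),
        param_grid={'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
                    'C': [0.1, 1, 10],
                    'gamma': ['scale', 0.1, 1]},
        cv=5,
    )
    grid.fit(X, y)
    print(grid.best_params_)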
larsmans
A: 

SVM clustering

http://www.biomedcentral.com/1471-2105/8/S7/S18/figure/F4

HTH

plan9assembler
A: 

My answer to a previous question might shed some light on what is happening in this case. The example I give is very contrived and not really what happens in an SVM, but it should give you some intuition.

StompChicken
+3  A: 

Check out this YouTube video that illustrates an example of linearly inseparable points that become separable by a plane when mapped to a higher dimension.

Amro
+1 Very intuitive.
FreshCode