Given four binary vectors which represent "classes":

[1,0,0,0,0,0,0,0,0,0]
[0,0,0,0,0,0,0,0,0,1]
[0,1,1,1,1,1,1,1,1,0]
[0,1,0,0,0,0,0,0,0,0]

What methods are available for classifying a vector of floating point values into one of these "classes"?

Basic rounding works in most cases:

round([0.8,0,0,0,0.3,0,0.1,0,0,0]) = [1 0 0 0 0 0 0 0 0 0]

But how can I handle some interference?

round([0.8,0,0,0,0.6,0,0.1,0,0,0]) = [1 0 0 0 1 0 0 0 0 0]

This second case should still be the best match for [1,0,0,0,0,0,0,0,0,0], but instead I have lost the solution entirely, since the rounded vector matches none of the classes.

I want to use MATLAB for this task.

+1  A: 

A simple Euclidean distance algorithm should suffice. The class with the minimum distance to the point would be your candidate.

http://en.wikipedia.org/wiki/Euclidean%5Fdistance
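As a minimal sketch of that idea, assuming the four classes and the noisy test vector from the question (the names bestClass and bestDist are just illustrative), a plain loop with norm would look something like this:

```matlab
% Classes and noisy test vector taken from the question
CLASSES = [1,0,0,0,0,0,0,0,0,0
           0,0,0,0,0,0,0,0,0,1
           0,1,1,1,1,1,1,1,1,0
           0,1,0,0,0,0,0,0,0,0];
TEST = [0.8,0,0,0,0.6,0,0.1,0,0,0];

bestClass = 0;      % row index of the closest class found so far
bestDist  = Inf;    % its Euclidean distance to TEST
for k = 1:size(CLASSES,1)
    d = norm(CLASSES(k,:) - TEST);   % Euclidean distance to class k
    if d < bestDist                  % strict comparison keeps the first class on ties
        bestDist  = d;
        bestClass = k;
    end
end

disp(CLASSES(bestClass,:))
```

Here the first class wins, even though rounding alone gives no match.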

klynch
+5  A: 

Find the SSD (sum of squared differences) of your test vector with each "class" and use the one with the least SSD.

Here's some code. I appended a 0 to the test vector you provided, since it had only 9 elements whereas the classes have 10.

CLASSES = [1,0,0,0,0,0,0,0,0,0
           0,0,0,0,0,0,0,0,0,1
           0,1,1,1,1,1,1,1,1,0
           0,1,0,0,0,0,0,0,0,0];

TEST = [0.8,0,0,0,0.6,0,0.1,0,0,0];

% Find the difference between the TEST vector and each row in CLASSES
difference = bsxfun(@minus,CLASSES,TEST);
% Sum of squared differences for each class (one value per row of CLASSES)
class_diff = sum(difference.^2,2);
% Store the row index of the vector with the minimum difference from TEST
[val CLASS_ID] = min(class_diff);
% Display
disp(CLASSES(CLASS_ID,:))

For illustrative purposes, difference looks like this:

 0.2    0 0 0 -0.6 0 -0.1 0 0 0
-0.8    0 0 0 -0.6 0 -0.1 0 0 1
-0.8    1 1 1  0.4 1  0.9 1 1 0
-0.8    1 0 0 -0.6 0 -0.1 0 0 0

And class_diff, the squared distance of each class from TEST, looks like this:

 0.41
 2.01
 7.61
 2.01

And obviously, the first one is the best match since it has the least difference.
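To check this against both example vectors from the question, here is a rough sketch that runs the same SSD step over each of them in turn. The names TESTS and ids are only illustrative, and the trailing 0 is appended to both vectors as an assumption so they have 10 elements:

```matlab
CLASSES = [1,0,0,0,0,0,0,0,0,0
           0,0,0,0,0,0,0,0,0,1
           0,1,1,1,1,1,1,1,1,0
           0,1,0,0,0,0,0,0,0,0];

% One test vector per row: the "easy" case and the noisy case from the question
TESTS = [0.8,0,0,0,0.3,0,0.1,0,0,0
         0.8,0,0,0,0.6,0,0.1,0,0,0];

ids = zeros(size(TESTS,1),1);   % winning class index for each test vector
for k = 1:size(TESTS,1)
    % Sum of squared differences between test vector k and every class
    ssd = sum(bsxfun(@minus, CLASSES, TESTS(k,:)).^2, 2);
    [~, ids(k)] = min(ssd);
end

disp(ids)
```

Both vectors end up assigned to the first class, including the one that rounding could not handle.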

Jacob
+1: Beat me to it! I was typing virtually the exact same example when your answer popped up.
gnovice
Lol, thanks :) .. but I guess there aren't many other ways of doing it apart from using different metrics ..
Jacob
Yeah, the only differences were using REPMAT instead of BSXFUN and ABS instead of squaring the difference.
gnovice
Say, about `bsxfun` and `repmat`, what's the difference? I used to think `bsxfun` was more memory efficient (magically applied the operator along some dimension) unlike `repmat` but recent experiments have disproved that .. any comments?
Jacob
You champion! Thanks so much.
idea
@Jacob: I've never actually tested it myself for any situations, but Loren from The MathWorks has a blog post where she does some comparisons: http://blogs.mathworks.com/loren/2008/08/04/comparing-repmat-and-bsxfun-performance/.
gnovice
+2  A: 

This does the same thing as Jacob's answer, only with four different distance measures:


%%
CLASSES = [1,0,0,0,0,0,0,0,0,0
           0,0,0,0,0,0,0,0,0,1
           0,1,1,1,1,1,1,1,1,0
           0,1,0,0,0,0,0,0,0,0];

TEST = [0.8,0,0,0,0.6,0,0.1,0,0,0];

%%
% sqrt( sum((x-y).^2) )
euclidean = sqrt( sum(bsxfun(@minus,CLASSES,TEST).^2, 2) );

% sum( |x-y| )
cityblock = sum(abs(bsxfun(@minus,CLASSES,TEST)), 2);

% 1 - dot(x,y)/(sqrt(dot(x,x))*sqrt(dot(y,y)))
cosine = 1 - ( CLASSES*TEST' ./ (norm(TEST)*sqrt(sum(CLASSES.^2,2))) );

% max( |x-y| )
chebychev = max( abs(bsxfun(@minus,CLASSES,TEST)), [], 2 );

dist = [euclidean cityblock cosine chebychev];

%%
% min works column-wise: one minimum distance and one class index per measure
[minDist classIdx] = min(dist);

Pick the one you like :)
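Since min works down each column, minDist and classIdx come back as 1-by-4 row vectors, one entry per distance measure. A small sketch of how the result could be read out (the names delta and names and the fprintf formatting are only illustrative):

```matlab
CLASSES = [1,0,0,0,0,0,0,0,0,0
           0,0,0,0,0,0,0,0,0,1
           0,1,1,1,1,1,1,1,1,0
           0,1,0,0,0,0,0,0,0,0];
TEST = [0.8,0,0,0,0.6,0,0.1,0,0,0];

delta = bsxfun(@minus, CLASSES, TEST);   % per-class difference from TEST

euclidean = sqrt(sum(delta.^2, 2));
cityblock = sum(abs(delta), 2);
% Note: the cosine distance is undefined if TEST or a class is all zeros
cosine    = 1 - (CLASSES*TEST') ./ (norm(TEST)*sqrt(sum(CLASSES.^2, 2)));
chebychev = max(abs(delta), [], 2);

dist = [euclidean, cityblock, cosine, chebychev];   % rows: classes, columns: measures

[minDist, classIdx] = min(dist);   % column-wise: one winner per measure

names = {'euclidean', 'cityblock', 'cosine', 'chebychev'};
for k = 1:numel(names)
    fprintf('%-10s -> class %d (distance %.3f)\n', names{k}, classIdx(k), minDist(k));
end
```

For this particular TEST vector all four measures agree on the first class, but in general they can disagree, so it is worth checking them against your own data.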

Amro
