I'm trying to reduce a high-dimensional dataset to 2-D. However, I don't have access to the whole dataset upfront. So I'd like to generate a function that takes an N-dimensional vector and returns a 2-dimensional vector, such that if I give it two vectors that are close in N-dimensional space, the results are close in 2-dimensional space.

I thought SVD was the answer I needed, but I can't make it work.

For simplicity, let N=3 and suppose I have 15 datapoints. If I have all the data upfront in a 15x3 matrix X, then:

[U, S, V] = svd(X);
s = S;              % s is the reduced version of S (MATLAB is case-sensitive)
s(3:end,3:end) = 0; % zero out the third singular value
Y = U*s;
Y = Y(:,1:2);       % keep the first two columns: one 2-D row per datapoint

does what I want. But suppose I get a new datapoint, A, a 1x3 vector. Is there a way to use U, S, or V to turn A into the appropriate 1x2 vector?

If SVD is a lost cause, can someone tell me what I should be doing instead?

Note: This is Matlab code, but I don't care if the answer is C, Java, or just math. If you can't read Matlab, ask and I'll clarify.

+3  A: 

SVD is a fine approach (probably). LSA (Latent Semantic Analysis) is built on it and uses essentially the same dimensionality-reduction approach. I've talked about that (at length) at: lsa-latent-semantic-analysis-how-to-code-it-in-php, or check out the LSA tag here on SO.

I realize it's an incomplete answer. Holler if you want more help!

Gregg Lind
Thanks, that was helpful. In order to turn U into U', do I simply truncate everything after the second column, or is it fancier than that?
PlexLuthor
I'm pretty sure it's exactly that simple (assuming MATLAB orders the columns so that they correspond to the singular values, largest first)
Gregg Lind
Ok. I just played around with it in the way I thought you said it would work, but I still can't take new 3-d data and get the 2-d projection without recalculating the whole UxSxV set. Did I miss something in LSA? That is, I have X (15x3), U, S, V, U', S', V', and now I get A (1x3). What should I do to get a 1x2 version of A?
PlexLuthor
Duh, divide by V* is what I was looking for. I don't know why I missed that earlier.
PlexLuthor
It sounds like you have it quite well in hand :) I can never remember the exact formulae, so I just noodle around until I get the right size end matrix, just as you are!
Gregg Lind
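For anyone landing here later, the "divide by V*" step from the comments can be spelled out. This is only a sketch, under the assumption that each row of X is a datapoint; U_k and V_k (names introduced here, not used in the posts above) denote the first k columns of U and V, and S_k the top-left k-by-k block of S.

```latex
\begin{aligned}
X   &= U S V^{\top}, \qquad V^{\top} V = I \\
X V &= U S \\
Y   &= X V_k = U_k S_k \\
y_A &= A V_k
\end{aligned}
```

Because V is orthogonal, A / V' in MATLAB gives the same result as A * V, so dividing by V' and keeping the first two entries matches multiplying by V(:,1:2). For N=3 and k=2, that turns a 1x3 point A into a 1x2 point without recomputing the SVD.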
+1  A: 
% generate some random data (each row is a d-dimensional datapoint)
%data = rand(200, 4);
load fisheriris
data = meas;        % 150 instances of 4-dim

% center data
X = bsxfun(@minus, data, mean(data));

% SVD
[U, S, V] = svd(X, 'econ');     % X = U*S*V'

% let's keep k components so that 95% of the data variance is explained
variances = diag(S).^2 / (size(X,1)-1);
varExplained = 100 * variances./sum(variances);
index = 1+sum(~(cumsum(varExplained)>95));

% projected data = X*V = U*S
newX = X * V(:,1:index);
biplot(V(:,1:index), 'scores',newX, 'varlabels',{'d1' 'd2' 'd3' 'd4'});

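% mapping function (x is a row vector, or a matrix whose rows are vectors)
% new points are centered with the training mean so they line up with newX
mu = mean(data);
mapFunc = @(x) bsxfun(@minus, x, mu) * V(:,1:index);
mapFunc([1 2 3 4])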
Amro
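A quick way to sanity-check a mapping of this kind is to run it on the training rows and compare against U*S. The sketch below is not part of the answer above: it assumes a synthetic 15x3 matrix (matching the sizes in the question) instead of the iris data, fixes k = 2 rather than using the 95% variance rule, and the names mu, Vk and project are made up for illustration.

```matlab
% Synthetic stand-in for the question's data: 15 points in 3-D, one per row
X  = rand(15, 3);
mu = mean(X, 1);                     % training mean, reused for new points
Xc = bsxfun(@minus, X, mu);          % center the training data

[U, S, V] = svd(Xc, 'econ');         % Xc = U*S*V'
k  = 2;                              % target dimension (assumed, per the question)
Vk = V(:, 1:k);                      % 3x2 projection basis

% Mapping for new rows: center with the training mean, then project
project = @(x) bsxfun(@minus, x, mu) * Vk;

Y  = project(X);                     % 15x2 image of the training points
Y2 = U(:, 1:k) * S(1:k, 1:k);        % the same coordinates computed as U*S

A  = [0.2 0.5 0.9];                  % a new 1x3 point (arbitrary values)
B  = [0.3 0.4 0.8];                  % a second, nearby point
yA = project(A);                     % 1x2 result
yB = project(B);

fprintf('max |X*Vk - U*S| = %g\n', max(abs(Y(:) - Y2(:))));
fprintf('distance in 3-D: %g, in 2-D: %g\n', norm(A - B), norm(yA - yB));
```

Since the columns of Vk are orthonormal, the 2-D distance can never exceed the original distance, which is the "close stays close" property the question asks for. The reverse does not hold: points that are far apart along the discarded direction can still land close together.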
A: 

I don't think there's a built-in way to update an existing SVD within Matlab. I googled for "SVD update" and found this paper among the many results.

Victor Liu