views:

99

answers:

1

Input : random vector X=xi, i=1..n.
vector of means for X=meanxi, i=1..n
Output : covariance matrix Sigma (n*n).
Computation :
1) find all cov(xi,xj)= 1/n * (xi-meanxi) * (xj-meanxj), i,j=1..n
2) Sigma(i,j)=cov(xi,xj), symmetric matrix.
Is this algorithm correct and has no side-effects?

+1  A: 

Each xi should be a vector (random variable) with it's own variance and mean.

Covariance matrix is symmetric, so you just need to compute one half of it (and copy the rest) and has variance of xi at main diagonal.

 S = ...// your symmetric matrix n*n
 for(int i=0; i<n;i++)
   S(i,i) = var(xi);
   for(j = i+1; j<n; j++)
     S(i,j) = cov(xi, xj);
     S(j,i) = S(i,j);
   end
 end

where variance (var) of xi:

v = 0;
for(int i = 0; i<xi.Count; i++)
  v += (xi(i) - mean(xi))^2;
end
v = v / xi.Count;

and covariance (cov)

cov(xi, xj) = r(xi,xj) * sqrt(var(xi)) * sqrt(var(xj))

where r(xi, xj) is Pearson product-moment correlation coefficient

EDIT
or, since cov(X, Y) = E(X*Y) - E(X)*E(Y)

cov(xi, xj) = mean(xi.*xj) - mean(xi)*mean(xj);

where .* is Matlab-like element-wise multiplication.
So if x = [x1, x2], y = [y1, y2] then z = x.*y = [x1*y1, x2*y2];

Gacek
The diagonal contains the variances.
Henrik
Yes, you are right. Corrected
Gacek
Why do you define covariance in terms of correlation? Usually it's done the other way round.
Henrik
@Gacek: I agree with Henrik. What you have written is correct, however, the correlation is defined in terms of covariance, so it is redundant to calculate the correlation and then obtain covariance from it.
Il-Bhima
Since i've brought vector of means back (was computed previously, i missed that), variance is computed easily as sum(x(i)-mean(i)/n). Now what about covariance itself - are there any ways to compute it simpler? Btw, in your formula sj is mis-typed xj i guess.
Singularity
Right. I simplified the formula. I was not sure if that is correct, so I posted the solution with Pearson at the first moment. But I checked it in Matlab and it seems to work fine.
Gacek
Ok, even if in my case of x=y={x1,x2,..} this formula would work, what should we do with the mean of product in brackets? mean(xi) is available, but not mean(xi*xj)... All i have in the beginning is X={1,2,3....} - vector of single dimension random variables (numbers), and vector EX={0.1,0.2,0.3,..} composed by means for each xi.
Singularity
p.s. Dug deeper into theory, found a small remark - X is "centered vector random process with mathematical expectation of zero".
Singularity