The user wants to impose a single, non-trivial upper and lower bound on the correlation between every pair of variables in a variance/covariance matrix.

For example: I want a variance matrix in which every pair of variables satisfies 0.6 < |rho(x_i,x_j)| < 0.9, where rho(x_i,x_j) is the correlation between variables x_i and x_j.

Thanks.

A: 

This is not a complete answer, but a suggestion of a possible constructive method:

Looking at the characterizations of positive definite matrices (http://en.wikipedia.org/wiki/Positive-definite_matrix), I think one of the most tractable approaches could be the Sylvester criterion.

You can start with a trivial 1x1 random matrix with a positive determinant and expand it by one row and one column at a time, ensuring at each step that the new matrix also has a positive determinant (how to achieve that is up to you ^_^).
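A minimal MATLAB sketch of that growth-and-check idea (the way the new column and diagonal entry are chosen below is just one arbitrary way of filling in the "up to you" part):

% Grow a random symmetric matrix one row/column at a time, keeping every
% leading principal minor positive (Sylvester's criterion).
p = 5;                                % target size, chosen only for illustration
A = abs(randn(1));                    % 1x1 start, trivially positive definite
for k = 2:p
    v = randn(k-1, 1);                % new off-diagonal column, drawn at random
    d = v' * (A \ v) + abs(randn(1)); % diagonal entry chosen large enough that the new
                                      % determinant det(A)*(d - v'*inv(A)*v) stays positive
    A = [A, v; v', d];
end
% A is now symmetric positive definite, but note that nothing here
% constrains the size of the correlations themselves.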

fortran
fortran, that would just generate a positive definite matrix, wouldn't it? But how do you go about ensuring that all correlations are, say, between 0.7 and 0.9?
Hmmmmmmmmmm... good point, I didn't think about the second premise :-s If I can come up with a solution in a while I'll update the answer; if not, I'll delete it because it is quite useless.
fortran
+2  A: 

There are MANY issues here.

First of all, are the pseudo-random deviates assumed to be normally distributed? I'll assume they are, as any discussion of correlation matrices gets nasty if we diverge into non-normal distributions.

Next, it is rather simple to generate pseudo-random normal deviates, given a covariance matrix. Generate standard normal (independent) deviates, and then transform by multiplying by the Cholesky factor of the covariance matrix. Add in the mean at the end if the mean was not zero.

And, a covariance matrix is also rather simple to generate given a correlation matrix. Just pre and post multiply the correlation matrix by a diagonal matrix composed of the standard deviations. This scales a correlation matrix into a covariance matrix.
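A small MATLAB sketch of that scaling (the correlation matrix and standard deviations below are made up for illustration):

% Scale a correlation matrix C into a covariance matrix S by pre- and
% post-multiplying with a diagonal matrix of standard deviations.
C  = [1.0  0.70  0.80;     % an example correlation matrix (assumed valid, with
      0.70 1.0   0.75;     % off-diagonals in the 0.6-0.9 range from the question)
      0.80 0.75  1.0 ];
sd = [2; 0.5; 1.5];        % example standard deviations
D  = diag(sd);
S  = D * C * D;            % covariance: S(i,j) = sd(i)*sd(j)*C(i,j)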

I'm still not sure where the problem lies in this question, since it would seem easy enough to generate a "random" correlation matrix, with elements uniformly distributed in the desired range.
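For instance, a naive MATLAB sketch of that, using the 0.6-0.9 range from the question (the resampling loop is just one way to discard candidates that are not positive definite; it terminates quickly for small p, but more and more candidates get rejected as p grows):

% Draw off-diagonal correlations uniformly in [lo, hi] with random signs,
% and keep only candidates that are valid (positive definite) correlation
% matrices.
p  = 3;                             % dimension, chosen only for illustration
lo = 0.6;  hi = 0.9;                % bounds from the question
while true
    r = (lo + (hi - lo)*rand(p)) .* sign(randn(p));  % magnitudes and signs
    C = triu(r, 1);                 % keep the strict upper triangle
    C = C + C' + eye(p);            % symmetrize, put ones on the diagonal
    if min(eig(C)) > 0              % accept only positive definite candidates
        break
    end
end
% C is a correlation matrix with all |C(i,j)|, i ~= j, in [lo, hi]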

So all of the above is rather trivial by any reasonable standards, and there are many tools out there to generate pseudo-random normal deviates given the above information.

Perhaps the issue is the user insists that the resulting random matrix of deviates must have correlations in the specified range. You must recognize that a set of random numbers will only have the desired distribution parameters in an asymptotic sense. Thus, as the sample size goes to infinity, you should expect to see the specified distribution parameters. But any small sample set will not necessarily have the desired parameters, in the desired ranges.

For example, (in MATLAB) here is a simple positive definite 3x3 matrix. As such, it makes a very nice covariance matrix.

S = randn(3);
S = S'*S
S =
      0.78863      0.01123     -0.27879
      0.01123       4.9316       3.5732
     -0.27879       3.5732       2.7872

I'll convert S into a correlation matrix.

s = sqrt(diag(S));

C = diag(1./s)*S*diag(1./s)
C =
            1    0.0056945     -0.18804
    0.0056945            1      0.96377
     -0.18804      0.96377            1

Now, I can sample from a normal distribution using the Statistics Toolbox (mvnrnd should do the trick). Just as easy is to use a Cholesky factor.

L = chol(S)
L =
      0.88805     0.012646     -0.31394
            0       2.2207       1.6108
            0            0      0.30643

Now, generate pseudo-random deviates, then transform them as desired.

X = randn(20,3)*L;

cov(X)
ans =
      0.79069     -0.14297     -0.45032
     -0.14297       6.0607       4.5459
     -0.45032       4.5459       3.6549

corr(X)
ans =
            1     -0.06531      -0.2649
     -0.06531            1      0.96587
      -0.2649      0.96587            1

If your desire was that the correlations must ALWAYS be greater than -0.188, then this sampling technique has failed, since the numbers are pseudo-random. In fact, that goal will be a difficult one to achieve unless your sample size is large enough.

You might employ a simple rejection scheme, whereby you do the sampling, then redo it repeatedly until the sample has the desired properties, with the correlations in the desired ranges. This may get tiring.
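A bare-bones MATLAB sketch of such a rejection scheme (the covariance used here is an arbitrary example whose population correlations already lie inside the 0.6-0.9 range; without that, the loop would essentially never stop):

% Redraw the whole sample until every off-diagonal sample correlation
% lands inside the target range.
lo = 0.6;  hi = 0.9;                                  % target range for |corr|
C  = [1.0 0.70 0.80; 0.70 1.0 0.75; 0.80 0.75 1.0];   % population correlations in range
sd = [2; 0.5; 1.5];                                   % illustrative standard deviations
L  = chol(diag(sd) * C * diag(sd));                   % upper-triangular Cholesky factor
n  = 100;                                             % sample size (illustrative)
for k = 1:1e5                                         % safety cap; this can take many tries
    X = randn(n, 3) * L;                              % same sampling scheme as above
    R = corr(X);
    offDiag = abs(R(~eye(3)));                        % off-diagonal sample correlations
    if all(offDiag > lo & offDiag < hi)
        break                                         % accepted sample
    end
end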

An approach that might work (but one that I've not totally thought out at this point) is to use the standard scheme as above to generate a random sample. Compute the correlations. If they fail to lie in the proper ranges, then identify the perturbation one would need to make to the actual (measured) covariance matrix of your data, so that the correlations would be as desired. Now, find a zero-mean random perturbation to your sampled data that would move the sample covariance matrix in the desired direction.

This might work, but unless I knew that this is actually the question at hand, I won't bother to go any more deeply into it. (Edit: I've thought some more about this problem, and it appears to be a quadratic programming problem, with quadratic constraints, to find the smallest perturbation to a matrix X, such that the resulting covariance (or correlation) matrix has the desired properties.)
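For concreteness, one way to read that parenthetical (a sketch of the formulation only, nothing worked out here): given the sampled data X, look for the smallest zero-mean perturbation E of the data such that the perturbed sample covariance lands in the desired ranges,

\min_{E} \; \|E\|_F^2
\quad \text{s.t.} \quad
\mathbf{1}^\top E = \mathbf{0},
\qquad
\ell_{ij} \le \bigl[\operatorname{cov}(X+E)\bigr]_{ij} \le u_{ij}
\quad \text{for all } i \ne j.

The objective is quadratic in E, and since cov(X+E) is itself quadratic in E the bound constraints are quadratic as well, which is why this is harder than a plain QP.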

woodchips
A: 

Woodchips,

"First of all, are the pseudo-random deviates assumed to be normally distributed?"

yes.

"Perhaps the issue is the user insists that the resulting random matrix of deviates must have correlations in the specified range."

Yes, that's the whole difficulty.

"You must recognize that a set of random numbers will only have the desired distribution parameters in an asymptotic sense."

True, but this is not the problem here: your strategy works for p=2, but fails for p>2, regardless of sample size.

"If your desire was that the correlations must ALWAYS be greater than -0.188, then this sampling technique has failed, since the numbers are pseudo-random. In fact, that goal will be a difficult one to achieve unless your sample size is large enough."

It is not a sample size issue, because with p > 2 you do not even observe convergence to the right range for the correlations as the sample size grows: I tried the technique you suggest before posting here, and it is obviously flawed.

"You might employ a simple rejection scheme, whereby you do the sampling, then redo it repeatedly until the sample has the desired properties, with the correlations in the desired ranges. This may get tiring."

Not an option: for large p (say, larger than 10) this approach is intractable.

"Compute the correlations. I they fail to lie in the proper ranges, then identify the perturbation one would need to make to the actual (measured) covariance matrix of your data, so that the correlations would be as desired."

Ditto

As for the QP, I understand the constraints, but I'm not sure about the way you define the objective function; by using the "smallest perturbation" of some initial matrix, you will always end up getting the same (solution) matrix: all the off-diagonal entries will be exactly equal to one of the two bounds (i.e. not pseudo-random); plus, it is kind of overkill, isn't it?

Come on people, there must be something simpler

vak
A: 

Due to loss of login information, I had to repost the question here; thanks in advance for your understanding.

vak