views: 455
answers: 3

The user wants to impose unique, non-trivial upper and lower bounds on the correlation between every pair of variables in a variance/covariance matrix.

For example: I want a variance matrix in which every pair of variables satisfies 0.6 < |rho(x_i,x_j)| < 0.9, where rho(x_i,x_j) is the correlation between variables x_i and x_j.
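To make the condition concrete, a small R check could look like this (corr_in_bounds is just an illustrative name, not a function from any package):

corr_in_bounds <- function(R, lower = 0.6, upper = 0.9) {
    # TRUE iff every off-diagonal correlation satisfies lower < |rho| < upper
    off <- abs(R[lower.tri(R)])
    all(off > lower & off < upper)
}

# Example: a 3x3 equicorrelation matrix with rho = 0.7 passes the check
R <- matrix(0.7, 3, 3); diag(R) <- 1
corr_in_bounds(R)    # TRUE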

Thanks.


OK, something of a quick-and-dirty solution has been found; still, if anyone knows of a more exact way to get there, it will be welcome.


I lost my original login, so I'm reposting the question under a new login. The previous iteration got the following replies:

*You mean pseudo-random; that's the correct terminology for semi-random – Robert Gould

*Good point, but I think he meant semi pseudo-random (the pseudo is assumed when talking about computer randomness :-p) – fortran

*With "correlation", do you mean "covariance"? – Svante

*No, I really do mean correlation. I want to generate a positive-definite matrix such that all the correlations have tighter-than-trivial bounds. – vak

*See my answer. Do you insist that the sample correlations lie within the specified bounds, or just the population correlations that generate the sample? I do suggest an idea that may work if your problem is the former. – woodchips

*woodchips: No, I'm afraid your solution will not work; please see my answer in the original thread (link above). Thanks.

+1  A: 

Here is your response to my answer in the original thread:

"Come on people, there must be something simpler"

I'm sorry, but there is not. Wanting to win the lottery is not enough. Demanding that the Cubs win the series is not enough. Nor can you just demand a solution to a mathematical problem and suddenly find it is easy.

The problem of generating pseudo-random deviates with sample parameters in a specified range is non-trivial, at least if the deviates are to be truly pseudo-random in any sense. Depending on the range, one may be lucky. I suggested a rejection scheme, but also stated it was not likely to be a good solution. If there are many dimensions and tight ranges on the correlations, then the probability of success is poor. Also important is the sample size, as that will drive the sample variance of the resulting correlations.
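For illustration, here is one way such a rejection scheme could be sketched in R (assuming the mvtnorm package; the equicorrelation target, sample size, and the name reject_sample are arbitrary choices, not a recommendation):

library(mvtnorm)

# Draw a sample from a nominal equicorrelation structure and accept its
# sample correlation matrix only if every |off-diagonal| lies in (lower, upper).
reject_sample <- function(p, n, rho, lower, upper, max_tries = 10000) {
    sigma <- matrix(rho, p, p); diag(sigma) <- 1
    for (i in seq_len(max_tries)) {
        C <- cor(rmvnorm(n, sigma = sigma))
        off <- abs(C[lower.tri(C)])
        if (all(off > lower & off < upper)) return(C)
    }
    stop("no acceptable sample found; the bounds may be too tight")
}

# C <- reject_sample(p = 5, n = 100, rho = 0.75, lower = 0.6, upper = 0.9)

With many dimensions or tight bounds the loop can exhaust max_tries without accepting anything, which is exactly the difficulty described above.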

If you truly want a solution, you need to sit down and specify your goal, clearly and exactly. Do you want a random sample with a nominal specified correlation structure, but strict bounds on the correlations? Will any sample correlation matrix that satisfies the bounds be satisfactory? Are the variances also given?

woodchips
Woodchips, thank you very much for your input. Allow me to clarify that the line you quote from my answer is about the QP approach you proposed at the end of your post, not the rejection scheme. The rejection scheme you propose is a bit tricky because I'm not sure what prior to impose on the eigenvalues (beyond the obvious fact that they should be positive); any idea there? Tia
vak
PS: Woodchips, arguably, my answer to your original input contains more reasoning and argument than the line you quote. I tried to address each of your suggestions in a constructive manner; I just hope I didn't sound disrespectful or anything. Best,
vak
"Do you want a random sample with a nominal specified correlation structure, but strict bounds on the correlations?" yes"Will any sample correlation matrix that satisfies the bound on the aims be satisfactory?" yes"Are the variances also given?"it doesn't matter. It's strictly about the correlation matrix
vak
+1  A: 

Perhaps this answer will help operationalize it:

One class of matrices with this non-negative-definiteness property consists of draws from the Wishart distribution. Samples from ~W() such that all off-diagonal entries are between some bounds [l,u] would fit your question. However, I don't believe this is the same as the distribution of all positive-definite matrices with off-diagonals in [l,u].

On the Wikipedia page there is an algorithm for sampling from ~W().
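For instance (just a sketch; the scale matrix and degrees of freedom below are arbitrary examples), a single draw from ~W() via MCMCpack::rwish can be rescaled to a correlation matrix with cov2cor:

library(MCMCpack)

p <- 5
S <- matrix(0.4, p, p); diag(S) <- 1   # example scale matrix
W <- rwish(p + 2, S)                   # one draw from ~W(df = p + 2, scale = S)
R <- cov2cor(W)                        # rescale to unit diagonal
R[lower.tri(R)]                        # off-diags: not constrained to any [l,u]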

A simpler, hackish solution (possibly approximating this) is to:

(given that u>l and l>0)

  1. Draw from a multivariate normal whose Sigma has 1 on the diagonal and mean(l,u) on the off-diagonals.
  2. Take the sample and calculate its correlation matrix => C.
  3. This matrix will have some randomness (fuzz), but the math of how much fuzz it will have is a little out of my league to calculate. The values of the off-diags in this C matrix are bounded by [-1,1], with a mean of mean(l,u). By eyeball, I'm guessing some sort of beta/exponential shape. In any case, the continuous distribution of the off-diags in C guarantees they won't all obediently lie inside the bounds (l,u), unless (l,u) = [-1,1].
  4. You can adjust the amount of "fuzz" by increasing / decreasing the length of the sample in step 1. I'd wager (unproven) that the spread of C's off-diags shrinks roughly like one over the square root of the number of samples; see the quick simulation sketch after this list.
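A quick, purely illustrative simulation of that last point (assuming the mvtnorm package; the target correlation, dimension, and sample lengths are arbitrary):

library(mvtnorm)

# Spread of the off-diagonal sample correlations for a given sample length n.
spread_for <- function(n, p = 5, rho = 0.4) {
    sigma <- matrix(rho, p, p); diag(sigma) <- 1
    C <- cor(rmvnorm(n, sigma = sigma))
    sd(C[lower.tri(C)])
}

sapply(c(50, 500, 5000), spread_for)   # shrinks as n grows, roughly like 1/sqrt(n)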

So this seems non-trivial to truly answer!

As other posters have suggested, you can generate from Wishart, then keep only the ones where the property you want holds, but you might be sampling for a long time! If you exclude those that are 0-definite (is that a word?), then this should work fine for generating good matrices. However, this is not the true distribution of all pos-def matrices whose off-diags are in [l,u].
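A sketch of that keep-the-good-ones idea (again illustrative only; the dimension, degrees of freedom, and bounds are example values, and the loop may indeed run for a very long time when the bounds are tight):

library(MCMCpack)

# Draw from the Wishart, rescale to a correlation matrix, and keep it
# only if every |off-diagonal| lies in (lower, upper).
wishart_reject <- function(p, lower, upper, df = p + 2, max_tries = 100000) {
    S <- matrix(mean(c(lower, upper)), p, p); diag(S) <- 1
    for (i in seq_len(max_tries)) {
        R <- cov2cor(rwish(df, S))
        off <- abs(R[lower.tri(R)])
        if (all(off > lower & off < upper)) return(R)
    }
    NULL    # nothing accepted within max_tries
}

# R <- wishart_reject(p = 4, lower = 0.6, upper = 0.8)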

Code (in R) for the dumb sampling scheme (steps 1-4) proposed above:

sigma1 <- function(n,sigma) {
    out <- matrix(sigma,n,n)
    diag(out) <- 1
    return (out)
}

library(mvtnorm)
sample_around_sigma <- function(size, upper,lower, tight=500) {
    #  size:  size of matrix
    #  upper, lower:  bounds on the corr, should be > 0
    #  tight:  number of samples to use.  ideally this
    #     would be calculated such that the off-diags will
    #     be "pretty likely" to fall in [lower,upper]
    sigma <- sigma1(size,mean(c(upper,lower)))
    means <- 0*1:size
    samples <- rmvnorm(n=tight, mean=means,sigma=sigma)
    return (cor(samples))
}

> A <- sample_around_sigma(5, .3,.5)
> A
          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 1.0000000 0.3806354 0.3878336 0.3926565 0.4080125
[2,] 0.3806354 1.0000000 0.4028188 0.4366342 0.3801593
[3,] 0.3878336 0.4028188 1.0000000 0.4085453 0.3814716
[4,] 0.3926565 0.4366342 0.4085453 1.0000000 0.3677547
[5,] 0.4080125 0.3801593 0.3814716 0.3677547 1.0000000
> 
> summary(A[lower.tri(A)]); var(A[lower.tri(A)])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3678  0.3808  0.3902  0.3947  0.4067  0.4366 
[1] 0.0003949876
Gregg Lind
Thanks Gregg. What do you mean by "draw from a multivariate normal where Sigma = mean(l,u)"? Also, by mean(l,u) do you mean (u-l)/2? Tia,
vak
Added more code to clarify. mean(l,u) :: (l+u)/2
Gregg Lind
I meant "draw from a multivariate normal with mean 0 (irrelevant actually) and sigma is of the form: 1 on diag, mean([l,u]) in the off-diags". I don't remember the term for that kind of matrix... equi-correlation maybe?
Gregg Lind
A: 

OK, fantastic Gregg: we're getting somewhere. Combining your idea with that of woodchips yields this alternative approach. It's mathematically very dirty, but it seems to work:

library(MCMCpack)
library(MASS)

p  <- 10    # dimension
lb <- .6    # lower bound on |correlation| (not actually enforced below)
ub <- .8    # upper bound on |correlation|

# Noisy objective for optim(): the determinant of a Wishart draw whose
# scale matrix is built from the candidate parameter vector theta.
zupa <- function(theta){
    ac <- matrix(theta, p, p)
    fe <- rwish(100*p, ac %*% t(ac))
    det(fe)
}
ba <- optim(runif(p^2, -10, -5), zupa, control = list(maxit = 10))
ac <- matrix(ba$par, p, p)

# Draw a covariance matrix from the Wishart with the optimized scale,
# take a small sample from it, and form that sample's correlation matrix.
fe <- rwish(100*p, ac %*% t(ac))
me <- mvrnorm(p + 1, rep(0, p), fe)
A  <- cor(me)
bofi <- sqrt(diag(var(me))) %*% t(sqrt(diag(var(me))))    # outer product of the sample sds

# Shrink any correlation above the upper bound until none exceeds it;
# the lower bound is left to chance, hence the less-than-100% acceptance.
va <- A[lower.tri(A)]
l1 <- 100
while (l1 > 0) {
    r1 <- which(va > ub)
    l1 <- length(r1)
    va[r1] <- va[r1] * .9
}
A[lower.tri(A)] <- va
A[upper.tri(A)] <- t(A)[upper.tri(A)]    # mirror the lower triangle to keep A symmetric

# Rebuild a covariance matrix, sample from it, flip the signs of random
# columns, and summarise the resulting absolute correlations.
vari <- bofi * A
mk <- mvrnorm(10*p, rep(0, p), vari)
pc <- sign(runif(p, -1, 1))
mf <- sweep(mk, 2, pc, "*")
B  <- cor(mf)
summary(abs(B[lower.tri(B)]))

Basically, this is the idea (say the upper bound is .8 and the lower bound is .6). It has a good enough acceptance rate, which is not 100%, but it'll do at this stage of the project.
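As a rough check of that claim (assuming the script above has just been run, so B, lb, and ub are in the workspace), the fraction of absolute off-diagonal correlations landing inside [lb, ub] for this one run is:

off_B <- abs(B[lower.tri(B)])
mean(off_B > lb & off_B < ub)    # proportion of pairs inside the target band for this run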

vak
What was the goal for this particular run? Where are the upper and lower bounds here?
Gregg Lind