Hello dear R people,

I have the following setup to analyse: we have about 150 subjects, and for each subject we performed a pair of tests (under different conditions) 18 times. The 18 conditions of the test are complementary, in such a way that if we were to average over the tests (for each subject), we would get no correlation between the tests (between subjects). What we wish to know is the correlation (and P value) between the tests within subjects, but across all the subjects.

The way I have done this so far was to compute the correlation for each subject, and then look at the distribution of the resulting correlations to see if its mean is different from 0. But I suspect there might be a better way of answering the same question (someone mentioned "geographical correlation" to me, but a shallow search didn't help).
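For concreteness, here is a minimal sketch of that per-subject approach on simulated data (all names and sizes are made up for illustration; applying Fisher's z-transform before the t-test is an addition of mine, not part of the description above):

```r
# Sketch of the per-subject approach: one correlation per subject,
# then a one-sample t-test on the distribution of correlations.
set.seed(1)
n_subjects <- 10
n_tests <- 18
per_subject_r <- sapply(seq_len(n_subjects), function(i) {
  a <- rnorm(n_tests)
  b <- a + rnorm(n_tests)   # the two tests are correlated within a subject
  cor(a, b)
})
# Fisher's z-transform stabilises the variance before testing mean = 0
z <- atanh(per_subject_r)
t.test(z, mu = 0)
```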

P.S.: I understand there might be a place here for some sort of mixed model, but I would prefer to present a "correlation", and I am not sure how to extract such an output from a mixed model.

Also, here is a short dummy code to give an idea of what I am talking about:

attach(longley)
N <- length(Unemployed)
# three "subjects" (blocks), each a shifted copy of the same series
block <- c(rep("a", N),
           rep("b", N),
           rep("c", N))

Unemployed.3 <- c(Unemployed + rnorm(1),
                  Unemployed + rnorm(1),
                  Unemployed + rnorm(1))

GNP.deflator.3 <- c(GNP.deflator + rnorm(1),
                    GNP.deflator + rnorm(1),
                    GNP.deflator + rnorm(1))

cor(Unemployed, GNP.deflator)
cor(Unemployed.3, GNP.deflator.3)
cor(Unemployed.3[block == "a"], GNP.deflator.3[block == "a"])
cor(Unemployed.3[block == "b"], GNP.deflator.3[block == "b"])
cor(Unemployed.3[block == "c"], GNP.deflator.3[block == "c"])
(I would like to somehow combine the last three correlations...)

Any ideas will be welcomed.

Best, Tal

A: 

I'm no expert, but this looks to me like what you want. It's automated, short to code, gives the same correlations as your example above, and produces p-values.

> df = data.frame(block=block, Unemployed=Unemployed.3,
+ GNP.deflator=GNP.deflator.3)
> require(plyr)
Loading required package: plyr
> ddply(df, "block", function(x){
+   as.data.frame(
+     with(x,cor.test(Unemployed, GNP.deflator))[c("p.value","estimate")]
+ )})
  block    p.value  estimate
1     a 0.01030636 0.6206334
2     b 0.01030636 0.6206334
3     c 0.01030636 0.6206334

To see all the details, do this:

> dlply(df, "block", function(x){with(x,cor.test(Unemployed, GNP.deflator))})
$a

    Pearson's product-moment correlation

data:  Unemployed and GNP.deflator 
t = 2.9616, df = 14, p-value = 0.01031
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 0.1804410 0.8536976 
sample estimates:
      cor 
0.6206334 


$b

    Pearson's product-moment correlation

data:  Unemployed and GNP.deflator 
t = 2.9616, df = 14, p-value = 0.01031
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 0.1804410 0.8536976 
sample estimates:
      cor 
0.6206334 


$c

    Pearson's product-moment correlation

data:  Unemployed and GNP.deflator 
t = 2.9616, df = 14, p-value = 0.01031
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 0.1804410 0.8536976 
sample estimates:
      cor 
0.6206334 


attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  block
1     a
2     b
3     c
Alex Brown
Hi Alex, thank you for the effort, but I am afraid the solution you gave is not going to help (since it didn't produce a single P value). My problem is with the algorithm, not with how to implement it :) I hope someone will come up with an idea. Cheers, Tal
Tal Galili
+1  A: 

If I understand your question correctly, you are interested in computing the intraclass correlation between multiple tests. There is an implementation in the psy package, although I have not used it.

If you want to perform inference on the correlation estimate, you could bootstrap the subjects. Just make sure to keep the tests together for each sample.
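A minimal sketch of that bootstrap on simulated data (the pooling rule, averaging per-subject correlations, and all names here are my assumptions, not part of the answer above):

```r
# Bootstrap over subjects: resample whole subjects with replacement,
# keeping each subject's paired tests together.
set.seed(2)
n_subjects <- 30
n_tests <- 18
dat <- data.frame(subject = rep(seq_len(n_subjects), each = n_tests),
                  a = rnorm(n_subjects * n_tests))
dat$b <- dat$a + rnorm(nrow(dat))

pooled_cor <- function(d) {
  # one simple pooling rule: average the per-subject correlations
  mean(sapply(split(d, d$subject), function(s) cor(s$a, s$b)))
}

boot_r <- replicate(500, {
  ids <- sample(unique(dat$subject), replace = TRUE)
  resampled <- do.call(rbind, lapply(seq_along(ids), function(k) {
    s <- dat[dat$subject == ids[k], ]
    s$subject <- k            # relabel so repeated subjects stay distinct
    s
  }))
  pooled_cor(resampled)
})
quantile(boot_r, c(0.025, 0.975))   # percentile bootstrap interval
```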

Tristan
Hi Tristan, your answer is great, but I still have a piece missing. From looking at the links you provided, the icc in the psy package allows for assessing rater reliability. That is, assuming we had X subjects, and on each of them we had performed the same test N times, then if I were to want to know how similar these test results (for each subject) are to each other, this test would be perfect. BUT in my situation, I have TWO tests (a and b, N times each) for each (of the X) subjects, and on them I wish to compute the correlation between a and b. I'll keep looking. Many thanks!
Tal Galili
Each subject takes two tests? Or each subject takes two tests N times? If the former, that's isomorphic with two raters for the same test. Whether it is the same test but different raters or two tests doesn't matter. You're interested in the correlation between tests or raters. Maybe I still don't understand what you are doing though.
Tristan
I'm sort of with Tristan on this. It sounds like a case for a repeated-measure ANOVA, or a mixed-model ANOVA. Maybe I'm misunderstanding too?
briandk
Hi Tristan. In my case, "each subject takes two tests N times". I am interested in the correlation between the two tests "over" subjects.
Tal Galili
+2  A: 

I agree with Tristan - you are looking for ICC. The only difference from standard implementations is that the two raters (tests) evaluate each subject repeatedly. There might be an implementation that allows that. In the meantime, here is another approach to get the correlation.

You can use generalized least squares, which extends linear models by explicitly allowing correlation between the residuals. The code below implements this using the gls function of the nlme package; I am sure there are other ways as well. To use this function we first have to reshape the data into "long" format. I changed the variable names to x and y for simplicity, and I used +rnorm(N) instead of +rnorm(1) in your code, because that's what I think you meant.

library(reshape)
library(nlme)
dd <- data.frame(x=Unemployed.3, y=GNP.deflator.3, block=factor(block))
dd$occasion <- factor(rep(1:N, 3))  # variable denoting measurement occasions
dd2 <- melt(dd, id=c("block","occasion"))  # reshape

# fit model with the values within a measurement occasion correlated
#   and different variances allowed for the two variables
mod <- gls(value ~ variable + block, data=dd2, 
           cor=corSymm(form=~1|block/occasion), 
           weights=varIdent(form=~1|variable))  
# extract correlation
mod$modelStruct$corStruct

In the modeling framework you can use a likelihood ratio test to get a p-value. nlme can also give you a confidence interval:

mod2 <- gls(value ~ variable + block, data=dd2, 
           weights=varIdent(form=~1|variable))  
anova(mod, mod2)   # likelihood-ratio test for corr=0

intervals(mod)$corStruct  # confidence interval for the correlation
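If you just want the estimate as a plain number (e.g. for reporting), calling coef() on the corStruct should give it. Here is a self-contained sketch on simulated data that refits the same kind of model; the unconstrained = FALSE argument is what I believe returns the correlation on its natural scale:

```r
library(nlme)

# Simulate paired measurements in long format, fit the gls model from
# the answer above, then pull the correlation out as a single number.
set.seed(3)
N <- 16
blk <- factor(rep(letters[1:3], each = N))
occ <- factor(rep(seq_len(N), 3))
x <- rnorm(3 * N)
y <- x + rnorm(3 * N)                 # true correlation about 0.71
dd2 <- data.frame(block = rep(blk, 2),
                  occasion = rep(occ, 2),
                  variable = factor(rep(c("x", "y"), each = 3 * N)),
                  value = c(x, y))

mod <- gls(value ~ variable + block, data = dd2,
           cor = corSymm(form = ~ 1 | block/occasion),
           weights = varIdent(form = ~ 1 | variable))

# correlation on the natural (-1, 1) scale rather than the printed summary
coef(mod$modelStruct$corStruct, unconstrained = FALSE)
```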
Aniko
Hi Aniko, first I'll just mention that I did mean rnorm(1) (since I wanted to shift the data between subjects by a constant that wouldn't change their within-subject correlation). Secondly - great code! This (if I got it correctly) does exactly what I wanted. Now I have two more issues to solve: 1) I don't have a deep enough understanding of this model structure, so I'll need to learn it. 2) My real data are not really continuous: they are integers from 0 to 3, so in a sense I would need something non-parametric for this. But either way, your answer was wonderful, thank you!
Tal Galili