I'm using R, and I have two data.frames, A and B. They both have 6 rows, but A has 25000 columns (genes), and B has 30 columns. I'd like to apply a two-argument function f(x,y) to every pairing of a column x of A with a column y of B. So far it looks like this:

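# preallocate the result: one row per column of A, one column per column of B
out <- matrix(NA, nrow = ncol(A), ncol = ncol(B))
i <- 1
for (x in A) {
    j <- 1
    for (y in B) {
        out[i, j] <- f(x, y)
        j <- j + 1
    }
    i <- i + 1
}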

I have two issues with this: from my Python programming I regard keeping track of counters like this as crufty, and from my R programming I am wary of for loops. However, I can't quite see how to apply apply (or even if I should apply apply) to this problem, and I was hoping someone might enlighten me. I need to treat f() as atomic (it's actually cor.test()) for now.

A: 

Nesting the apply calls works, though the syntax is not the easiest to read.

x<-data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8), col3=c(9,10,11,12))
y<-data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8))

z<-apply(x,2,function(col,df2)
             {
               apply(df2,2,function(col2,col1)
                           {
                              col2+col1
                           },col)
             },y)

z
     col1 col2 col3
[1,]    2    6   10
[2,]    4    8   12
[3,]    6   10   14
[4,]    8   12   16
[5,]    6   10   14
[6,]    8   12   16
[7,]   10   14   18
[8,]   12   16   20
Mark
So the first argument of `function()` is always the one referenced in the apply, and you then supply the second as an additional argument. Thanks! The syntax is OK in the notation of the question: `apply(A,2,function(a,B){apply(B,2,f,a)},B)` but it is still a lot harder to read than to write. I think I'd have to write a wrapper if f(a,b) wasn't symmetric...
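A minimal sketch of the wrapper mentioned above, assuming a made-up non-symmetric `f` (the real one in the question is `cor.test()`); the data frames here are toy values chosen only for illustration. Because `apply(B, 2, f, a)` would call `f(b, a)`, a small anonymous function swaps the arguments back:

```r
# Hypothetical non-symmetric function, for illustration only
f <- function(a, b) sum(a) - sum(b)

# Toy data (assumed): same number of rows, different numbers of columns
A <- data.frame(a1 = c(1, 2, 3), a2 = c(4, 5, 6))
B <- data.frame(b1 = c(1, 1, 1), b2 = c(2, 2, 2), b3 = c(0, 0, 0))

# The inner wrapper receives (b, a) from apply and calls f(a, b)
res <- apply(A, 2, function(a, B) {
  apply(B, 2, function(b, a) f(a, b), a)
}, B)

res
```

Note that the result has one row per column of B and one column per column of A, so `t(res)` would be needed to match the `out[i,j]` layout of the question.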
Mike Dewar
+2  A: 

Some data

nrows <- 6
A <- data.frame(a = runif(nrows), b = runif(nrows), c = runif(nrows))
B <- data.frame(z = rnorm(nrows), y = rnorm(nrows))

The trick: enumerate every pair of column indices with expand.grid

counter <- expand.grid(seq_along(A), seq_along(B))
f <- function(x) 
{
  cor.test(A[, x["Var1"]], B[, x["Var2"]])$estimate
}

Now we only need one call to apply.

stats <- apply(counter, 1, f)
names(stats) <- paste(names(A)[counter$Var1], names(B)[counter$Var2], sep = ",")
stats
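If a matrix like the `out` of the question is wanted rather than a named vector, the result can probably be reshaped directly. The following is a sketch under the assumption that the order of `expand.grid` is relied upon (its first variable varies fastest); the seed is arbitrary and only there to make the run reproducible.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
nrows <- 6
A <- data.frame(a = runif(nrows), b = runif(nrows), c = runif(nrows))
B <- data.frame(z = rnorm(nrows), y = rnorm(nrows))

# One row per (column of A, column of B) pair
counter <- expand.grid(seq_along(A), seq_along(B))
stats <- apply(counter, 1, function(x) {
  cor.test(A[, x["Var1"]], B[, x["Var2"]])$estimate
})

# Var1 (columns of A) varies fastest, so filling column by column
# puts columns of A in the rows and columns of B in the columns.
out <- matrix(stats, nrow = ncol(A), ncol = ncol(B),
              dimnames = list(names(A), names(B)))
out
```

Each entry `out[i, j]` then holds the correlation estimate between column i of A and column j of B.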
Richie Cotton
+3  A: 

Since you are using data frames, it might be faster to use lapply or sapply to do this (especially given the size of your data frames). For example,

x <- data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8), col3=c(9,10,11,12))
y <- data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8))
bl <- lapply(x, function(u){
   lapply(y, function(v){
       f(u,v) # Function with column from x and column from y as inputs
   })
})
# one row per column of x, one column per column of y
out <- matrix(unlist(bl), ncol=ncol(y), byrow=TRUE)
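The sapply variant mentioned above could look roughly like this. It is only a sketch: `f` is a stand-in (assumed to return a single number, as `cor.test(u, v)$estimate` would), not the function from the question.

```r
x <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8), col3 = c(9, 10, 11, 12))
y <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8))

# Stand-in for f(); assumed to return one number per pair of columns
f <- function(u, v) sum(u * v)

# Inner sapply runs over the columns of x, outer sapply over the columns of y,
# so the result has one row per column of x and one column per column of y.
out <- sapply(y, function(v) sapply(x, function(u) f(u, v)))
out
```

Because sapply simplifies the nested results, the matrix comes back with row and column names already attached and no `unlist` step is needed.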
Abhijit