Is there a way, other than a for loop, to generate new variables in an R data frame that are all the possible two-way interactions between the existing ones? For example, given a data frame with three numeric variables V1, V2, V3, I would like to generate the following new variables:

Inter.V1V2 (= V1 * V2) 
Inter.V1V3 (= V1 * V3)
Inter.V2V3 (= V2 * V3)

Example using a for loop:

x <- read.table(textConnection('
V1 V2 V3 V4
 1  9 25 18
 2  5 20 10
 3  4 30 12
 4  4 34 16'
), header=TRUE)

dim.init <- ncol(x)   # number of original columns
for (i in 1:(dim.init - 1)) {
        for (j in (i + 1):dim.init) {
                # append the product of columns i and j, then name the new column
                x[ncol(x) + 1]    <- x[i] * x[j]
                names(x)[ncol(x)] <- paste("Inter.V", i, "V", j, sep="")
        }
}
+4  A: 

Here you go, using combn and apply:

> x2 <- t(apply(x, 1, combn, 2, prod))

Setting the column names can be done with two paste commands:

> colnames(x2) <- paste("Inter.V", combn(1:4, 2, paste, collapse="V"), sep="")

Lastly, if you want all your variables together, just cbind them:

> x <- cbind(x, x2)
> x
  V1 V2 V3 V4 Inter.V1V2 Inter.V1V3 Inter.V1V4 Inter.V2V3 Inter.V2V4 Inter.V3V4
1  1  9 25 18          9         25         18        225        162        450
2  2  5 20 10         10         40         20        100         50        200
3  3  4 30 12         12         90         36        120         48        360
4  4  4 34 16         16        136         64        136         64        544
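
A possible column-wise variant of the same idea, offered only as a sketch and not part of the original answer: instead of calling combn on every row, call it once on the column names and multiply whole columns. The names pairs, inter and x_all are illustrative, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Example data from the question (V1 is the first data column, not a row name)
x <- data.frame(V1 = 1:4,
                V2 = c(9, 5, 4, 4),
                V3 = c(25, 20, 30, 34),
                V4 = c(18, 10, 12, 16))

# All pairs of column names; each column of this 2 x 6 matrix is one pair
pairs <- combn(names(x), 2)

# Multiply the two columns of each pair (vectorised over the rows of x)
inter <- apply(pairs, 2, function(p) x[[p[1]]] * x[[p[2]]])

# Name the new columns Inter.V1V2, Inter.V1V3, ...
colnames(inter) <- paste0("Inter.", pairs[1, ], pairs[2, ])

# Original variables followed by the six interaction columns
x_all <- cbind(x, inter)
```

Because the names are built from names(x) rather than from 1:4, this version should not need editing when the number of columns changes.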
Shane
Very nice! Is there a way to also change the column names, according to the example, using apply?
gd047
I updated it to show this.
Shane
If you are just going to use these interactions in models that take a formula, such as lm or glm, you don't need to generate the variables. See: http://cran.r-project.org/doc/manuals/R-intro.html#Formulae-for-statistical-models
Tristan
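
To illustrate the formula point, here is a small sketch that expands a formula without fitting any model. The response name y is hypothetical; only the right-hand side matters for the expansion.

```r
# (V1 + V2 + V3)^2 expands to main effects plus all two-way interactions.
# y is a hypothetical response name; no data are needed to inspect the terms.
f <- y ~ (V1 + V2 + V3)^2
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
# "V1" "V2" "V3" "V1:V2" "V1:V3" "V2:V3"
```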
@Tristan: OK, I know that with models this can be done using R's modeling formulae. What I want is to generate interaction variables to use as predictors in classification problems.
gd047
+6  A: 

Here is a one-liner for you that also works if you have factors:

> model.matrix(~(V1+V2+V3+V4)^2,x)
  (Intercept) V1 V2 V3 V4 V1:V2 V1:V3 V1:V4 V2:V3 V2:V4 V3:V4
1           1  1  9 25 18     9    25    18   225   162   450
2           1  2  5 20 10    10    40    20   100    50   200
3           1  3  4 30 12    12    90    36   120    48   360
4           1  4  4 34 16    16   136    64   136    64   544
attr(,"assign")
 [1]  0  1  2  3  4  5  6  7  8  9 10
Ian Fellows
+1 Wasn't aware of the model.matrix function. Very useful!
Shane
Excellent! You could also get rid of the intercept (irrelevant in our case): model.matrix(~(V1+V2+V3+V4)^2-1,x)
gd047
Right you are. Or, for the fully general case: as.data.frame(model.matrix(~ .^2-1,x))
Ian Fellows
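
Putting the comments above together, a minimal sketch for the all-numeric case: build the matrix with the general formula, then rename the "V1:V2"-style columns to the "Inter.V1V2" style asked for in the question. The renaming step and the names mm, is_inter and x_all are illustrative additions. With factor columns the generated names also carry level labels, so the renaming shown here is only meant for numeric data.

```r
# Example data from the question
x <- data.frame(V1 = 1:4,
                V2 = c(9, 5, 4, 4),
                V3 = c(25, 20, 30, 34),
                V4 = c(18, 10, 12, 16))

# ~ .^2 gives every main effect plus every two-way interaction;
# - 1 drops the intercept column
mm <- model.matrix(~ .^2 - 1, data = x)

# Interaction columns are named like "V1:V2"; rename them to "Inter.V1V2"
is_inter <- grepl(":", colnames(mm), fixed = TRUE)
colnames(mm)[is_inter] <- paste0("Inter.",
                                 sub(":", "", colnames(mm)[is_inter], fixed = TRUE))

# Convert to a data frame once the names are syntactically valid
x_all <- as.data.frame(mm)
```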