Let's say I have a matrix x which contains 10 rows and 2 columns. I want to generate a new matrix M that contains each unique pair of rows from x - that is, a new matrix with 55 rows and 4 columns.

E.g.,

x <- matrix(1:20, nrow = 10, ncol = 2)

# One row of M per pair (i, j) with i <= j: 10 * 11 / 2 = 55 rows
M <- data.frame(matrix(ncol = 4, nrow = 55))
k <- 1
for (i in 1:nrow(x)) {
    for (j in i:nrow(x)) {
        M[k, ] <- c(x[i, ], x[j, ])   # row i followed by row j
        k <- k + 1
    }
}

So, x is:

      [,1] [,2]
 [1,]    1   11
 [2,]    2   12
 [3,]    3   13
 [4,]    4   14
 [5,]    5   15
 [6,]    6   16
 [7,]    7   17
 [8,]    8   18
 [9,]    9   19
[10,]   10   20

M then has 4 columns; the first two hold one row of x and the next two hold another row of x:

> head(M,10)
   X1 X2 X3 X4
1   1 11  1 11
2   1 11  2 12
3   1 11  3 13
4   1 11  4 14
5   1 11  5 15
6   1 11  6 16
7   1 11  7 17
8   1 11  8 18
9   1 11  9 19
10  1 11 10 20

Is there either a faster or simpler (or both) way of doing this in R?

+2  A: 

I'm not quite grokking what you are doing, so I'll just throw out something that may or may not help.

Here's what I think of as the Cartesian product of the two columns:

expand.grid(x[,1],x[,2])
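
For illustration, a minimal sketch of what that call returns for the x in the question. It pairs every value of column 1 with every value of column 2, so the result has 100 rows and 2 columns, and most of those rows are not rows of x (the check in the last line is only an assumption about what is useful to look at):

```r
x <- matrix(1:20, nrow = 10, ncol = 2)

# Cartesian product of the two columns: 10 * 10 = 100 rows, 2 columns
cp <- expand.grid(x[, 1], x[, 2])
dim(cp)

# Count how many of those rows actually occur as a row of x
# (only the ones where Var2 == Var1 + 10, given how x was built)
sum(cp$Var2 == cp$Var1 + 10)
```
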
JD Long
Never knew about expand.grid(). Dirk's answer brings it all together (as always...)
Josh Reich
+5  A: 

The expand.grid() function is useful for this:

R> GG <- expand.grid(1:10,1:10)
R> GG <- GG[GG[,1]>=GG[,2],]     # trim it to your 55 pairs
R> dim(GG)
[1] 55  2
R> head(GG)
  Var1 Var2
1    1    1
2    2    1
3    3    1
4    4    1
5    5    1
6    6    1
R>

Now you have the 'n*(n+1)/2' subsets and you can simply index your original matrix.

Dirk Eddelbuettel
+1  A: 

You can also try the "relations" package. Here is the vignette. It should work like this:

relation_table(x %><% x)
Shane
+1  A: 

Using Dirk's answer:

idx <- expand.grid(1:nrow(x), 1:nrow(x))
idx <- idx[idx[,1] >= idx[,2],]
N <- cbind(x[idx[,2],], x[idx[,1],])

> all(M == N)
[1] TRUE
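
The same idea can be wrapped in a small function so it works for a matrix of any size. This is only a sketch: row_pairs is a hypothetical name (not a base R function), and drop = FALSE is added on the assumption that one-row or one-column inputs should still come back as a matrix.

```r
# row_pairs() is a hypothetical helper, not part of base R
row_pairs <- function(x) {
    n <- nrow(x)
    # j varies fastest, so rows come out ordered by i and then by j
    idx <- expand.grid(j = seq_len(n), i = seq_len(n))
    idx <- idx[idx$j >= idx$i, ]   # keep i <= j, self-pairs included
    cbind(x[idx$i, , drop = FALSE], x[idx$j, , drop = FALSE])
}

x <- matrix(1:20, nrow = 10, ncol = 2)
N <- row_pairs(x)
dim(N)   # n * (n + 1) / 2 rows and 2 * ncol(x) columns
```

Using seq_len(n) rather than 1:n avoids the 1:0 surprise if x has no rows.
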

Thanks everyone!

Josh Reich
+1  A: 

Although the answer constructed agrees with the implemented example, it does not agree with the problem description. The number of unique combinations of 10 distinct items taken two at a time is 45, not 55. Nor is that 55-element set the Cartesian product, which would contain 10 x 10 pairs. Here's a solution that produces the 45 unique "combinations" using R's combn function:

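> z <- combn(nrow(x), 2)   # z is not defined in the post as captured; assumed here from the mention of combn
> M <- data.frame(cbind(x[z[1,],], x[z[2,],]))
> nrow(M)
[1] 45
> head(M)
  X1 X2 X3 X4
1  1 11  2 12
2  1 11  3 13
3  1 11  4 14
4  1 11  5 15
5  1 11  6 16
6  1 11  7 17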
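
To make the 45-versus-55 point concrete, here is a rough sketch of the counting. It assumes z is built with combn(n, 2), which is not shown above, and the names strict and selfp are made up for this illustration. The 55 rows in the question are the 45 pairs of distinct rows plus the 10 pairs of a row with itself.

```r
x <- matrix(1:20, nrow = 10, ncol = 2)
n <- nrow(x)

z <- combn(n, 2)                            # assumed: 2 x 45 index pairs with i < j
strict <- cbind(x[z[1, ], ], x[z[2, ], ])   # 45 rows: two distinct rows of x
selfp  <- cbind(x, x)                       # 10 rows: each row paired with itself

choose(n, 2)                  # 45
nrow(strict) + nrow(selfp)    # 55, which equals n * (n + 1) / 2
```
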

DWin
Well, that's what it looked like on the R console anyway. Do I need to add ";" to get it to be proper R code?!?
M <- data.frame(cbind(x[z[1,],], x[z[2,],]) ); nrow(M);
[1] 45
> head(M)
  X1 X2 X3 X4
1  1 11  2 12
2  1 11  3 13
3  1 11  4 14
4  1 11  5 15
5  1 11  6 16
6  1 11  7 17
DWin