views:

127

answers:

4

I've got a data frame in R, and I'd like to perform a calculation on all pairs of rows. Is there a simpler way to do this than using a nested for loop?

To make this concrete, consider a data frame with ten rows, where I want to calculate the difference in scores between all 45 possible pairs.

> data.frame(ID=1:10,Score=4*10:1)
   ID Score
1   1    40
2   2    36
3   3    32
4   4    28
5   5    24
6   6    20
7   7    16
8   8    12
9   9     8
10 10     4

I know I could do this calculation with a nested for loop, but is there a better (more R-ish) way to do it?

+6  A: 

To calculate the differences, perhaps you could use

outer(df$Score,df$Score,"-")
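As a rough sketch of what this returns, assuming the question's data frame has been stored in a variable named df (the question prints it without assigning it): outer gives the full 10 x 10 matrix of signed differences, so every pair shows up twice with opposite signs and the diagonal is zero. One way to keep each of the 45 pairs once is to take the upper triangle. The names delta and pair_diffs are only illustrative.

```r
# Assumption: the data frame from the question is stored as `df`.
df <- data.frame(ID = 1:10, Score = 4 * 10:1)

# Full matrix: element [i, j] is Score[i] - Score[j].
delta <- outer(df$Score, df$Score, "-")

# Keep each unordered pair once (i < j) by taking the upper triangle.
pair_diffs <- delta[upper.tri(delta)]
length(pair_diffs)
```

If the sign matters the other way round, lower.tri(delta) selects the pairs with i > j instead.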
unutbu
+3  A: 
n <- nrow(df)
# colmx[i, j] holds Score[i]; rowmx[i, j] holds Score[j]
colmx <- matrix(rep(df[, 2], n), ncol = n, byrow = FALSE)
rowmx <- matrix(rep(df[, 2], n), ncol = n, byrow = TRUE)
delta <- colmx - rowmx
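A quick way to check that this explicit construction agrees with outer is sketched below. It assumes df is the data frame from the question, and the names explicit and builtin are made up for the comparison.

```r
# Assumption: `df` is the data frame from the question.
df <- data.frame(ID = 1:10, Score = 4 * 10:1)
n <- nrow(df)

# Explicit version: subtract a "repeat along rows" matrix from a
# "repeat along columns" matrix.
explicit <- matrix(rep(df$Score, n), ncol = n, byrow = FALSE) -
  matrix(rep(df$Score, n), ncol = n, byrow = TRUE)

# Built-in version.
builtin <- outer(df$Score, df$Score, "-")

# TRUE when both approaches produce the same matrix of differences.
isTRUE(all.equal(explicit, builtin))
```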
doug
unutbu and I get the same answer. 'outer' is a wrapper over the matrix computation I did explicitly, which explains the small performance difference between the two: for a 100 x 100 matrix, averaged over 100 trials, the built-in was only about 10% slower. Given all the artifacts in measuring this sort of thing, I would say that's within the noise threshold.
doug
+2  A: 

Here is another solution using combn:

df <- data.frame(ID=1:10,Score=4*10:1)
cm <- combn(df$ID,2)
delta <- df$Score[cm[1,]]-df$Score[cm[2,]]

or more directly

df <- data.frame(ID=1:10,Score=4*10:1)
delta <- combn(df$ID,2,function(x) df$Score[x[1]]-df$Score[x[2]])
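If it helps to see which pair each difference belongs to, the columns of the combn result can be put next to the differences. This is only a sketch: pair_table and its column names are invented here, and indexing Score by ID works only because the IDs in this example equal the row numbers 1 to 10. For arbitrary IDs, combn(seq_len(nrow(df)), 2) would give row indices instead.

```r
df <- data.frame(ID = 1:10, Score = 4 * 10:1)

# Each column of `cm` is one pair (i, j) with i < j.
cm <- combn(df$ID, 2)

# One row per pair: the two IDs and the difference of their scores.
# Indexing Score by ID is valid here only because ID matches the row number.
pair_table <- data.frame(
  ID1   = cm[1, ],
  ID2   = cm[2, ],
  Delta = df$Score[cm[1, ]] - df$Score[cm[2, ]]
)
head(pair_table, 3)
```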
teucer
Ooh, I like the combn function very much.
lorin
+2  A: 

dist() is your friend.

dist(df$Score)

You can convert it to a matrix:

as.matrix( dist(df$Score) )
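One thing worth keeping in mind: dist computes distances (Euclidean by default), so for a single numeric column it returns the absolute value of each difference rather than the signed difference. A small sketch of the comparison, assuming df is the data frame from the question; the names d and signed are illustrative.

```r
df <- data.frame(ID = 1:10, Score = 4 * 10:1)

# "dist" object holding one distance per unordered pair.
d <- dist(df$Score)
length(d)

# Signed differences for comparison: element [i, j] is Score[i] - Score[j].
signed <- outer(df$Score, df$Score, "-")

# The distance matrix matches the absolute value of the signed differences.
isTRUE(all.equal(as.matrix(d), abs(signed), check.attributes = FALSE))
```

So dist is a good fit when only the size of the difference matters. When the sign matters, outer or combn keeps it.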
Etiennebr
How did I miss (another) built-in?! Anyway, nice one, +1 from me.
doug