ansaurus

Question

Answer 1

+2 A:

I took some liberties with your code because I try to vectorize rather than use loops whenever I can. The merge function joins the two data frames so that you can operate on whole columns, which lets you use the vectorization built into R. I think this will do what you want (in the second line I'm just making sure that A and B don't have the same values for height and age, so that your distance isn't always zero):

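# Two identical example data frames, one row per date
A <- B <- data.frame(date=Sys.Date()-9:0, stock=letters[1:10], type=1:10, height=1:10, age=1:10)
# Reverse height and age in B so the distances are not all zero
B$height <- B$age <- 10:1
# Pair up rows of A and B that share the same date and type
AB <- merge(x=A, y=B, by=c("date", "type"), suffixes=c(".A", ".B"))
# Weights for each squared difference
height.param <- 1/5000
age.param <- 1
# Weighted distance for every pair, computed in one vectorized expression
temp <- sqrt( height.param * (AB$height.A - AB$height.B)^2 + age.param * (AB$age.A - AB$age.B)^2 )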

richardh 2010-07-14 02:28:35
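
To make the pairing step in the answer above concrete, here is a minimal sketch on hypothetical toy data (the data frames `existing` and `candidates` and their values are assumptions, not from the question). It shows that merge, with its default inner join, pairs every row of one data frame with every row of the other that shares the key, so a single vectorized subtraction covers all pairs:

```r
# Hypothetical toy data: two existing items and three candidates
existing   <- data.frame(type = c(1, 1), id = c("a", "b"), height = c(10, 20))
candidates <- data.frame(type = c(1, 1, 2), height = c(12, 18, 50))

# Default merge is an inner join: each existing row is paired with every
# candidate row of the same type; the type-2 candidate has no match and is dropped
pairs <- merge(x = existing, y = candidates, by = "type", suffixes = c(".A", ".B"))

# One vectorized expression computes the difference for all pairs at once
pairs$diff <- abs(pairs$height.A - pairs$height.B)
print(pairs)
```

Because the two type-1 rows in `existing` each meet the two type-1 rows in `candidates`, the merged frame has four rows, one per pair.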

Thanks. I converted to use vectorization as suggested and it's much quicker than mapply (by a factor of 16). Plus, when I'm iterating to find the right parameters, it makes more sense to do the merge once. I had to edit your code a bit and use ddply at the end to group the merged data frame. I'm building an algorithm to select the best stock to buy. This part looks at the similarity of potential purchases to existing stock. The dates are in there because I'm running against historical data to calibrate the parameters.

alan 2010-07-14 13:34:08

Answer 2

+1 A:

Use mapply, the multivariate form of sapply:

res1 <- mapply(GetDistanceTest, Sales$date, Sales$Type, Sales$Height, Sales$Age)

goodside 2010-07-14 02:45:08

FYI, `mapply` is still a loop, just a more compact and readable one.

richardh 2010-07-14 11:33:09
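
As an illustration of the point above, here is a small sketch comparing the two approaches. The function `dist_one` and the input vectors are hypothetical stand-ins, since the original per-row function is not shown in this thread. mapply calls the function once per element, while the vectorized expression makes one pass over whole vectors and gives the same result:

```r
# Hypothetical per-row distance function (an assumption, not the asker's code)
dist_one <- function(h1, h2, a1, a2) sqrt((h1 - h2)^2 + (a1 - a2)^2)

h1 <- c(1, 2, 3); h2 <- c(4, 6, 3)
a1 <- c(0, 0, 1); a2 <- c(4, 3, 1)

# mapply: dist_one is called three times, once per set of elements
looped <- mapply(dist_one, h1, h2, a1, a2)

# Vectorized: the arithmetic runs once over the whole vectors
vectorized <- sqrt((h1 - h2)^2 + (a1 - a2)^2)

print(all.equal(looped, vectorized))
```

On large inputs, timing both versions with system.time is a simple way to check the speed difference on your own data.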

Thanks, mapply is what I was looking for, but as richardh suggested, vectorization turned out to be faster.

alan 2010-07-14 13:33:33

Answer 3

A:

Code as per above comment:

library(plyr)  # provides ddply

# Random example data: 1,000 rows in A and 10,000 rows in B
A <- data.frame(date=rep(Sys.Date()-9:0,100), id=letters[1:10], type=floor(runif(1000, 1, 10)), height=runif(1000, 1, 100), age=runif(1000, 1, 100))
B <- data.frame(date=rep(Sys.Date()-9:0,1000), type=floor(runif(10000, 1, 10)), height=runif(10000, 1, 10), age=runif(10000, 1, 10))

# Pair up rows of A and B that share the same date and type
AB <- merge(x=A, y=B, by=c("date", "type"), suffixes=c(".A", ".B"))
height.param <- 1
age.param <- 1
# Weighted distance for every pair, then the total per id
AB$ClusterScore <- sqrt( height.param * (AB$height.A - AB$height.B)^2 + age.param * (AB$age.A - AB$age.B)^2 )
Scores <- ddply(AB, c("id"), function(df)sum(df$ClusterScore))

alan 2010-07-14 13:33:00
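
If plyr is not available, the same per-id sum can probably be done in base R. This is a sketch on a hypothetical toy data frame (`scores_df` and its values are assumptions), showing two base functions, tapply and aggregate, that group by id and sum a score column:

```r
# Hypothetical toy data standing in for the merged data frame
scores_df <- data.frame(id = c("a", "b", "a", "b", "c"),
                        ClusterScore = c(1, 2, 3, 4, 5))

# tapply returns a named vector with one total per id
totals <- tapply(scores_df$ClusterScore, scores_df$id, sum)

# aggregate returns a data frame, closer in shape to the ddply result
agg <- aggregate(ClusterScore ~ id, data = scores_df, FUN = sum)

print(totals)
print(agg)
```

Which form is more convenient depends on whether the next step expects a named vector or a data frame.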

Avoiding Loop with R using Apply (?)
