I have a script where I'm using ddply, as in the following example:

ddply(df, .(col),
function(x) data.frame(
col1=some_function(x$y),
col2=some_other_function(x$y)
)
)

Within ddply, is it possible to reuse col1 (and col2) when building a later column, without calling the functions again?

For example:

ddply(df, .(col),
function(x) data.frame(
col1=some_function(x$y),
col2=some_other_function(x$y),
col3=col1*col2
)
)
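The hypothetical col3=col1*col2 line fails because data.frame() evaluates its arguments in the calling environment, not against the columns it is in the middle of building, so col1 does not exist yet when col3 is computed. A minimal base-R sketch of the failure, assuming a fresh session where no variable named col1 happens to be defined:

```r
# data.frame() evaluates each argument in the caller's environment; columns
# created earlier in the same call are not visible to later arguments.
res <- tryCatch(
  data.frame(col1 = 2, col2 = 3, col3 = col1 * col2),
  error = function(e) conditionMessage(e)
)
print(res)  # the error message, complaining that col1 was not found
```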
A:

I don't think that's possible, but it shouldn't matter too much, because at that point it's not an aggregation function anymore. For example:

library(plyr)
# use summarize() in ddply()
data.means <- ddply(data, .(groups), summarize, mean = mean(x), sd = sd(x), n = length(x))
data.means$se <- data.means$sd / sqrt(data.means$n)
# approximate 95% bounds: mean +/- 1.96 * se
data.means$Upper <- data.means$mean + (data.means$se * 1.96)
data.means$Lower <- data.means$mean - (data.means$se * 1.96)
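As a sketch of adding the derived columns after ddply() in a single step, the version below uses plyr's mutate(), which (unlike base transform()) lets later arguments refer to columns created earlier in the same call. The toy data, two groups of 10 random normal values, is made up for illustration.

```r
library(plyr)

# Hypothetical toy data: a numeric measurement x in two groups.
set.seed(1)
data <- data.frame(groups = rep(c("a", "b"), each = 10),
                   x = rnorm(20))

data.means <- ddply(data, .(groups), summarize,
                    mean = mean(x), sd = sd(x), n = length(x))

# plyr::mutate() evaluates its arguments in order, so Upper and Lower
# can use the se column created just before them.
data.means <- mutate(data.means,
                     se    = sd / sqrt(n),
                     Upper = mean + se * 1.96,
                     Lower = mean - se * 1.96)
print(data.means)
```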

So I didn't calculate the SEs inside ddply(), but computing them afterwards, outside of ddply(), wasn't so bad. If you really wanted to, you could also do it in the same call:

ddply(data, .(groups), summarize, se = sd(x) / sqrt(length(x)))

Or, to put it in terms of your example:

ddply(df, .(col), summarize,
      col1=some_function(y),
      col2=some_other_function(y),
      col3=some_function(y)*some_other_function(y)
)
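A self-contained version of this summarize() approach, assuming mean() and sd() as hypothetical stand-ins for some_function() and some_other_function() and a small made-up data frame. Note that both functions really are evaluated a second time for col3.

```r
library(plyr)

# Hypothetical data: grouping column col and value column y.
set.seed(42)
df <- data.frame(col = rep(c("a", "b"), each = 5), y = runif(10))

# mean() and sd() stand in for some_function() and some_other_function().
out <- ddply(df, .(col), summarize,
             col1 = mean(y),
             col2 = sd(y),
             # both functions are simply called again for col3
             col3 = mean(y) * sd(y))
print(out)
```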
JoFrhwld
Thank you for this example.
Brandon Bertelsen
A:

You've got a whole function to play with! Doesn't have to be a one-liner! This should work:

ddply(df, .(col), function(x) {
  # compute each value once and reuse it
  tmp1 <- some_function(x$y)
  tmp2 <- some_other_function(x$y)
  data.frame(
    col1=tmp1,
    col2=tmp2,
    col3=tmp1*tmp2
  )
})
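A runnable sketch of this temporary-variable approach, with mean() and sd() standing in (hypothetically) for some_function() and some_other_function() and a small made-up data frame. Each function runs once per group, and the stored results feed as many columns as needed.

```r
library(plyr)

# Hypothetical data: grouping column col and value column y.
set.seed(42)
df <- data.frame(col = rep(c("a", "b"), each = 5), y = runif(10))

out <- ddply(df, .(col), function(x) {
  # Each summary is computed once per group and stored locally...
  m <- mean(x$y)  # stand-in for some_function()
  s <- sd(x$y)    # stand-in for some_other_function()
  # ...then reused for the derived column.
  data.frame(col1 = m, col2 = s, col3 = m * s)
})
print(out)
```

ddply() prepends the grouping column col to whatever data frame the function returns for each group.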
Harlan
Thank you; I didn't realize how scalable ddply was. It's my first day actually making use of it, and I'm trying to move away from "for" loops. Dirk pointed the function and the plyr package out to me in another question, and I've been making great use of it.
Brandon Bertelsen