ansaurus

Question

R: ddply() Possible to reuse generated columns?

Answer 1

+1 A:

I don't think that's possible, but it shouldn't matter too much, because at that point it's not an aggregation function anymore. For example:

# Use summarize() inside ddply() to aggregate per group
library(plyr)
data.means <- ddply(data, .(groups), summarize, mean = mean(x), sd = sd(x), n = length(x))
# Derive the standard error and the 95% interval outside ddply()
data.means$se <- data.means$sd / sqrt(data.means$n)
data.means$Upper <- data.means$mean + (data.means$se * 1.96)
data.means$Lower <- data.means$mean - (data.means$se * 1.96)

So I didn't calculate the SEs directly inside ddply(), but it wasn't so bad calculating them outside of it. If you really wanted to, you could also do:

ddply(data, .(groups), summarize, se = sd(x) / sqrt(length(x)))

Or, to put it in terms of your example:

ddply(df, .(col), summarize,
      col1 = some_function(y),
      col2 = some_other_function(y),
      col3 = some_function(y) * some_other_function(y)
)
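
For a runnable version of the two-step approach, here is a minimal sketch on made-up data. The `data` frame and its values are assumptions for illustration, not taken from the question. For the second step it uses plyr's mutate(), which evaluates its arguments in order, so a later column can refer to one created just before it:

```r
library(plyr)

# Hypothetical toy data: two groups of three measurements each
data <- data.frame(
  groups = rep(c("a", "b"), each = 3),
  x      = c(1, 2, 3, 4, 6, 8)
)

# Step 1: aggregate per group with summarize()
data.means <- ddply(data, .(groups), summarize,
                    mean = mean(x), sd = sd(x), n = length(x))

# Step 2: build on the aggregated columns; mutate() lets Upper and
# Lower use the se column defined earlier in the same call
data.means <- mutate(data.means,
                     se    = sd / sqrt(n),
                     Upper = mean + 1.96 * se,
                     Lower = mean - 1.96 * se)

print(data.means)
```

Base R's transform() would not work for step 2 as written, because it evaluates all of its arguments against the original columns only.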

JoFrhwld 2010-07-30 15:02:12

Thank you for this example.

Brandon Bertelsen 2010-07-30 22:04:35

Answer 2

+1 A:

You've got a whole function to play with! Doesn't have to be a one-liner! This should work:

ddply(df, .(col), function(x) {
  tmp <- some_other_function(x$y)
  data.frame(
    col1=some_function(x$y),
    col2=tmp,
    col3=tmp
  )
})
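
To make that concrete, here is a small sketch with real functions swapped in for the placeholders. The toy `df`, the argument name `piece`, and the choice of mean and standard error are all assumptions for illustration:

```r
library(plyr)

# Hypothetical toy data standing in for the asker's df
df <- data.frame(
  col = rep(c("a", "b"), each = 3),
  y   = c(1, 2, 3, 4, 6, 8)
)

result <- ddply(df, .(col), function(piece) {
  # Compute each intermediate value once per group
  m  <- mean(piece$y)
  se <- sd(piece$y) / sqrt(nrow(piece))
  # Reuse the intermediates when building the one-row result
  data.frame(mean  = m,
             se    = se,
             Lower = m - 1.96 * se,
             Upper = m + 1.96 * se)
})

print(result)
```

Each group's one-row data frame is stacked by ddply(), with the grouping column `col` added in front.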

Harlan 2010-07-30 19:19:31

Thank you, I didn't realize how scalable ddply was. It's my first day actually making use of it. I'm trying to move away from "for" loops. Dirk pointed out the function and the plyr package to me in another question, and I've been making great use of it.

Brandon Bertelsen 2010-07-30 22:05:41
