ansaurus

Question

idata.frame: Why error "is.data.frame(df) is not TRUE"?

Answer 1

A:

Strange behaviour, but even the docs say that idata.frame is experimental, so you have probably found a bug. You could perhaps rewrite the check at the top of ddply that tests is.data.frame().

In any case, naming each column explicitly cuts about 20% off the run time (on my system):

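library(plyr)

# na.rm = TRUE has to be given to each median() call; passed to data.frame()
# it would only add a constant column named na.rm to the result.
system.time(df.median <- ddply(exp, .(groupname, starttime, fPhase, fCycle), function(x) data.frame(
  inadist = median(x$inadist, na.rm = TRUE),
  smldist = median(x$smldist, na.rm = TRUE),
  lardist = median(x$lardist, na.rm = TRUE),
  inadur  = median(x$inadur, na.rm = TRUE),
  smldur  = median(x$smldur, na.rm = TRUE),
  lardur  = median(x$lardur, na.rm = TRUE),
  emptyct = median(x$emptyct, na.rm = TRUE),
  entct   = median(x$entct, na.rm = TRUE),
  inact   = median(x$inact, na.rm = TRUE),
  smlct   = median(x$smlct, na.rm = TRUE),
  larct   = median(x$larct, na.rm = TRUE)
)))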

Shane asked you in another post whether you could cache the results of your script. I don't know your workflow, but it may be best to set up a cron job that runs this and stores the results daily, hourly, or at whatever interval suits you.

Brandon Bertelsen 2010-10-21 07:03:55
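
For the caching idea, here is a minimal sketch in base R. It assumes the summary can be stored as an .rds file and reused until it is older than some cutoff. The names `compute_medians`, `load_or_compute`, `cache_file` and the one-hour `max_age_secs` are hypothetical, and the toy data frame only stands in for `exp`.

```r
# Toy stand-in for the real data (hypothetical values).
toy <- data.frame(
  groupname = c("a", "a", "a", "b", "b"),
  inadist   = c(1, 3, NA, 10, 20)
)

# Hypothetical cache location; a real job would use a fixed path.
cache_file <- tempfile(fileext = ".rds")

# The expensive summary; here just a per-group median of one column.
compute_medians <- function(df) {
  aggregate(df["inadist"], by = df["groupname"], FUN = median, na.rm = TRUE)
}

# Return the cached result if the file exists and is recent enough,
# otherwise recompute and overwrite the cache.
load_or_compute <- function(df, path, max_age_secs = 3600) {
  fresh <- file.exists(path) &&
    as.numeric(difftime(Sys.time(), file.mtime(path), units = "secs")) < max_age_secs
  if (fresh) {
    return(readRDS(path))
  }
  result <- compute_medians(df)
  saveRDS(result, path)
  result
}

first  <- load_or_compute(toy, cache_file)  # computes and writes the cache
second <- load_or_compute(toy, cache_file)  # reads the cached copy
```

A scheduled job could call the same function, so interactive sessions only ever read the stored file.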

It's interesting that specifying the columns speeds things up (it does so on my system too), but so far the fastest solution is to use `aggregate()`. In any event, I used my old slow call in this question because it caused `idata.frame()` to choke. I have a lot of calls that use the exp data frame, and I thought that substituting an idata.frame might speed up all of them significantly.

dnagirl 2010-10-21 12:38:32

You can use `idata.frame` with this call - but it only gives about a 10% speed up because you're not making that many splits.

hadley 2010-10-21 15:27:17
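
A small sketch of what wrapping a data frame in `idata.frame` looks like, assuming plyr is installed. The toy data and the single `inadist` column are hypothetical stand-ins for `exp`, and `dlply` is used here only because that is the pattern shown in the plyr documentation. The object returned by `idata.frame()` is not a data.frame, so any function that asserts `is.data.frame()` on its input will reject it, which matches the error in the question title.

```r
library(plyr)

# Toy stand-in for the real data (hypothetical values).
toy <- data.frame(
  groupname = c("a", "a", "a", "b", "b"),
  inadist   = c(1, 3, NA, 10, 20)
)

# Wrap the data frame once in an immutable data frame.
itoy <- idata.frame(toy)

# The wrapper has class "idf", so this check returns FALSE.
is_df <- is.data.frame(itoy)

# Split by group and take the median of one column in each piece.
medians <- dlply(itoy, "groupname",
                 function(piece) median(piece$inadist, na.rm = TRUE))
```

With only a handful of groups the gain is small, as noted above; the wrapper matters more when there are many splits.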

And yes, `aggregate` will currently beat `ddply` + `colwise`. Thinking about how to do better for the next version.

hadley 2010-10-21 15:29:15
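
For comparison, a minimal sketch of the `aggregate()` route in base R. The toy data, `group_cols` and `measure_cols` are hypothetical; the real call would list the four grouping columns and eleven measure columns of `exp`. Extra arguments to `aggregate()` are passed on to `FUN`, so `na.rm = TRUE` reaches `median()` for every column.

```r
# Toy stand-in for the real data (hypothetical values).
toy <- data.frame(
  groupname = c("a", "a", "a", "b", "b"),
  starttime = c(1, 1, 1, 2, 2),
  inadist   = c(1, 3, NA, 10, 20),
  smldist   = c(2, 4, 6, 8, NA)
)

group_cols   <- c("groupname", "starttime")
measure_cols <- c("inadist", "smldist")

# One row per group, one median per measure column; NAs are dropped
# column by column because na.rm = TRUE is forwarded to median().
toy_median <- aggregate(toy[measure_cols], by = toy[group_cols],
                        FUN = median, na.rm = TRUE)
```

The data-frame interface is used rather than the formula interface because the formula method drops any row containing an NA by default, which is not the same as `na.rm = TRUE` per column.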
