I am currently running numerous apply lines that look like this:

test=data.frame(t=seq(1,5,1),e=seq(6,10,1))
mean(apply(test,2,mean))

I want to convert the second line to use mclapply, which produces the same result as lapply. I realize that I could extract each item from the lapply result with a for loop and then call mean on that vector, but that would hurt the performance I am trying to improve by using mclapply. The problem is that both lapply and mclapply return a list, which mean cannot use. I can use [[]] to get the actual values, or test$t and test$e, but the number of columns in test is variable and typically runs over 1,000. There must be an easier way to handle this. Basically, I want to get the mean of this statement:

mclapply(test,mean,mc.preschedule=TRUE)

preferably without generating new variables or using for loops. The solution should be equivalent to getting the mean of this statement:

lapply(test,mean)
+1  A: 

I'm confused -- a data.frame is, after all, a list as well. So besides the obvious

R> testdf <- data.frame(t=seq(1,5,1),e=seq(6,10,1))
R> mean(testdf)
t e 
3 8 
R> mean(mean(testdf))
[1] 5.5
R> 
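
A note for readers on a current R: the transcript above relies on mean() working column-wise on a data frame, which, as far as I know, was deprecated and then removed around R 3.0.0, so it now returns NA with a warning. A minimal sketch of the same computation using colMeans(), which is a swapped-in substitute and not part of the original answer:

```r
testdf <- data.frame(t = seq(1, 5, 1), e = seq(6, 10, 1))

# colMeans() returns a named numeric vector of per-column means
col_means <- colMeans(testdf)

# Mean of the column means, equivalent to mean(mean(testdf)) on old R versions
overall <- mean(col_means)
print(col_means)
print(overall)
```

Like mean(mean(testdf)), this is serial, so it only establishes the target value and does not help with the parallel speed-up asked about.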

you could also do

R> lapply(testdf, mean)
$t
[1] 3

$e
[1] 8

R> mean(unlist(lapply(testdf, mean)))
[1] 5.5
R> 

So for the inner lapply you could substitute mclapply as desired, no?
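
Spelled out, the substitution might look like the sketch below. It assumes mclapply is loaded from the parallel package (it originally lived in the multicore package), and the choice of 2 cores is arbitrary. Forking is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)  # assumption: mclapply from the 'parallel' package

testdf <- data.frame(t = seq(1, 5, 1), e = seq(6, 10, 1))

# mc.cores > 1 relies on forking, which Windows does not support
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mclapply returns a list, one element per column, just like lapply
col_means <- mclapply(testdf, mean, mc.preschedule = TRUE, mc.cores = n_cores)

# unlist() flattens the list to a numeric vector that mean() accepts
overall <- mean(unlist(col_means))
print(overall)
```

The unlist() call is the only extra step, and it works the same way no matter how many columns the data frame has.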

Dirk Eddelbuettel
The purpose of using mclapply would be to turn a 6-hour simulation into a 3-hour simulation, so mean(mean(test)), while elegant, does not speed up the simulation. The unlist solution is precisely what I need! Thanks so much! Now I can just substitute mclapply for lapply and cut my simulation time in half!
ProbablePattern
"Now I can just substitute mclapply for lapply and cut my simulation time in half!" Maybe. Remember there are fixed costs to parallelizing something; worker processes need to be started, results need to be collected, etc.
Vince
Yes, the `mean(mean(testdf))` was merely to establish the overall mean which you had not shown. I understand it was a stylized example. Glad to have been of help.
Dirk Eddelbuettel
True, true. Burst my bubble, why don't you :) I do understand that it doesn't work precisely like that, but 4 cores should be substantially faster than 1 core on a 6-hour simulation.
ProbablePattern
It all depends. For some things you may get near-linear speed-ups, for others you will not. That's what follow-up questions are for :)
Dirk Eddelbuettel
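
One way to find out how much a given workload gains is to time both versions on a small slice first. The sketch below is only an illustration: slow_mean is a hypothetical stand-in for an expensive per-column computation, the 40-column data frame is made up, and the core count of 2 is arbitrary. The timings will vary by machine, so none are shown.

```r
library(parallel)  # assumption: mclapply from the 'parallel' package

# Hypothetical stand-in for one expensive per-column simulation step
slow_mean <- function(x) {
  Sys.sleep(0.05)
  mean(x)
}

set.seed(1)
wide <- as.data.frame(matrix(rnorm(5 * 40), nrow = 5))  # 40 made-up columns

# Forking is unavailable on Windows, so fall back to one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

t_serial   <- system.time(res_serial   <- lapply(wide, slow_mean))
t_parallel <- system.time(res_parallel <- mclapply(wide, slow_mean, mc.cores = n_cores))

# Compare wall-clock time; the ratio shows how close to linear the speed-up is
print(t_serial["elapsed"])
print(t_parallel["elapsed"])

# Both approaches should give the same per-column means
stopifnot(isTRUE(all.equal(unlist(res_serial), unlist(res_parallel))))
```

If each task is very cheap, the cost of starting workers and collecting results can outweigh the gain, which is the fixed-cost point made in the comments above.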