views: 512
answers: 4
Hi guys,

I'm posting this question to ask for advice on how to optimize the use of multiple processors from R on a Windows XP machine.

At the moment I'm creating 4 scripts (each covering part of the loop range, e.g. for (i in 1:100), for (i in 101:200), etc.) which I run in 4 different R sessions at the same time. This seems to use all the available CPU cores.

However, I would like to do this a bit more efficiently. One solution could be to use the "doMC" and "foreach" packages, but doMC is not available for R on a Windows machine.

e.g.

library("foreach")
library("strucchange")
library("doMC") # would this be possible on a windows machine?
registerDoMC(2)  # for a computer with two cores (processors)
## Nile data with one breakpoint: the annual flows drop in 1898
## because the first Ashwan dam was built
data("Nile")
plot(Nile)

## F statistics indicate one breakpoint
fs.nile <- Fstats(Nile ~ 1)
plot(fs.nile)
breakpoints(fs.nile)     # , hpc = "foreach" --> It would be great to test this.
lines(breakpoints(fs.nile))

Any solutions or advice?

Thanks, Jan

+3  A: 

Try the doSNOW parallel backend; it is supported out of the box on Windows. Use it with a snow socket cluster.
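A minimal sketch of what that looks like (assuming the snow, doSNOW, and foreach packages are installed; the cluster size of 2 is just illustrative):

```r
library(snow)
library(doSNOW)
library(foreach)

# create a two-worker socket cluster and register it as the foreach backend
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# the loop body is evaluated on the workers; .combine collects the results
res <- foreach(i = 1:4, .combine = c) %dopar% i^2

stopCluster(cl)
```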

Sharpie
Could you please give simple example code for that? (I remember trying it once and it didn't work for me.) Thanks.
Tal Galili
Please see the answer I just posted.
Dirk Eddelbuettel
+3  A: 

You could try doSMP from REvolution Computing. For more information, see this blog posting: Parallel Multicore Processing with R (on Windows)

rcs
I think we should strive to focus on CRAN (or BioC or ...) packages rather than commercial extensions, as this gets us on a road towards non-portable code using non-public modules.
Dirk Eddelbuettel
Hi Dirk, In general I agree with you. Specifically in this case - doSMP is GPL and free (and in the post linked to it is said that you could port that library to your own R installation). Cheers, Tal
Tal Galili
Can you please ask REvolution to upload it to CRAN, or do so yourself if it is indeed GPL'ed, giving you the right to redistribute it.
Dirk Eddelbuettel
I agree with Dirk. Pursuing the `doSMP` package requires giving personal information to REvolution Computing and then downloading their version of R, which last time I checked was based on 2.9.1 and is now two major revisions out of date.
Sharpie
Hi Dirk, I have no affiliation to REvolution - so me asking them to redistribute is the same as any one else. I won't upload it to CRAN myself since this might cause unneeded friction. I'll e-mail David whom I know simply from his blog, to ask his opinion.
Tal Galili
Update - I uploaded the package to my blog. It is now (easily!) available for download. (That was done after getting permission from REvolution. Not because I legally needed to, but simply because it seemed the nice thing to do.)
Tal Galili
+3  A: 

For completeness, here is the requested answer to Tal's comment which provides a simple and portable alternative. The answer consists of running

 > library(snow)
 > help(makeCluster)

and running the first three lines of code from the top of the Examples: section:

> cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
> clusterApply(cl, 1:2, get("+"), 3)
[[1]]
[1] 4

[[2]]
[1] 5

> stopCluster(cl)
> .Platform$OS.type
[1] "windows"
> 

Was that really that hard?

Add-on packages like doSNOW and thereafter foreach can make use of this in a portable way.

Dirk Eddelbuettel
Hi Dirk. Two things: 1) I tried the code on my computer and it froze on "makeCluster". 2) I remember last time I tried it that it didn't work with the two cores I had, but simply ran two R instances on the same core. What do you think? Tal
Tal Galili
What makes you think the two instances ran on the same core? Unless you enable 'cpu pinning' (typically a rather OS-dependent feature) the OS will allocate jobs as it sees fit. Numerous people, incl the SNOW author, use it this way for stated purpose of 'easy parallel work on Windows'. Also, sockets can be unreliable at larger scale. Real work should be done with MPI. Lastly, as for the machine hanging: no idea. I avoid this OS where I can, which is almost all the time.
Dirk Eddelbuettel
Hi Dirk, thanks for the answers. Regarding which core is being used: I simply checked its usage in the Windows task manager. But that is from memory, and I might be wrong. Best, Tal
Tal Galili
A: 

Hi guys,

Thanks a lot for all the feedback. I tried the example above with the following code, but saw no difference in performance (it was even slower), as measured with system.time() on Windows XP:

library("foreach")
library("strucchange")
data("UKDriverDeaths")
seatbelt <- log10(UKDriverDeaths)
seatbelt <- cbind(seatbelt, lag(seatbelt, k = -1), lag(seatbelt, k = -12))
colnames(seatbelt) <- c("y", "ylag1", "ylag12")
seatbelt <- window(seatbelt, start = c(1970, 1), end = c(1984,12))

# without
system.time(
for (i in 1:10) {
  print(system.time(bp.seat <- breakpoints(y ~ ylag1 + ylag12, data = seatbelt, h = 0.1)))
})
## with SNOW 
library(snow)
cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
clusterApply(cl, 1:2, get("+"), 3)
system.time(
for (i in 1:10) {
  print(system.time(bp.seat <- breakpoints(y ~ ylag1 + ylag12, data = seatbelt, h = 0.1, hpc = "foreach")))
}
)
stopCluster(cl)

Any solutions or advice? Should I measure this differently?

thanks and cheers, Jan

Jan
I think you should first use `registerDoSNOW`, as stated in the help for `breakpoints`: "If hpc = "foreach" is to be used, a parallel backend should be registered before".
Marek
I tried it (meaning `registerDoSNOW(cl)`) but now I get the error "could not find function "recresid"", which I suppose is caused by the package not being loaded on the cluster nodes.
Marek
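A sketch of a likely fix for that error (assuming snow's `clusterEvalQ` and that strucchange, which provides `recresid`, is installed locally): load the package on every worker node before registering the backend.

```r
library(snow)
library(doSNOW)

cl <- makeCluster(c("localhost", "localhost"), type = "SOCK")

# load strucchange on each worker so its functions (e.g. recresid)
# are available when the parallel loop body runs there
clusterEvalQ(cl, library(strucchange))

registerDoSNOW(cl)
# ... run breakpoints(..., hpc = "foreach") here ...
stopCluster(cl)
```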