Hi,

I have a nested loop that I'm running with foreach, doSNOW, and a snow socket cluster. How should I go about profiling the code to make sure I'm not doing something grossly inefficient?

Also, is there any way to measure the data flowing between the master and the nodes in a snow cluster?

Thanks,

James

+2  A: 

That is an excellent question. Off the top of my head, start with a comparison between

  • a serial solution (no snow),
  • a serial solution with snow (to get an idea of overhead) and
  • a parallel solution, maybe varying N, to see what kind of speedup you get.
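The three-way comparison above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the `parallel` package that ships with R (its socket clusters derive from snow) rather than foreach and doSNOW, and `one_replicate` and `n_reps` are hypothetical stand-ins for the real resampling step and its replicate count.

```r
library(parallel)  # bundled with R; its socket-cluster code derives from snow

# Hypothetical stand-in for one resampling replicate (not from the original post).
one_replicate <- function(i) {
  x <- rnorm(1e4)
  mean(sample(x, replace = TRUE))
}

n_reps <- 200  # assumed number of replicates

# 1. Serial baseline, no cluster involved.
t_serial <- system.time(
  lapply(seq_len(n_reps), one_replicate)
)[["elapsed"]]

# 2. One-worker cluster: same work plus the communication overhead.
cl1 <- makeCluster(1)
t_one <- system.time(
  parLapply(cl1, seq_len(n_reps), one_replicate)
)[["elapsed"]]
stopCluster(cl1)

# 3. Two-worker cluster: increase the worker count to see how the time scales.
cl2 <- makeCluster(2)
t_two <- system.time(
  parLapply(cl2, seq_len(n_reps), one_replicate)
)[["elapsed"]]
stopCluster(cl2)

timings <- c(serial = t_serial, one_worker = t_one, two_workers = t_two)
print(timings)
```

The gap between the first two timings gives a feel for the cluster overhead; the gap between the last two shows what extra workers buy you.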

The never-released-on-CRAN version 0.3.4 of snow also has additional plotting commands that are useful for analysis. You can get it from this directory at Luke Tierney's site.

Real profiling, of course, is hard given the distributed nature.

Dirk Eddelbuettel
Hi, thanks for the tip. I tried varying N and got a fairly linear increase (the problem I'm working on is essentially a resampling problem and hence pretty parallelizable). Is there a way to measure the data flow between nodes? I have a feeling this is the bottleneck.
James
Not with R, I fear, but I could be missing something. You could try standard networking tools.
Dirk Eddelbuettel
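Short of network tools, one rough way to get a feel for the traffic is to check how large the objects are once serialized, on the assumption that a socket cluster ships objects between master and nodes in R's serialized form. This is an estimate rather than a measurement of the wire, and `task_input`, `task_result` and `n_tasks` below are hypothetical placeholders for whatever each task actually sends and returns.

```r
# Hypothetical objects standing in for one task's input and its result.
task_input  <- matrix(rnorm(1000 * 50), nrow = 1000)
task_result <- colMeans(task_input)

# Size in bytes of each object after serialization
# (serialize() with connection = NULL returns a raw vector).
bytes_out  <- length(serialize(task_input, NULL))
bytes_back <- length(serialize(task_result, NULL))

# Assumed number of tasks dispatched over the whole run.
n_tasks <- 1000

# Rough total traffic in MiB, ignoring protocol overhead.
total_mib <- n_tasks * (bytes_out + bytes_back) / 2^20

cat("per task out: ", bytes_out, "bytes\n")
cat("per task back:", bytes_back, "bytes\n")
cat("estimated total:", round(total_mib, 1), "MiB\n")
```

If the per-task input dominates, sending the data to each node once up front instead of with every task is the usual thing to try.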
