This may sound like a silly question, but does your processor have more than one core? It was my understanding that P4s didn't, but I have as much knowledge about hardware as a fish does astrophysics.
When you say your "process is only half utilized", do you mean that you are monitoring two cores and only one is being used, or that a single core is being half used? If it's the latter, your application is probably memory bound (and probably hitting swap space), not CPU bound, so parallelization won't help.
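If it helps, here's a quick way to answer the core question from inside R (a sketch; detectCores() lives in the parallel package bundled with current R):

```r
# Count logical cores; a hyperthreaded P4 may report 2 even though there is
# only one physical core.
library(parallel)
detectCores()
```

Watching a per-core view in top/htop or Activity Monitor (along with the swap numbers) while the job runs will tell you which of the two situations you're in.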
Also, it doesn't look like the plyr package uses the multicore package, so you would have to explicitly rewrite parts of plyr to get parallelization. But if parts of plyr were embarrassingly parallel, I bet they'd already be parallelized.
So I don't think your problem is CPU bound; I think it's memory bound (and hitting swap). Monitor your memory, and maybe move the job to a machine with more of it.
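A rough way to check that from inside R (df is just a placeholder for whatever object is being plied over):

```r
# Is memory the real constraint? ('df' is a hypothetical placeholder object.)
format(object.size(df), units = "Mb")  # size of the object itself
gc()                                   # R's memory use after a garbage collection
```

If those numbers get anywhere near physical RAM - and remember that plyr/reshape operations can make temporary copies - the machine is swapping and extra cores won't buy you anything.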
Hope this helps!
Edit:
"@Vince As I wrote on romunov's answer; HT core will execute 2 processes faster than one (yet slower than 2 cores), so it is worth making parallel. Also even memory bound process will also take 100% of core." (my emphasis)
Worth making parallel? There's much more that goes into that equation. Countless times when exploring Python's multiprocessing and threading modules I've rewritten entire programs - even "easily parallelizable" ones - and had them run slower. Why? There are fixed costs to spawning new threads and processes, shuffling data between processes, and so on. It's just not that simple; in my experience, parallelization has never been the magic bullet it's made out to be here. I think these answers are misleading.
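A toy illustration of those fixed costs (a sketch; the exact numbers will vary by machine, but the shape of the result is typical):

```r
library(parallel)

cheap <- function(i) i + 1   # far too little work per item to be worth farming out
x <- as.list(1:10000)

system.time(lapply(x, cheap))                  # effectively instant
system.time(mclapply(x, cheap, mc.cores = 2))  # often slower: forking and shipping
                                               # results back costs more than the work
```

The parallel version only wins when the per-item work dwarfs that overhead.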
First off, we're talking about parallelizing a task that takes "6-7 minutes". Unless the OP knows his/her data is going to grow a lot, parallelization isn't even worth the wall-clock time it takes to program. In the time it takes to implement the parallel version, he/she could perhaps have done 100 non-parallel runs. In my work environment, that wall-clock time matters; those costs need to be factored into the runtime equation (unless you're doing it for learning/fun).
Second, if it is hitting swap space, the largest slowdown isn't the CPU, it's disk I/O. Even if there were an easy way to shuffle plyr code around to get some parts parallelized (which I doubt), doing so on an I/O-bound process would yield only a trivial speedup compared to adding more memory.
As an example, I once ran a command from the reshape package that demonstrated this exact behavior. It was on a multicore OS X machine with 4GB of memory, and within seconds it was crawling (well, my whole computer was crawling!) at 60-70% CPU across two cores, with all 4GB of memory used. I let it run as an experiment for an hour, then killed R and saw my memory jump back to 3GB free. I moved the job to a 512GB RAM server (yes, we are lucky enough to have that), and it finished in 7 minutes. Nothing about the core usage changed; the extra memory made all the difference.