I have a simple program that breaks a dataset (a CSV file) into 4 chunks, reads each chunk in, does some calculations, and then appends the output together. Think of it as a simple map-reduce operation. Processing a single chunk uses about 1 GB of memory. I'm running the program on a quad-core PC with 4 GB of RAM, running Windows XP. I happen to have coded it up in R, but I don't think the language is relevant.
I coded up two versions. One version processes each chunk in sequence. The other version processes chunks two at a time in parallel. Both versions take nearly the same amount of time to finish.
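To make the structure concrete, here is a rough sketch of how the two versions are organized. The chunk file names, the process_chunk function, and the placeholder aggregation are stand-ins, not the real code, and I'm assuming something like the parallel package's parLapply with a 2-worker PSOCK cluster for the parallel version (forking via mclapply isn't available on Windows):

    library(parallel)

    chunk_files <- c("chunk1.csv", "chunk2.csv", "chunk3.csv", "chunk4.csv")

    process_chunk <- function(path) {
        chunk <- read.csv(path)    # each chunk is roughly 1 GB in memory
        # ... about 5 minutes of number crunching on 'chunk' ...
        aggregate(. ~ key, data = chunk, FUN = sum)    # placeholder for the real calculation
    }

    # Version 1: chunks processed one after another.
    results <- lapply(chunk_files, process_chunk)

    # Version 2: chunks processed two at a time on a 2-worker PSOCK cluster.
    cl <- makeCluster(2)
    results <- parLapply(cl, chunk_files, process_chunk)
    stopCluster(cl)

    # Either way, the outputs are appended together and written back out.
    output <- do.call(rbind, results)
    write.csv(output, "output.csv", row.names = FALSE)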
Under what circumstances would you expect to see this performance result?
My current hypothesis is that the processes are memory-bound, i.e. limited by memory bandwidth rather than by CPU, but I don't know the best way to investigate this further. Any suggestions or guesses?
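For example, would something like the following be a sensible way to test that hypothesis? It's just a rough sketch, not part of the real program: it times one memory-heavy worker running alone versus two running at once, and if each of the two concurrent runs is close to twice as slow, that would point at memory bandwidth as the bottleneck.

    library(parallel)

    touch_memory <- function(i) {
        x <- runif(5e7)    # roughly 400 MB of doubles
        # repeatedly sweep over the vector so the work is dominated by memory traffic
        system.time(for (k in 1:10) x <- x + 1)[["elapsed"]]
    }

    cl <- makeCluster(2)
    one_alone   <- touch_memory(1)
    two_at_once <- unlist(parLapply(cl, 1:2, touch_memory))
    stopCluster(cl)

    one_alone
    two_at_once    # much slower per run than one_alone would suggest memory-bound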
Edit: The program is not I/O-bound in terms of the disk. The processing step reads a chunk of the CSV file into memory, churns on it for 5 minutes or so, and then writes the result back out to a file on disk. The file input and output take a few seconds at most.