A few years ago, on Windows, I did some testing by running multiple instances of an application that is CPU-intensive, memory-intensive, and I/O-intensive. I developed two versions: one uses multiple processes, the other uses multiple threads.

I found that the multi-process version performed much better. I read somewhere (I can't remember the site) that the reason is that, under multi-threading, the threads are "fighting" for a single memory pipeline and I/O pipeline, which makes performance worse than with multiple processes.

However, I can't find that article anymore. I was wondering whether the statement below still holds true today:

Under Windows, running the algorithm code as multiple processes has a high chance of performing better than running it as multiple threads.

+1  A: 

I'm not sure what the quote even means. It's very close to nonsense.

The primary thing that in-proc threads share is virtual memory address space.

Richard Berg
+2  A: 

It depends on how much the various threads or processes (I'll be using the collective term "tasks" for both of them) need to communicate, especially by sharing memory: that's easy, cheap and fast for threads, but not at all for processes, so, if a lot of it is going on, I bet processes' performance is not going to beat threads'.
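To make the sharing point concrete, here is a minimal sketch in Python (my choice of language for illustration, not something from the question). A thread mutates the very same object its creator holds, whereas data crossing a process boundary generally has to be copied or serialized. The helper `simulate_process_send` is hypothetical: it only stands in for a process boundary by doing a pickle round trip, which is how Python's `multiprocessing` queues and pipes move objects. It does not start a real process and it says nothing about Windows-specific costs.

```python
import pickle
import threading


def worker_thread(shared):
    # A thread sees the same list object as the code that started it,
    # so this append is visible to the caller with no copying.
    shared.append("from thread")


def simulate_process_send(obj):
    # Hypothetical stand-in for a process boundary (an assumption for
    # illustration): the object is serialized and rebuilt, so the
    # receiver gets an independent copy.
    return pickle.loads(pickle.dumps(obj))


data = ["initial"]

t = threading.Thread(target=worker_thread, args=(data,))
t.start()
t.join()
# The thread's change landed directly in `data`.

copy = simulate_process_send(data)
copy.append("from process")
# `copy` diverged; `data` does not see the "from process" item.
```

With real processes, every such exchange pays for the serialization and the transfer, which is why heavy communication tends to favor threads.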

Also, processes (esp. on Windows) are "heavier" to get started, so if a lot of "task starts" occur, again threads can easily beat processes in terms of performance.

Next, you can have CPUs with "hyperthreading", which can run (at least) two threads on a core very rapidly -- but, not processes (since the "hyperthreaded" threads cannot be using distinct address spaces) -- yet another case in which threads can win performance-wise.

If none of these considerations apply, then the race should be no better than a tie, anyway.

Alex Martelli
A: 

I don't believe this is true. Generally speaking, a multithreaded application will run faster than the same application designed as multiple processes under Windows. There are various reasons why your test might have performed better in the multi-process case.

The first that jumps to mind is false sharing. In your multi-threaded tests, the threads may have been inadvertently sharing cache lines. This happens when different threads access different memory locations that are close enough together to fall in the same cache line. This causes the two CPUs to continuously contend over that cache line, which severely degrades performance. That can't happen in the multi-process case because the processes have completely separate address spaces.
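The "same cache line" condition can be sketched with a little arithmetic. This is only an illustration under stated assumptions: a 64-byte cache line (common on x86, but check your CPU), a buffer that starts on a line boundary, and 8-byte per-thread counters. The function names are mine and hypothetical. It does not measure anything; it only shows which layouts put two threads' counters in the same line.

```python
CACHE_LINE = 64  # assumed line size in bytes; varies by CPU


def cache_line_index(offset, line_size=CACHE_LINE):
    # Which cache line a byte offset falls in, assuming the buffer
    # itself starts on a line boundary.
    return offset // line_size


def shares_line(offset_a, offset_b, line_size=CACHE_LINE):
    # True when both offsets land in the same cache line.
    return cache_line_index(offset_a, line_size) == cache_line_index(offset_b, line_size)


def counter_offsets(count, stride):
    # Byte offsets of `count` per-thread counters placed `stride` bytes apart.
    return [i * stride for i in range(count)]


# Four 8-byte counters packed back to back: all inside one 64-byte line.
packed = counter_offsets(4, 8)
packed_conflicts = [shares_line(packed[0], off) for off in packed[1:]]

# The same counters padded out to one full line each.
padded = counter_offsets(4, CACHE_LINE)
padded_conflicts = [shares_line(padded[0], off) for off in padded[1:]]
```

In the packed layout every counter shares a line with the first one, so threads updating "their own" counter still invalidate each other's cached copy. Padding each counter to a full line is the usual way to avoid that within a single process.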

Peter Ruderman
Um, just because the virtual address spaces are sandboxed doesn't tell you very much about their layout in physical memory. It depends on how big the VirtualAlloc (or whatever) requests were, and whether the processes/threads are doing load-store operations near the boundaries of said blocks.
Richard Berg