I've just started programming with POSIX threads on a dual-core x86_64 Linux system. It seems that 256 threads is about the optimum for performance with the way I've done it. I'm wondering how this could be, and whether it means my approach is wrong and a better approach would require far fewer threads while being just as fast or faster?

For further background (the program in question is a skeleton for a multi-threaded M-set image generator) see the following questions I've asked already:

Using threads, how should I deal with something which ideally should happen in sequential order?

How can my threaded image generating app get it’s data to the gui?

Perhaps I should mention that the skeleton (in which I've reproduced minimal functionality for testing and comparison) is now displaying the image, and the actual calculations are done almost twice as fast as the non-threaded program.

So if 256 threads running faster than 8 threads is not indicative of a poor approach to threading, why do 256 threads outperform 8 threads?

The speed test case is a portion of the Mandelbrot Set located at:

xmin -0.76243636067708333333333328
xmax -0.7624335575810185185185186
ymax 0.077996663411458333333333929

calculated to a maximum of 30000 iterations.

On the non-threaded version, rendering time on my system is around 15 seconds. On the threaded version, the average time for 8 threads is 7.8 seconds, while for 256 threads it is 7.6 seconds.
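
For reference, the per-pixel calculation is the standard escape-time iteration, roughly along these lines (a simplified sketch, not my actual code; the names are illustrative):

    /* Simplified escape-time loop for one pixel; max_iter is 30000 in the
     * test case above. The iteration count returned determines the colour. */
    static int mandel_iters(double cr, double ci, int max_iter)
    {
        double zr = 0.0, zi = 0.0;
        int i;
        for (i = 0; i < max_iter && zr * zr + zi * zi <= 4.0; i++) {
            double t = zr * zr - zi * zi + cr;
            zi = 2.0 * zr * zi + ci;
            zr = t;
        }
        return i;
    }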

+2  A: 

Could it be your app is I/O bound? How is the image data generated?

Evert
I don't know if it is I/O bound. How would I detect that?
James Morris
If the source of your image data is read from your hard disk, network, or some other source, your threads will be waiting in parallel to read from that source. If the data used to generate the image is 100% computed, or read into memory beforehand, there is a good chance it is CPU bound.
Evert
Mandelbrot sets are entirely calculation-generated.
caf
+1  A: 

A performance improvement gained by allocating more threads than cores suggests that the CPU is not the bottleneck. If I/O such as disk, memory, or even network access is involved, your results make perfect sense.

Spencer Ruport
+1  A: 

You are probably benefiting from Simultaneous Multithreading (SMT). Your operating system schedules more threads than cores available, and will swap out threads that are stalled waiting for resources (such as a memory load) in favor of ones that can run. This can very effectively hide the latencies of your memory system from your program, and it is the technique used to great effect for massive parallelization in CUDA for general-purpose GPU programming.

Aron Ahmadia
Simultaneous Multithreading is the same as hyperthreading used in newer Intel CPUs. The OP stated he's using a dual-core system, and to my knowledge, there is no modern Intel dual-core hyperthreading CPU.
Jay Conrod
Also, with regard to CUDA, it's possible you're thinking of Single Instruction Multiple Threads (SIMT), which has a similar acronym. I haven't heard of any GPUs using SMT.
Jay Conrod
+4  A: 

Well, probably yes, you're doing something wrong.

However, there are circumstances where 256 threads would run better than 8 without you necessarily having a bad threading model. One must remember that having 8 threads does not mean all 8 threads are actually running all the time. Anytime one thread makes a blocking syscall to the operating system, the thread will stop running and wait for the result. In the meantime, another thread can often do work.

There's this myth that one can't usefully use more threads than contexts on the CPU, but that's just not true. If your threads block on a syscall, it can be critical to have another thread available to do more work. (In practice when threads block there tends to be less work to do, but this is not always the case.)

It's all very dependent on the workload, and there's no one right number of threads for any particular application. Generally you never want fewer threads available than the OS will run, and that's the only true rule. (Unfortunately this can be very hard to find out, so people tend to just fire up as many threads as there are contexts and then use non-blocking syscalls where possible.)
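
As a toy illustration of why oversubscription can help when threads block (a rough sketch assuming Linux and pthreads; the 5 ms sleep stands in for a blocking syscall and the constants are arbitrary):

    /* A fixed total amount of work is split across N threads. Each unit of
     * work is a blocking call (nanosleep here) followed by a compute burst.
     * With more threads than cores, the blocking periods overlap, so wall
     * time drops even though the total CPU work is unchanged.
     * Build: cc demo.c -pthread   Compare: time ./a.out 8  vs  time ./a.out 256 */
    #include <pthread.h>
    #include <stdlib.h>
    #include <time.h>

    #define TOTAL_UNITS 256

    static void *worker(void *arg)
    {
        int units = *(int *)arg;
        struct timespec block = { 0, 5 * 1000 * 1000 };   /* 5 ms "syscall" */
        volatile double x = 0.0;

        for (int u = 0; u < units; u++) {
            nanosleep(&block, NULL);            /* blocked: the CPU is free for others */
            for (int i = 0; i < 2000000; i++)   /* then a burst of computation */
                x += i * 1e-9;
        }
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int nthreads = argc > 1 ? atoi(argv[1]) : 8;
        int units = TOTAL_UNITS / nthreads;     /* same total work regardless of N */
        pthread_t *tid = malloc(nthreads * sizeof *tid);

        for (int i = 0; i < nthreads; i++)
            pthread_create(&tid[i], NULL, worker, &units);
        for (int i = 0; i < nthreads; i++)
            pthread_join(tid[i], NULL);

        free(tid);
        return 0;
    }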

D.J. Capelis
+1  A: 

If you are seeing a performance increase with the jump to 256 threads, then what you are probably dealing with is a resource bottleneck. At some point, your code is waiting for some slow device (a hard disk or a network connection, for example) in order to continue. With multiple threads, waiting on this slow device isn't a problem because instead of sitting idle and twiddling its electronic thumbs, the CPU can process another thread while the first thread is waiting on the slow device. The more parallel threads that are running, the more work the CPU can do while it is waiting on something else.

If you are seeing performance improve all the way up to 256 threads, I am tempted to say that you have a major performance bottleneck somewhere and it's not the CPU. To test this, try to see if you can measure the idle time of individual threads. I suspect that you will see your threads are stuck in a "blocked" or "waiting" state for a longer portion of their lifetime than they spend in the "running" or "active" state. Some debuggers or function profiling tools will let you do this, and I think there are also Linux tools to do this on the command line.
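
For example, one rough way to check from inside the program (assuming Linux, where CLOCK_THREAD_CPUTIME_ID reports per-thread CPU time) is to compare each worker thread's CPU time with its wall-clock lifetime; if the CPU time is much smaller, the thread spent most of its life blocked:

    /* Sketch: call this at the end of a worker thread's function. A large gap
     * between wall time and CPU time means the thread was mostly waiting.
     * wall_start should be taken with CLOCK_MONOTONIC when the thread starts.
     * Link with -pthread (and -lrt on older glibc). */
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    static double ts_seconds(const struct timespec *ts)
    {
        return ts->tv_sec + ts->tv_nsec / 1e9;
    }

    static void report_thread_usage(const struct timespec *wall_start)
    {
        struct timespec wall_now, cpu_used;

        clock_gettime(CLOCK_MONOTONIC, &wall_now);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu_used);

        fprintf(stderr, "thread %lu: wall %.3f s, cpu %.3f s\n",
                (unsigned long)pthread_self(),
                ts_seconds(&wall_now) - ts_seconds(wall_start),
                ts_seconds(&cpu_used));
    }

From the command line, top -H will also show per-thread CPU usage and state.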

bta