Hi, I have a really strange problem.

I have an application that launches some workers in parallel:

int i = 0;                        // index into workers[] / threads[]
for (it = jobList.begin(); it != jobList.end(); ++it) {
    DWORD threadId;
    Job job = *it;
    Worker *worker = new Worker(job);
    workers[i] = worker;
    threads[i++] = CreateThread(NULL, 0, &launchThread, worker, 0, &threadId);
}
WaitForMultipleObjects((DWORD)jobList.size(), threads, TRUE, INFINITE);

The workers allocate a bunch of things, so I assume they synchronize on new, but that is the only place where they could synchronize with each other.

When I run the application on a single-core machine, everything is fine; when I run it on a multi-core machine, performance gets much worse, even worse than this fully serialized version:

int i = 0;
for (it = jobList.begin(); it != jobList.end(); ++it) {
    DWORD threadId;
    Job job = *it;
    Worker *worker = new Worker(job);
    workers[i] = worker;
    threads[i++] = CreateThread(NULL, 0, &launchThread, worker, 0, &threadId);
    WaitForSingleObject(threads[i-1], INFINITE);
}

Does anyone have a reasonable guess?

EDIT:

I have run some tests, and I've found that:

  1. Replacing the allocator with a state-of-the-art parallel allocator doesn't help.
  2. The multithreaded application performs better on a machine with a Core 2 Duo (two cores sharing an L2 cache) than on a dual Xeon (two processors with separate caches).

I suspect I have an application with a memory-access bottleneck in my hands, but how can I check whether this is really the problem, or whether I should be looking elsewhere?
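One cheap way to test the memory-access hypothesis is a false-sharing microbenchmark: two threads hammering counters that either share a cache line or sit on separate ones. This is only an illustrative sketch using standard C++ threads (not the question's Win32 code; `Counters`, `bump`, and `runPair` are made-up names); if removing the padding makes the multi-core run dramatically slower, cache-line ping-pong between cores is a plausible culprit.

```cpp
#include <cstdint>
#include <thread>

// Two counters in one struct: with the padding they live on separate
// cache lines, without it they share one and the cores fight over it.
struct Counters {
    uint64_t a;
    char pad[64];   // delete this field to force a and b onto one line
    uint64_t b;
};

// volatile keeps the compiler from collapsing the loop into one add,
// so the memory traffic we want to measure actually happens
static void bump(volatile uint64_t *c, int iters) {
    for (int i = 0; i < iters; ++i)
        *c = *c + 1;
}

// Run both counters from two threads; time this call with and
// without the padding on your single-core and multi-core machines.
uint64_t runPair(int iters) {
    Counters c = {};                       // zero both counters
    std::thread t1(bump, &c.a, iters);
    std::thread t2(bump, &c.b, iters);
    t1.join();
    t2.join();
    return c.a + c.b;
}
```

Each thread touches only its own counter, so the result is deterministic; only the elapsed time changes with the layout.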

A: 

How many threads are you creating, and how many cores do you have? Anything more than 2 threads per core will DEGRADE performance significantly.

Usually when approaching a problem such as this you set up a thread pool (i.e. at most 4 threads for a dual-core machine) and then you assign each thread a "job". When that job completes, the thread pops the next job off the job list and carries on processing the new one. You do this until every job has been processed. Bear in mind a thread in the pool can DIRECTLY grab the next job off the job list rather than waiting for the main thread to assign it one.

The workflow is as follows.

1) Main thread creates a number of worker threads.
2) Main thread fills the job list.
3) Worker thread checks whether there are any more jobs and grabs the next from the list.
4) Go to 3 if there are jobs left.
5) Complete (or return to 2, depending on your requirements).

There are other approaches to the problem but the one above is probably the most efficient and easy to implement.
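The workflow above can be sketched roughly like this, using standard C++ threads and a mutex as a portable stand-in for the Win32 primitives in the question (the `JobPool` class and its names are illustrative, not an existing API):

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Minimal job-pool sketch: workers pull jobs themselves (step 3)
// until the list is exhausted (step 4), instead of one thread per job.
class JobPool {
public:
    explicit JobPool(std::vector<std::function<void()>> jobs)
        : jobs_(std::move(jobs)), next_(0) {}

    // Step 1: spawn a fixed number of workers (e.g. one per core),
    // then wait for all of them to drain the job list.
    void run(unsigned threadCount) {
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < threadCount; ++i)
            workers.emplace_back([this] { workerLoop(); });
        for (std::thread &t : workers)
            t.join();
    }

private:
    void workerLoop() {
        for (;;) {
            std::size_t idx;
            {
                std::lock_guard<std::mutex> lock(m_);
                if (next_ >= jobs_.size())
                    return;          // step 5: no jobs left, worker exits
                idx = next_++;       // step 3: grab the next job index
            }
            jobs_[idx]();            // run the job outside the lock
        }
    }

    std::vector<std::function<void()>> jobs_;
    std::size_t next_;
    std::mutex m_;
};
```

The key design point is that only the tiny index bump is serialized; the jobs themselves run with no lock held.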

Goz
This doesn't explain why I get the degradation only when I run it on a machine with more than one processor/core.
akappa
Nonetheless, using a thread-pooling system as outlined above is incredibly likely to fix your problems. You didn't answer my question at the start, btw...
Goz
I have an ancient compiler that doesn't let me use most of the thread-pool and parallelization techniques that I've found. BTW, on a single-core machine the thread/core ratio is higher, and yet I don't get the 2x slowdown that I see on a dual-processor machine, so the reason should lie elsewhere.
akappa
It probably does lie elsewhere, that I won't deny. It is very strange though. I wonder if the problem lies directly with your ancient compiler making optimisations that assume two things cannot happen at the same time? What compiler are you using?
Goz
Also, have you tried disabling optimisations in your compiler and seeing what happens? It won't necessarily indicate it's not a problem with the compiler, but if it does fix things then you definitely know it's an optimisation problem...
Goz
+1  A: 

Do your individual Jobs call new as well? new is nearly always thread-safe, but this safety usually comes at a huge performance penalty (yes, that paper talks about malloc, but (1) the same issues plague new, and (2) new is often implemented on top of malloc).
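To see why per-iteration new can dominate, compare the same workload written two ways; this is an illustrative sketch (the functions `allocHeavy`/`allocLight` are made up for the example, not code from the question). The first variant takes the allocator's lock on every iteration of every thread; the second allocates once per thread and reuses the buffer.

```cpp
#include <cstddef>
#include <vector>

// Variant 1: new/delete on every iteration. When several threads run
// this loop at once, each pass contends for the heap's internal lock.
std::size_t allocHeavy(int iters, std::size_t sz) {
    std::size_t total = 0;
    for (int i = 0; i < iters; ++i) {
        char *p = new char[sz];   // heap lock taken here, every time
        p[0] = 1;
        total += p[0];
        delete[] p;               // and again here
    }
    return total;
}

// Variant 2: one allocation up front, then lock-free reuse.
std::size_t allocLight(int iters, std::size_t sz) {
    std::vector<char> buf(sz);    // single allocation per thread
    std::size_t total = 0;
    for (int i = 0; i < iters; ++i) {
        buf[0] = 1;
        total += buf[0];
    }
    return total;
}
```

Both compute the same result; only the amount of allocator traffic differs, which is exactly the kind of hidden serialization that hurts on multiple cores while being invisible on one.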

You have a potential memory leak: if the call to new Worker(job) fails, there is no cleanup code. Of course, you may have removed that code in order to post the example, or this may be the whole program and you're relying on the operating system to clean up. You might consider a Scope Guard-like solution for that.

Overall, I would recommend looking at something like Intel's Threading Building Blocks or Windows's thread pool. Those should handle a lot of the tricky details for you (how many threads to create, how to schedule fairly, what data to give each thread to avoid cache misses, etc.).

Max Lybbert
Unfortunately I use Metrowerks, so I cannot use Intel TBB because it isn't supported. Thank you for the suggestion of Windows's thread pool; I was unaware of it (I've tried some thread pool libraries, but they were incompatible with MW or implemented in a ridiculous way).
akappa
+1  A: 

The Interlocked* functions can sometimes fly under the radar when you're looking for synchronization. They are quite fast, but they do force some synchronization and CPU cache updates, which will slow you down. That said, one would probably have to use those functions a lot to get the impact you're describing.

Having no further details, I would suggest profiling the worker threads, drilling down into the lengthy parts of the code and eventually identifying the bottleneck. Given the effect is significant, the bottleneck should be noticeable. If you find the place but not the reason, update your question with the actual part of the code causing the problem.
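If no real profiler is available for the ancient toolchain, even a hand-rolled scoped timer around suspect regions can narrow things down; here is a minimal sketch with standard C++ (`ScopedTimer` is an illustrative name, not an existing API), to be compared between single-core and multi-core runs:

```cpp
#include <chrono>
#include <cstdio>

// Poor man's profiler: prints how long a scope took when it ends.
// Wrap candidate hot regions, run on one core and on several, and
// see which region's time blows up on the multi-core machine.
class ScopedTimer {
public:
    explicit ScopedTimer(const char *label)
        : label_(label), start_(std::chrono::steady_clock::now()) {}

    // Microseconds elapsed since construction.
    long long elapsedUs() const {
        return std::chrono::duration_cast<std::chrono::microseconds>(
                   std::chrono::steady_clock::now() - start_).count();
    }

    ~ScopedTimer() {
        std::fprintf(stderr, "%s: %lld us\n", label_, elapsedUs());
    }

private:
    const char *label_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage is just `{ ScopedTimer t("allocation phase"); /* suspect code */ }`; a real profiler gives far better data, but this needs nothing beyond the standard library.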

eran
+1  A: 

You may try putting all the dynamic memory allocation before creating any worker threads. Dynamic memory allocation accesses the heap, which requires critical-section access, and that is exactly what every new is doing. With multiple worker threads running and doing dynamic allocation themselves, your main thread may get very little opportunity to allocate memory, because it has to wait for the other threads to come out of new.
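The suggestion can be sketched like this, again with standard C++ threads as a stand-in for CreateThread; the `Job`/`Worker` shapes and `runAll` are illustrative guesses at the question's types, with all allocation moved into the main thread so the workers never touch the heap:

```cpp
#include <thread>
#include <vector>

struct Job { int input; };          // stand-in for the question's Job

struct Worker {                     // stand-in for the question's Worker
    Job job;
    std::vector<double> scratch;    // work area allocated up front
    explicit Worker(Job j) : job(j), scratch(1024) {}

    // Pure computation: no new/delete here, so no heap-lock contention.
    void run() { scratch[0] = job.input * 2.0; }
};

double runAll(const std::vector<Job> &jobs) {
    // Phase 1, main thread only: every allocation happens here,
    // before a single worker thread exists.
    std::vector<Worker> workers;
    workers.reserve(jobs.size());
    for (const Job &j : jobs)
        workers.emplace_back(j);

    // Phase 2: spawn the threads; they only compute on memory
    // that already belongs to them.
    std::vector<std::thread> threads;
    for (Worker &w : workers)
        threads.emplace_back([&w] { w.run(); });
    for (std::thread &t : threads)
        t.join();

    double sum = 0;
    for (const Worker &w : workers)
        sum += w.scratch[0];
    return sum;
}
```

Each thread writes only to its own preallocated `Worker`, so there is no allocator traffic and no sharing between threads during the parallel phase.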