views: 368
answers: 12

I'm currently reviewing/refactoring a multithreaded application that is supposed to be multithreaded so it can use all the available cores and, in theory, deliver better / superior performance (superior is the commercial term for better :P).

What are the things I should be aware of when programming multithreaded applications?

I mean things that will greatly impact performance, maybe even to the point where you gain nothing from multithreading at all but lose a lot to design complexity. What are the big red flags for multithreaded applications?

Should I start questioning the locks and looking into a lock-free strategy, or are there other, more important points that should set off a warning light?

Edit: The kind of answers I'd like are similar to the answer by Janusz: I want red warnings to look for in code. I know the application doesn't perform as well as it should; I need to know where to start looking, what should worry me, and where I should put my efforts. I know it's kind of a general question, but I can't post the entire program, and if I could pick out one section of code then I wouldn't need to ask in the first place.

I'm using Delphi 7, although the application will be ported / remade in .NET (C#) next year, so I'd rather hear comments that are applicable as general practice; if they must be language-specific, then to either one of those two.

A: 

You should first get a tool to monitor threads, specific to your language, framework and IDE. Your own logger might do fine too (resume time, sleep time + duration). From there you can check for badly performing threads that don't execute much or wait too long for something to happen; you might want to make the event they are waiting on occur as early as possible.

As you want to use both cores, you should check the usage of the cores with a tool that can graph the processor usage of both cores for your application only, or just make sure your computer is as idle as possible while measuring.

Besides that, you should profile your application just to make sure that the things performed within the threads are efficient, but watch out for premature optimization. There's no sense in optimizing your multiprocessing if the threads themselves perform badly.

Looking for a lock-free strategy can help a lot, but it is not always possible to get your application to perform in a lock-free way.
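As a tiny illustration of what "lock-free" can mean in practice (sketched here in Java, since the question spans Delphi and C#; the names are made up for the example), a shared counter can rely on an atomic compare-and-swap primitive instead of a mutex:

```java
import java.util.concurrent.atomic.AtomicLong;

public class LockFreeCounter {
    private final AtomicLong count = new AtomicLong();

    // Lock-free increment: the hardware compare-and-swap retries
    // internally instead of blocking the thread on a mutex.
    public long increment() {
        return count.incrementAndGet();
    }

    public long get() {
        return count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        LockFreeCounter c = new LockFreeCounter();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) c.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(c.get()); // 400000
    }
}
```

No thread ever blocks here, so there is nothing to deadlock; but as the answer says, only some data structures can be expressed this way.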

TomWij
+3  A: 

Having more threads than there are cores typically means that the program is not performing optimally.

So a program which spawns loads of threads is usually not designed in the best fashion. A classic example of this practice are the old socket examples, where every incoming connection got its own thread to handle the connection. It is a very unscalable way to do things: the more threads there are, the more time the OS has to spend context switching between them.
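The usual alternative is a bounded worker pool sized to the hardware. A minimal sketch in Java (the class and method names are invented for illustration; dummy work items stand in for connections):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class PooledHandling {
    // One pool sized to the hardware, instead of one thread per task.
    static final ExecutorService POOL =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // Submit a batch of "connections" (dummy work items here) and wait
    // for them all; only a handful of threads ever exist.
    static int handleAll(int tasks) throws InterruptedException {
        AtomicInteger handled = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            POOL.submit(() -> {
                handled.incrementAndGet();
                done.countDown();
            });
        }
        done.await();
        return handled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handleAll(1000)); // 1000
    }
}
```

A real server would hand each accepted socket to the pool the same way, keeping the thread count fixed while the connection count grows.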

Toad
Err, as usual, it depends. In certain circumstances (when you expect each thread to have to spend long periods waiting for events), this can have no performance impact and make the code much simpler.
Michael Kohne
In that case, it is much nicer to use an asynchronous architecture and not rely on threads at all.
Toad
+1  A: 

Run-time profilers may not work well with a multi-threaded application. Still, anything that makes a single-threaded application slow will also make a multi-threaded application slow. It may be an idea to run your application as a single-threaded application, and use a profiler, to find out where its performance hotspots (bottlenecks) are.

When it's running as a multi-threaded application, you can use the system's performance-monitoring tool to see whether locks are a problem. Assuming that your threads block instead of busy-waiting, seeing 100% CPU across several threads is a sign that locking isn't a problem. Conversely, something that looks like 50% total CPU utilization on a dual-processor machine is a sign that only one thread is running, so maybe your locking is preventing more than one thread from running concurrently (when counting the number of CPUs in your machine, beware of multi-core and hyperthreading).

Locks aren't only in your code but also in the APIs you use: e.g. the heap manager (whenever you allocate and delete memory), maybe in your logger implementation, maybe in some of the O/S APIs, etc.

Should I start questioning the locks and looking to a lock-free strategy

I always question the locks, but I have never used a lock-free strategy. Instead, my ambition is to use locks where necessary, so that the code is always threadsafe but will never deadlock, and to ensure that locks are held for a tiny amount of time (e.g. for no more than the time it takes to push or pop a pointer on a thread-safe queue), so that the maximum time a thread may be blocked is insignificant compared to the time it spends doing useful work.
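A minimal sketch of that queue idiom, in Java for illustration (the class is invented for this example): the lock guards only the one pointer push or pop, and all real work happens outside it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyLockQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final Object lock = new Object();

    public void push(T item) {
        // Lock held only for one reference write.
        synchronized (lock) {
            items.addLast(item);
        }
    }

    public T pop() {
        // Lock held only for one reference read; returns null when empty.
        synchronized (lock) {
            return items.pollFirst();
        }
    }
}
```

A consumer thread would pop an item, release the lock immediately, and then spend its time processing the item unlocked; contention stays negligible no matter how expensive the processing is.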

ChrisW
+4  A: 

One thing that decreases performance is having two threads doing heavy hard-drive access at the same time. The drive has to jump between providing data for one thread and the other, and both threads end up waiting for the disk all the time.

Janusz
It's a tradeoff. If the process uses a lot of CPU relative to disk, it can be a win; it generally is a win in multimedia thumbnail generation. However, it is **not** a win when the data source is a CD-ROM. :)
Zan Lynx
+5  A: 

One thing to definitely avoid is lots of write access to the same cache lines from threads.

For example: if you use a single counter variable to count the number of items processed by all threads, this will really hurt performance, because the CPU caches have to synchronize the cache line holding the counter every time another core writes to the variable.
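The usual fix is to give each thread its own counter and combine them at the end. In Java (used here purely for illustration), `java.util.concurrent.LongAdder` packages exactly this pattern; the method name below is invented for the example:

```java
import java.util.concurrent.atomic.LongAdder;

public class PerThreadCounting {
    // LongAdder keeps per-thread cells internally, so hot increments from
    // different cores don't all contend on the one cache line that a
    // single shared counter (or one "lock inc" target) would occupy.
    static long countInParallel(int threads, int perThread) throws InterruptedException {
        LongAdder processed = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) processed.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return processed.sum();   // cells are only combined once, at the end
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countInParallel(4, 250_000)); // 1000000
    }
}
```

The same idea applies in any language: accumulate thread-locally, merge rarely.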

Zan Lynx
Nice! I didn't know that. I've got a section where I do an interlocked increment (actually a "lock inc" in assembler) and I've never stopped to think about cache lines. +1, I wish I could give a +2.
Jorge Córdoba
+2  A: 

What kills performance is two or more threads sharing the same resources. This could be an object both use, a file both use, a network connection both use, or a processor both use. You cannot avoid these dependencies on shared resources entirely, but where possible, try to avoid sharing.

Workshop Alex
+3  A: 

Something to keep in mind when locking: lock for as short a time as possible. For example, instead of this:

lock(syncObject)
{
    bool value = askSomeSharedResourceForSomeValue();
    if (value)
        DoSomethingIfTrue();
    else
        DoSomethingIfFalse();
}

Do this (if possible):

bool value = false;  

lock(syncObject)
{
    value = askSomeSharedResourceForSomeValue();
}  

if (value)
   DoSomethingIfTrue();
else
   DoSomethingIfFalse();

Of course, this example only works if DoSomethingIfTrue() and DoSomethingIfFalse() don't require synchronization, but it illustrates the point: locking for as short a time as possible, while not always improving your performance, will improve the safety of your code in that it reduces the surface area for synchronization problems.

And in certain cases, it will improve performance. Staying locked for long lengths of time means that other threads waiting for access to some resource are going to be waiting longer.

unforgiven3
+1  A: 

You don't mention the language you're using, so I'll make a general statement on locking. Locking is fairly expensive, especially the naive locking that is native to many languages. In many cases you are reading a shared variable (as opposed to writing it). Reading is threadsafe as long as it does not take place simultaneously with a write, yet the most naive form of locking still treats the read and the write as the same type of operation, restricting access to the shared variable for other reads as well as writes. A reader/writer lock can dramatically improve performance: one writer, unlimited readers. On an app I've worked on, I saw a 35% performance improvement after switching to this construct. If you are working in .NET, the correct lock is ReaderWriterLockSlim.
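A minimal sketch of the reader/writer idea, shown in Java with `ReentrantReadWriteLock` (the class name `SharedConfig` and its methods are invented for the example; .NET's `ReaderWriterLockSlim` works analogously):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SharedConfig {
    private final Map<String, String> values = new HashMap<>();
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    public String get(String key) {
        rw.readLock().lock();      // shared: many readers may hold this at once
        try {
            return values.get(key);
        } finally {
            rw.readLock().unlock();
        }
    }

    public void put(String key, String value) {
        rw.writeLock().lock();     // exclusive: blocks all readers and writers
        try {
            values.put(key, value);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

With a read-heavy workload, readers no longer queue up behind each other, which is where the kind of improvement described above comes from.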

Steve
And in Java it is java.util.concurrent.locks.ReentrantReadWriteLock
Kathy Van Stone
+2  A: 

You should first be familiar with Amdahl's law.
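Amdahl's law bounds the achievable speedup: if a fraction p of the work can be parallelized across n cores, the overall speedup is 1 / ((1 - p) + p/n). A quick illustration (class and method names invented for the example):

```java
public class Amdahl {
    // Amdahl's law: with parallel fraction p running on n cores,
    // speedup = 1 / ((1 - p) + p / n).
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        // Even with 90% of the work parallelized, 2 cores give only ~1.82x,
        // and no number of cores can ever exceed 1 / (1 - 0.9) = 10x.
        System.out.printf("%.2f%n", speedup(0.9, 2));
    }
}
```

The serial fraction dominates quickly, which is why shrinking the parts that hold locks or run single-threaded matters more than adding cores.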

If you are using Java, I recommend the book Java Concurrency in Practice; however, most of its help is specific to the Java language (Java 5 or later).

In general, reducing the amount of shared memory increases the amount of parallelism possible, and for performance that should be a major consideration.

Threading with GUIs is another thing to be aware of, but it looks like it is not relevant for this particular problem.

Kathy Van Stone
+1  A: 

I recommend looking into running multiple processes rather than multiple threads within the same process, if it is a server application.

The benefit of dividing the work between several processes on one machine is that it is easy to increase the number of servers when more performance is needed than a single server can deliver.

You also reduce the risks involved with complex multithreaded applications, where deadlocks, bottlenecks, etc. reduce the total performance.

There are commercial frameworks that simplify server software development when it comes to load balancing and distributed queue processing, but developing your own load-sharing infrastructure is not that complicated compared with what you will encounter in general in a multithreaded application.

Ernelli
There are a lot of "problems" with the many-process approach. Firstly, processes do not share the same memory space the way threads do; I share a lot of information between threads, so going from threads to processes won't be easy at all.
Jorge Córdoba
Not sharing memory can be a benefit when it comes to cache coherency in SMP, but with multiple cores it's probably better to share memory such as table lookups, search trees, etc. Still, it depends on the application and how it is designed. A general rule is that it is very hard to turn a single-threaded application into a multithreaded one without running into problems or inserting so many locks that it practically runs as a single-threaded app. It has to be designed to be MT from the beginning.
Ernelli
+1  A: 

I'm using Delphi 7

You might be using COM objects, then, explicitly or implicitly; if you are, COM objects have their own complications and restrictions on threading: Processes, Threads, and Apartments.

ChrisW
I'm not, but it's nice to know.
Jorge Córdoba
I haven't used Delphi but I thought that the VCL was implemented using COM: http://en.wikipedia.org/wiki/Visual_Component_Library -- even apart from that, your multi-threaded code and your UI should almost certainly be separate from each other.
ChrisW
A: 

Threads don't always equal performance.

Things are a lot better on certain operating systems than on others, but if you can have something sleep or relinquish its time slice until it's signaled, and avoid starting a new thread or process for virtually everything, you save the application from bogging down in context switching.

David