ansaurus

Question

Answer 1

+24 A:

In a multithreading environment you have to take care of synchronization so that two threads don't clobber the state by performing modifications simultaneously. Otherwise you can have race conditions in your code (for an example, see the infamous Therac-25 accident). You also have to schedule the threads to perform various tasks. You then have to make sure that your synchronization and scheduling don't cause a deadlock, where multiple threads wait for each other indefinitely.

Synchronization

Something as simple as increasing a counter requires synchronization:

counter += 1;

Assume this sequence of events:

counter is initialized to 0
thread A retrieves counter from memory to cpu (0)
context switch
thread B retrieves counter from memory to cpu (0)
thread B increases counter on cpu
thread B writes back counter from cpu to memory (1)
context switch
thread A increases counter on cpu
thread A writes back counter from cpu to memory (1)

At this point the counter is 1, even though both threads tried to increase it: thread A's write overwrote thread B's update. Access to the counter has to be synchronized by some kind of locking mechanism:

lock (myLock) {
  counter += 1;
}

Only one thread at a time is allowed to execute the code inside the locked block. Two threads executing this code might result in this sequence of events:

counter is initialized to 0
thread A acquires myLock
context switch
thread B tries to acquire myLock but has to wait
context switch
thread A retrieves counter from memory to cpu (0)
thread A increases counter on cpu
thread A writes back counter from cpu to memory (1)
thread A releases myLock
context switch
thread B acquires myLock
thread B retrieves counter from memory to cpu (1)
thread B increases counter on cpu
thread B writes back counter from cpu to memory (2)
thread B releases myLock

At this point counter is 2.
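For comparison, here is a minimal sketch of the same idea in Java (the language the asker is said to be using elsewhere in this thread), where `synchronized` plays the role of C#'s `lock`. The class name `SyncCounterDemo`, its method names and the iteration counts are made up for illustration.

```java
// Sketch only: SyncCounterDemo and its methods are illustrative names, not part of the answer.
class SyncCounterDemo {
    private final Object myLock = new Object();
    private int counter = 0;

    void increment() {
        synchronized (myLock) {   // only one thread at a time may be inside this block
            counter += 1;
        }
    }

    int get() {
        synchronized (myLock) {   // read under the same lock to see the latest value
            return counter;
        }
    }

    // Runs two threads that each call increment() perThread times and returns the total.
    static int runTwoThreads(int perThread) throws InterruptedException {
        SyncCounterDemo demo = new SyncCounterDemo();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                demo.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();   // wait for both threads before reading the result
        b.join();
        return demo.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTwoThreads(100_000));   // prints 200000
    }
}
```

If you remove the two `synchronized` blocks, the total will usually come out lower than expected, which is the lost-update sequence described above.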

Scheduling

Scheduling is another form of synchronization, and you have to use thread synchronization mechanisms like events, semaphores, message passing etc. to start and stop threads. Here is a simplified example in C#:

AutoResetEvent taskEvent = new AutoResetEvent(false);

Task task;

// Called by the main thread.
public void StartTask(Task task) {
  this.task = task;
  // Signal the worker thread to perform the task.
  this.taskEvent.Set();
  // Return and let the task execute on another thread.
}

// Called by the worker thread.
void ThreadProc() {
  while (true) {
    // Wait for the event to become signaled.
    this.taskEvent.WaitOne();
    // Perform the task.
  }
}

You will notice that access to this.task probably isn't synchronized correctly, that the worker thread isn't able to return results back to the main thread, and that there is no way to signal the worker thread to terminate. All this can be corrected in a more elaborate example.
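As one possible shape for such a more elaborate version, here is a hedged Java sketch (not the answerer's code). It assumes a `BlockingQueue` for handing tasks over safely, a `CompletableFuture` for returning results, and a sentinel job for termination. `WorkerDemo`, `Job` and `STOP` are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

// Sketch only: a single worker thread fed through a queue.
class WorkerDemo {
    // A unit of work plus the future through which its result travels back.
    private static final class Job {
        final Supplier<Integer> work;
        final CompletableFuture<Integer> result = new CompletableFuture<>();
        Job(Supplier<Integer> work) { this.work = work; }
    }

    // Sentinel ("poison pill") that tells the worker to terminate.
    private static final Job STOP = new Job(() -> 0);

    private final BlockingQueue<Job> queue = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::threadProc);

    WorkerDemo() {
        worker.start();   // fine for a sketch; real code would start the thread outside the constructor
    }

    // Called by the main thread: hand the task over and return immediately.
    CompletableFuture<Integer> startTask(Supplier<Integer> work) {
        Job job = new Job(work);
        queue.add(job);   // the queue does the synchronization for us
        return job.result;
    }

    // Called by the main thread: ask the worker to finish, then wait for it.
    void shutdown() throws InterruptedException {
        queue.add(STOP);
        worker.join();
    }

    // Runs on the worker thread.
    private void threadProc() {
        try {
            while (true) {
                Job job = queue.take();   // blocks until a job arrives
                if (job == STOP) {
                    return;
                }
                try {
                    job.result.complete(job.work.get());
                } catch (RuntimeException e) {
                    job.result.completeExceptionally(e);   // report failures to the caller
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt and exit
        }
    }

    public static void main(String[] args) throws Exception {
        WorkerDemo demo = new WorkerDemo();
        System.out.println(demo.startTask(() -> 6 * 7).get());   // prints 42
        demo.shutdown();
    }
}
```

In practice you would probably reach for `Executors.newSingleThreadExecutor()` rather than writing this by hand. The sketch only shows what such a class has to take care of.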

Deadlock

A common example of deadlock is when you have two locks and you are not careful how you acquire them. At one point you acquire lock1 before lock2:

public void f() {
  lock (lock1) {
    lock (lock2) {
      // Do something
    }
  }
}

At another point you acquire lock2 before lock1:

public void g() {
  lock (lock2) {
    lock (lock1) {
      // Do something else
    }
  }
}

Let's see how this might deadlock:

thread A calls f
thread A acquires lock1
context switch
thread B calls g
thread B acquires lock2
thread B tries to acquire lock1 but has to wait
context switch
thread A tries to acquire lock2 but has to wait
context switch

At this point thread A and B are waiting for each other and are deadlocked.
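The usual way out is to make every code path acquire the locks in the same order. A minimal Java sketch of that fix follows, assuming both methods really need both locks. `LockOrderDemo` and the call counters are illustrative additions.

```java
// Sketch only: f and g mirror the methods above, but g now takes lock1 first as well.
class LockOrderDemo {
    private final Object lock1 = new Object();
    private final Object lock2 = new Object();
    private int fCalls = 0;   // only touched while holding both locks
    private int gCalls = 0;   // only touched while holding both locks

    void f() {
        synchronized (lock1) {
            synchronized (lock2) {
                fCalls++;   // "Do something"
            }
        }
    }

    void g() {
        synchronized (lock1) {       // same order as f: lock1, then lock2
            synchronized (lock2) {
                gCalls++;   // "Do something else"
            }
        }
    }

    // Hammers f and g from two threads and returns the total number of completed calls.
    static int run(int iterations) throws InterruptedException {
        LockOrderDemo demo = new LockOrderDemo();
        Thread a = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                demo.f();
            }
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                demo.g();
            }
        });
        a.start();
        b.start();
        a.join();
        b.join();
        return demo.fCalls + demo.gCalls;   // safe to read: join() orders it after both threads
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(100_000));   // prints 200000
    }
}
```

With a single global order, whichever thread holds lock1 can always go on to take lock2, so the circular wait shown above cannot form.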

Martin Liversage 2009-08-24 10:40:53

Answer 2

A:

In .Net one thing that surprised me when I started trying to get into multi-threading is that you cannot straightforwardly update the UI controls from any thread other than the thread that the UI controls were created on.

There is a way around this, which is to use the Control.Invoke method to run the update on the thread that owns the control, but it is not 100% obvious the first time around!

Calanus 2009-08-24 10:45:45

This is not limited to .NET; a lot of GUI toolkits are 'single threaded'.

gbjbaanb 2009-08-24 10:48:54

Answer 3

+2 A:

YAGNI

The most important thing to remember is: do you really need multithreading?

P.K 2009-08-24 10:46:26

He might want to study the technique for personal growth, tho.

Gert 2009-08-24 12:57:31

@Gert I agree that he might want to study the technique for personal growth. In that case too, this principle holds good. There should be a justification for writing multithreaded code.

P.K 2009-08-24 16:15:57

In any professional shop, the answer is likely going to be 'yes'. If not right now, then at some point down the line. Not knowing this topic is going to hold you back eventually.

Casey 2009-09-21 23:09:46

Answer 4

+1 A:

An important thing to take care of (with multiple cores and CPUs) is cache coherency.
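In Java this concern usually shows up as a visibility question under the language's memory model: without `volatile` or some other synchronization, one thread is not guaranteed to see another thread's write. Here is a minimal sketch with a hypothetical stop flag (`VisibilityDemo` and its members are illustrative names):

```java
// Sketch only: a worker that spins until another thread sets a flag.
class VisibilityDemo {
    // volatile guarantees the worker will eventually see the write made by the caller.
    private volatile boolean stopRequested = false;
    private long iterations = 0;   // written only by the worker, read after join()

    long runFor(long millis) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stopRequested) {
                iterations++;
            }
        });
        worker.start();
        Thread.sleep(millis);
        stopRequested = true;   // without volatile, the loop above is allowed to never notice this
        worker.join();
        return iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("iterations before stop: " + new VisibilityDemo().runFor(100));
    }
}
```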

P.K 2009-08-24 10:48:01

I am not getting this one. Would you please elaborate a little bit?

TheMachineCharmer 2009-08-24 11:10:14

Take a look at these links: http://www.shafqatahmed.com/2008/01/multi-core-cach.html http://en.wikipedia.org/wiki/CPU_cache

P.K 2009-08-24 11:27:02

If this is true, then it is a hardware design flaw. The CPU cache design should be transparent to the software. Volatile should only be necessary when the memory is changed by non-CPU hardware.

Casey 2009-09-21 22:58:25

Answer 5

+8 A:

I can't give you examples besides pointing you at Google. Search for threading basics, thread synchronisation and you'll get more hits than you know.

The basic problem with threading is that threads don't know about each other, so they will happily tread on each other's toes. It is like 2 people trying to get through 1 door: sometimes they will pass through one after the other, but sometimes they will both try to get through at the same time and will get stuck. This is difficult to reproduce and difficult to debug, and it only sometimes causes problems. If you have threads and see "random" failures, this is probably the problem.

So care needs to be taken with shared resources. If you and your friend want a coffee, but there's only 1 spoon you cannot both use it at the same time, one of you will have to wait for the other. The technique used to 'synchronise' this access to the shared spoon is locking. You make sure you get a lock on the shared resource before you use it, and let go of it afterwards. If someone else has the lock, you wait until they release it.

The next problem comes with those locks. Sometimes a program is complex enough that you get a lock, do something else, then access another resource and try to get a lock for that, but some other thread has that 2nd resource, so you sit and wait... but if that 2nd thread is waiting for the lock you hold for the 1st resource, it's going to sit and wait too. And your app just sits there. This is called deadlock: 2 threads both waiting for each other.

Those 2 are the vast majority of thread issues. The answer is generally to lock for as short a time as possible, and only hold 1 lock at a time.

gbjbaanb 2009-08-24 10:48:07

(+1) for awesome writing style. You made the point clear.

TheMachineCharmer 2009-08-24 11:50:26

And if you have to hold more than one lock at a time, always acquire the locks in the same order.

FogleBird 2009-08-24 23:50:28

Answer 6

+1 A:

I am applying my new found knowledge of threading everywhere

[Emphasis added]

DO remember that a little knowledge is dangerous. Knowing the threading API of your platform is the easy bit. Knowing why and when you need to use synchronisation is the hard part. Reading up on "deadlocks", "race-conditions", "priority inversion" will start you in understanding why.

The details of when to use synchronisation are both simple (shared data needs synchronisation) and complex (atomic data types used in the right way don't need synchronisation, which data is really shared): a lifetime of learning and very solution specific.
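To make the "atomic data types used in the right way" part concrete, here is a small Java sketch using `java.util.concurrent.atomic.AtomicInteger`. The class `AtomicCounterDemo` and the helper `incrementUpTo` are made-up names for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: a shared counter that needs no lock because each update is a single atomic step.
class AtomicCounterDemo {
    static int run(int threads, int perThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.incrementAndGet();   // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return counter.get();
    }

    // The "right way" matters: writing `if (value.get() < limit) value.incrementAndGet();`
    // would be two separate steps, and another thread could slip in between them.
    // Doing the check and the update in one atomic operation avoids that.
    static boolean incrementUpTo(AtomicInteger value, int limit) {
        int before = value.getAndUpdate(v -> v < limit ? v + 1 : v);
        return before < limit;   // true if this call actually incremented
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 100_000));   // prints 400000
    }
}
```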

Richard 2009-08-24 10:54:27

Answer 7

+6 A:

I notice you are writing in Java and that nobody else has mentioned books, so Java Concurrency In Practice should be your multi-threaded bible.

Erik 2009-08-24 10:55:24

Thanks. But the question is not really Java specific. I hope that the practices in that book apply in general to code written in most if not all programming languages.

TheMachineCharmer 2009-08-24 11:08:02

Answer 8

+2 A:

Don't start new threads unless you really need to. Starting threads is not cheap and for short running tasks starting the thread may actually take more time than executing the task itself. If you're on .NET take a look at the built in thread pool, which is useful in a lot of (but not all) cases. By reusing the threads the cost of starting threads is reduced.

EDIT: A few notes on creating threads vs. using thread pool (.NET specific)

Generally try to use the thread pool. Exceptions:

Long running CPU bound tasks and blocking tasks are not ideal to run on the thread pool because they will force the pool to create additional threads.
All thread pool threads are background threads, so if you need your thread to be foreground, you have to start it yourself.
If you need a thread with different priority.
If your thread needs more (or less) than the standard 1 MB stack space.
If you need to be able to control the life time of the thread.
If you need different behavior for creating threads than that offered by the thread pool (e.g. the pool will throttle creating of new threads, which may or may not be what you want).

There are probably more exceptions and I am not claiming that this is the definitive answer. It is just what I could think of atm.
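The notes above are about the .NET pool. For readers on Java, the rough counterpart is an `ExecutorService` from `java.util.concurrent`. A minimal sketch of reusing a few pooled threads for many short tasks follows. The pool size of 4 and the name `PoolDemo` are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: many tiny tasks share a handful of threads instead of starting one thread each.
class PoolDemo {
    static int sumOfSquares(int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int x = i;
                results.add(pool.submit(() -> x * x));   // queued; run by whichever pool thread is free
            }
            int sum = 0;
            for (Future<Integer> r : results) {
                sum += r.get();   // blocks until that task has finished
            }
            return sum;
        } finally {
            pool.shutdown();   // let the pool threads exit once the queue is drained
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(10));   // prints 385
    }
}
```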

Brian Rasmussen 2009-08-24 11:07:22

(+1)How would I know if I really need a thread for particular task or not? Of course I will come to know about this with experience. But still, is there some trick or rule-of-thumb?

TheMachineCharmer 2009-08-24 11:37:44

Answer 9

+13 A:

There are two kinds of people that do not use multi threading.

1) Those that do not understand the concept and have no clue how to program it.

2) Those that completely understand the concept and know how difficult it is to get it right.

Henrico Dolfing 2009-08-24 11:25:52

Excellent answer.

gimpf 2009-08-24 11:29:50

And those that understand how difficult multi-threading is and therefore are using something else to achieve concurrency :) (Communicating Sequential Processes, Dataflow Variables, languages like Erlang, Mozart Oz, etc.)

Thomas Danecker 2009-08-25 09:29:49

Answer 10

+11 A:

I'd make a very blatant statement:

DON'T use shared memory.

DO use message passing.

As general advice, try to limit the amount of shared state and prefer more event-driven architectures.
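A rough Java sketch of what message passing can look like on a shared-memory platform: the only object the two threads share is a queue, and the running total is owned by the consumer alone. `MessagePassingDemo` and the `DONE` marker are hypothetical names. The sketch assumes all real messages are non-negative so that -1 can mark the end of the stream.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only: producer and consumer communicate purely through queues.
class MessagePassingDemo {
    private static final int DONE = -1;   // assumed end-of-stream marker

    static long sumViaMessages(int count) throws InterruptedException {
        BlockingQueue<Integer> inbox = new ArrayBlockingQueue<>(16);
        BlockingQueue<Long> reply = new ArrayBlockingQueue<>(1);

        Thread consumer = new Thread(() -> {
            long total = 0;   // owned by this thread only, so it needs no lock
            try {
                while (true) {
                    int msg = inbox.take();   // blocks until a message arrives
                    if (msg == DONE) {
                        break;
                    }
                    total += msg;
                }
                reply.put(total);   // the result travels back as a message too
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (int i = 1; i <= count; i++) {
            inbox.put(i);   // blocks if the consumer falls more than 16 messages behind
        }
        inbox.put(DONE);
        long result = reply.take();
        consumer.join();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sumViaMessages(100));   // prints 5050
    }
}
```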

Thomas Danecker 2009-08-24 11:38:45

Answer 11

+3 A:

I agree with pretty much all the answers so far.

A good coding strategy is to minimise or eliminate the amount of data that is shared between threads as much as humanly possible. You can do this by:

Using thread-static variables (although don't go overboard on this, it will eat more memory per thread, depending on your O/S).
Packaging up all state used by each thread into a class, then guaranteeing that each thread gets exactly one state class instance to itself. Think of this as "roll your own thread-static", but with more control over the process.
Marshalling data by value between threads instead of sharing the same data. Either make your data transfer classes immutable, or guarantee that all cross-thread calls are synchronous, or both.
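A small Java sketch of the first and third ideas in the list above, assuming `ThreadLocal` as the thread-static mechanism. `PerThreadStateDemo` and `scratch` are illustrative names.

```java
// Sketch only: each thread works on its own private state and hands back a plain value.
class PerThreadStateDemo {
    // Every thread that calls scratch.get() receives its own one-element array.
    private static final ThreadLocal<int[]> scratch = ThreadLocal.withInitial(() -> new int[1]);

    static int[] run(int threads, int perThread) throws InterruptedException {
        int[] results = new int[threads];   // one slot per thread; no slot is written by two threads
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t;
            workers[t] = new Thread(() -> {
                int[] mine = scratch.get();
                for (int i = 0; i < perThread; i++) {
                    mine[0]++;   // no lock needed: no other thread can reach this array
                }
                results[slot] = mine[0];   // hand the result back by value
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();   // after join() the caller is guaranteed to see each slot
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int r : run(3, 1_000)) {
            System.out.println(r);   // prints 1000 three times
        }
    }
}
```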

Try not to have multiple threads competing for the exact same I/O "resource", whether it's a disk file, a database table, a web service call, or whatever. This will cause contention as multiple threads fight over the same resource.

Here's an extremely contrived OTT example. In a real app you would cap the number of threads to reduce scheduling overhead:

All UI - one thread.
Background calcs - one thread.
Logging errors to a disk file - one thread.
Calling a web service - one thread per unique physical host.
Querying the database - one thread per independent group of tables that need updating.

Rather than guessing how to divvy up the tasks, profile your app and isolate those bits that are (a) very slow, and (b) could be done asynchronously. Those are good candidates for a separate thread.

And here's what you should avoid:

Calcs, database hits, service calls, etc - all in one thread, but spun up multiple times "to improve performance".

Christian Hayter 2009-08-24 11:46:36

Or use a framework like .net's Task Parallel Library, create a lot of little tasks and let the runtime system decide which to execute in parallel. (No shared state is even more important in this scenario)

Thomas Danecker 2009-08-25 09:33:46

Answer 12

+3 A:

DONT use global variables

DONT use many locks (at best none at all - though practically impossible)

DONT try to be a hero, implementing sophisticated difficult MT protocols

DO use simple paradigms, e.g. split the processing of an array into n slices of the same size, where n should be equal to the number of processors

DO test your code on different machines (using one, two, many processors)

DO use atomic operations (such as InterlockedIncrement() and the like)
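The "n slices" advice in the list above might look like this in Java. This is a sketch under the assumption that the work is a plain sum. `SliceSumDemo` is a made-up name and the slice arithmetic is one of several reasonable ways to do it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: each task reads its own slice and returns a partial sum; nothing mutable is shared.
class SliceSumDemo {
    static long parallelSum(int[] data) throws Exception {
        int n = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            int chunk = (data.length + n - 1) / n;   // slice size, rounded up
            List<Future<Long>> parts = new ArrayList<>();
            for (int s = 0; s < n; s++) {
                final int from = Math.min(s * chunk, data.length);
                final int to = Math.min(from + chunk, data.length);
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> p : parts) {
                total += p.get();   // combine the partial sums on the calling thread
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data));   // prints 500500
    }
}
```

Because the slices do not overlap and the partial sums are combined after the fact, this needs neither locks nor atomic operations.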

RED SOFT ADAIR 2009-08-24 12:21:19

The last one should also be a DON'T like the first one. Interlocked operations do not scale that well either (because of various, very bad caching effects and other cpu-synchronization requirements). I'd still prefer locks over interlocked operations, but they may be a last resort when profiling shows a problem with the locks and you can't do something else (like less sharing).

Thomas Danecker 2009-08-25 09:39:28

You are right considering caching and performance. Good Point. But atomic operations are threadsafe and lockfree by nature. They will not introduce bugs.

RED SOFT ADAIR 2009-08-25 13:09:38

Answer 13

A:

Hi

While your initial differences in sums of numbers are, as several respondents have pointed out, likely to be the result of lack of synchronisation, if you get deeper into the topic, be aware that, in general, you will not be able to reproduce exactly the numeric results you get on a serial program with those from a parallel version of the same program. Floating-point arithmetic is not strictly associative or distributive, so the order in which partial results are combined matters; heck, it's not even closed.
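A tiny Java illustration of the ordering effect, using the textbook values 0.1, 0.2 and 0.3 (`FloatOrderDemo` is just an illustrative name). A parallel sum effectively regroups the additions, which is all it takes:

```java
// Sketch only: the same three numbers added with two different groupings.
class FloatOrderDemo {
    static double leftToRight(double a, double b, double c) {
        return (a + b) + c;   // the order a serial loop would use
    }

    static double regrouped(double a, double b, double c) {
        return a + (b + c);   // the order you might get when partial sums are combined
    }

    public static void main(String[] args) {
        System.out.println(leftToRight(0.1, 0.2, 0.3));   // prints 0.6000000000000001
        System.out.println(regrouped(0.1, 0.2, 0.3));     // prints 0.6
    }
}
```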

And I'd beg to differ with what, I think, is the majority opinion here. If you are writing multi-threaded programs for a desktop with one or more multi-core CPUs, then you are working on a shared-memory computer and should tackle shared-memory programming. Java has all the features to do this.

Without knowing a lot more about the type of problem you are tackling, I'd hesitate to write that 'you should do this' or 'you should not do that'.

Regards

Mark

High Performance Mark 2009-08-24 12:21:53

digressing from the original question: Floating-point arithmetic is not closed? Why not? (aren't Inf and NaN IEEE-754 floating point numbers?)

levinalex 2009-08-24 12:34:11

Answer 14

+3 A:

-- What are some known thread issues? --

-- What care should be taken while using threads? --

Using multi-threading on a single-processor machine to process multiple tasks where each task takes approximately the same time isn’t always very effective. For example, you might decide to spawn ten threads within your program in order to process ten separate tasks. If each task takes approximately 1 minute to process, and you use ten threads to do this processing, you won’t have access to any of the task results for the whole 10 minutes. If instead you processed the same tasks using just a single thread, you would see the first result in 1 minute, the next result 1 minute later, and so on. If you can make use of each result without having to rely on all of the results being ready simultaneously, the single thread might be the better way of implementing the program.

If you launch a large number of threads within a process, the overhead of thread housekeeping and context switching can become significant. The processor will spend considerable time in switching between threads, and many of the threads won’t be able to make progress. In addition, a single process with a large number of threads means that threads in other processes will be scheduled less frequently and won’t receive a reasonable share of processor time.

If multiple threads have to share many of the same resources, you’re unlikely to see performance benefits from multi-threading your application. Many developers see multi-threading as some sort of magic wand that gives automatic performance benefits. Unfortunately multi-threading isn’t the magic wand that it’s sometimes perceived to be. If you’re using multi-threading for performance reasons, you should measure your application’s performance very closely in several different situations, rather than just relying on some non-existent magic.

Coordinating thread access to common data can be a big performance killer. Achieving good performance with multiple threads isn’t easy when using a coarse locking plan, because this leads to low concurrency and threads waiting for access. Alternatively, a fine-grained locking strategy increases the complexity and can also slow down performance unless you perform some sophisticated tuning.

Using multiple threads to exploit a machine with multiple processors sounds like a good idea in theory, but in practice you need to be careful. To gain any significant performance benefits, you might need to get to grips with thread balancing.

-- Please provide examples. --

For example, imagine an application that receives incoming price information from the network, aggregates and sorts that information, and then displays the results on the screen for the end user.

With a dual-core machine, it makes sense to split the task into, say, three threads. The first thread deals with storing the incoming price information, the second thread processes the prices, and the final thread handles the display of the results.

After implementing this solution, suppose you find that the price processing is by far the longest stage, so you decide to rewrite that thread’s code to improve its performance by a factor of three. Unfortunately, this performance benefit in a single thread may not be reflected across your whole application. This is because the other two threads may not be able to keep pace with the improved thread. If the user interface thread is unable to keep up with the faster flow of processed information, the other threads now have to wait around for the new bottleneck in the system.

And yes, this example comes directly from my own experience :-)

RoadWarrior 2009-08-24 12:24:47

Answer 15

+1 A:

I am surprised that no one has pointed out Herb Sutter's Effective Concurrency columns yet. In my opinion, this is a must read if you want to go anywhere near threads.

cmeerw 2009-08-24 20:54:56

Answer 16

A:

Don't be fooled into thinking you understand the difficulties of concurrency until you've split your head into a real project.

All the examples of deadlocks, livelocks, synchronization, etc, seem simple, and they are. But they will mislead you, because the "difficulty" in implementing concurrency that everyone is talking about is when it is used in a real project, where you don't control everything.

Juice 2009-08-27 22:27:28

Answer 17

+1 A:

a) Always make only 1 thread responsible for a resource's lifetime. That way thread A won't delete a resource thread B needs - if B has ownership of the resource

b) Expect the unexpected

Maciek 2009-08-27 22:32:17

Answer 18

+1 A:

DO think about how you will test your code and set aside plenty of time for this. Unit tests become more complicated. You may not be able to manually test your code - at least not reliably.

DO think about thread lifetime and how threads will exit. Don't kill threads. Provide a mechanism so that they exit gracefully.

DO add some kind of debug logging to your code - so that you can see that your threads are behaving correctly both in development and in production when things break down.

DO use a good library for handling threading rather than rolling your own solution (if you can). E.g. java.util.concurrent

DON'T assume a shared resource is thread safe.

DON'T DO IT. E.g. use an application container that can take care of threading issues for you. Use messaging.
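For the thread lifetime point in the list above, one common Java pattern is to request a stop via interruption and then wait for the thread to finish on its own. This is a minimal sketch. `GracefulStopDemo` is a made-up name and the short sleep stands in for real work.

```java
// Sketch only: the worker is asked to stop, never killed.
class GracefulStopDemo {
    static boolean runAndStop() throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(10);   // stand-in for a unit of work; throws if interrupted
                }
            } catch (InterruptedException e) {
                // Interrupted while sleeping: fall through to the cleanup below.
            }
            // Release files, sockets and other resources here before the thread exits.
        });
        worker.start();
        Thread.sleep(50);          // let the worker run for a moment
        worker.interrupt();        // a request to stop, not a forced kill
        worker.join(2_000);        // give it time to finish cleanly
        return !worker.isAlive();  // true if the worker exited by itself
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker exited cleanly: " + runAndStop());
    }
}
```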

Conor 2009-08-28 14:07:54

What are multi-threading DOs and DONTs?
