views:

996

answers:

15

Recently, I found myself having to write up some concerns I have about race conditions in an application that is in development (not by me). This will likely be brought to the attention of stakeholders who are non-technical and with whom I do not have a direct line of communication, so my explanation needs to be in written form.

I have already made an attempt at this write-up. I gloss over the technical specifics as best I can, give an example of how a race condition would occur in the application, and describe its impact. I feel I did pretty well, but it's far from perfect.

The problem is, as much as I try to shield the reader from computer science, I have still found it difficult to eliminate phrases like "threads of execution" and "mutual exclusion" without losing correctness and substance. The risk is that, with too much hand-waving, these concerns could be dismissed as a made-up boogeyman.

Anyway, my question to you is this: How would you explain race conditions to a non-technical audience? Would you dare to explain CPU scheduling? Would you invoke the dining philosophers?

You don't have to work within the constraints of my situation (but it would be awesomely helpful if you did).

+2  A: 

I would go for a Dining Philosopher's-esque approach, but depending on my audience, I would try to analogize it to the context of my audience. Are you speaking to business executives? Then analogize it to something like allocate a meeting room or a corporate car or booking a hotel room or whatever. Are you talking to average people? Then the dining philosopher's example is fine, or you can think up a similar situation involving caring for farm animals or sitting in chairs or whatever.

Whether you hijack the dining philosopher's example, or make up your own, definitely use a metaphor.

HanClinto
I think the dining philosophers is too abstract. It explains how race conditions can occur, but not how it relates to these people, or why it can be bad. They can just say "Yeah, but we're not making a restaurant for philosophers, so why do we care?"
jalf
Not so good because DP has the "acquire 2 resources" aspect; it is more about algorithm design than about fundamentals. This is more about deadlock management than about timing.
S.Lott
+16  A: 

Company X has $1,000 in the bank. X pays a rent of $2,000 and received a payment of $10,000 for services rendered to company Y. However, due to a race condition, X is in deficit of $1,000 and is now applying for bankruptcy. =(

You might want to explain how the bank handles company X's account in this way: Bank staff A takes the current value of $1,000 and adds $10,000 to it. Bank staff B takes the current value of $1,000 and subtracts $2,000 from it. Bank staff A updates the value to $11,000. Bank staff B updates the value to -$1,000.

blizpasta
I like this one. For those that really don't get it, you could easily adapt this to a "game" where person 1 is banker a, person 2 is banker b, they know the balance, and what to do with it, but not what the other is doing. At the end, just ask for who wants to go first..
shelfoo
+1 for explaining it to me!
discorax
Props: provide a ledger and two documents with the transactions face down on the table. Have two participants grab a transaction and fight over the ledger book.
S.Lott
You may like to clarify that the bank allows you to go in the red as long as you settle up at the end of the day. (Otherwise, even if A blocks waiting for B to finish, there can still be a negative balance at some point and bankruptcy is triggered.)
Oddthinking
@Oddthinking: That's one way that a bank can handle the race condition. Another way is that banks resolve the transactions in order by type not arrival -- all deposits in a given day are done before all withdrawls.
S.Lott
It doesn't really matter how the bank handles the race conditions though, does it? The important thing is that if you *don't* handle them, bad things happen.
jalf
+1  A: 

How about the plain obvious?

A race condition is literally a race between two people.

A company is bidding on a project. Two employees working independently on bids submit them to the customer, but one of the employees has outdated information. Neither employee know that the other is in the process of submitting a bid, therefore depending on who is faster, the first bid may be replaced with the slower employee. This will cause confusion as the bid may have changed over time.

There needs to be communication between the two employees to either work together or stop one of them.

Pyrolistical
+2  A: 

If you are writing to a non technical audience, you'll want to simplify your explanations and relate it to something they can understand. One explanation taken from the paper Analogies for teaching parallel computing to inexperienced programmers (http://portal.acm.org/citation.cfm?doid=1189136.1189172) explains it in terms of a pen game:

We’re going to play a game called the Pen Game. The rules are simple: I’m going to hold a pen in my hand, and then I’ll say “One, two, three, go.” When I say “go,” take the pen from my hand. Whoever gets the pen wins. Ready? One, two, three, go.

You then ask if the outcome of this game can be predicted in advance. If it can't be predicted, can we guarantee a correct outcome? This should lead to the realization that it's possible to get incorrect results for simultaneous writes to the same piece of memory.

mweiss
This kinda falls down in that it doesn't explain why a race condition is bad. Does it matter who ends up with the pen? Are there "incorrect outcomes" at all?
jalf
The idea is to show that things don't happen in a way you can expect, and even a business person should be able to understand why that would be bad for an application
mweiss
How about this wrinkle? Give everyone 2 donuts. Don't get the pen, give up a donut. As soon as someone starves, ask what policy would prevent starvation.
S.Lott
Or, keep the pen analogy, but add the details that one of the people needs to write a document, another sign that document. The correct outcome is the person writing the document getting the pen, then the one who signs. The solution to suggest at the end is having them coordinate with each other.
rmeador
+1 for sequencing. If the document is the order for donuts, you can still hit the Live-lock/starvation as a consequence of race conditions.
S.Lott
+2  A: 

I was going to recommend the dining philosophers, but I see you have already found that one. So, as an alternative, how about using gridlock as an analogy?

Imagine normal traffic driving along the four streets next to a single city block (North ave, South ave, East street and West street). When there are only one or two cars on the road, everything moves smoothly. When there is steady traffic, some cars will have to stop and wait for other cars to move past, but this is a manageable problem. One car stops to wait for another car to go by, and then continues on its merry way.

Now, picture rush-hour traffic at the same location. Let's say that one car driving South on West street can't make it all the way through the intersection at the NorthWest corner of our city block. That car now blocks all of the Westbound cross traffic on North ave. It doesn't take long before a Westbound car tries to make it through the NorthEast corner intersection and gets stuck, blocking all of the Northbound traffic on East street. When this situation makes it all the way around the four intersections, no cars can move! Each one is waiting for the cars in front of it to move ahead, but there is no way for the gridlock to be releived without pulling cars out backwards.

The comparison to computing should be straightforward. Cars are threads or processes, streets and avenues are processors, buffers, or cores. The concept of blocking can be described using traffic lights or stop signs, and the whole thing starts to make intuitive sense, even to non-programmers.

e.James
I think that this does not convey the problem of race conditions. You would have to show a situation where it matters which car arrives first at some point.
Svante
-1: Not clear what resource everyone is competing for. Are they trying to seize (acquire? lock?) the intersections?
S.Lott
@S.Lott: That is fair criticism. I would say that they are competing for use of the roads, analagous to threads competing for the use of cores on a multi-core processor, but I agree that the connection is not as strong as it could be.
e.James
@eJames: make a rewrite. Or possibly delete it and put it another.
S.Lott
+9  A: 

I think bank transactions might be a good example, both because it's easy to see that an incorrect result is bad and because race conditions are easy to create in such an environment.

I have $500 on my account. Someone transfers $200 to me at the same time that I withdraw $50.

Now, if the bank doesn't handle race conditions properly, they will do the following (assuming the transactions are handled manually, of course) Clerk A will see the request to add $200 to my balance, and note that my balance is currently $500. Clerk B will see the request to subtract $50 from my balance, and note that my balance is currently $500 (clerk A hasn't yet transferred the money).

Clerk A finishes the paperwork and sets my account balance to $700 (500 + the 200 he was supposed to add). And then, a minute later (because clerk B just had to grab a cup of coffee), clerk B finishes up the other transaction and sets my balance to $450 (the 500 I had when he checked, minus the 50 he was meant to subtract).

My balance is now $450, when it should have been $650, because of a race condition. The outcome depended on the order in which different parts of the two transactions were performed.

That's the general description of how race conditions are bad. Now say that instead of clerks, we have our application processing two separate tasks at the same time (that's your 'threads of execution'), and just like above, they both read a value, modify the value that they read, and then write it back. One of the modifications may then be lost if this happens in the order shown above. That should relate it to the specific problems in your app.

jalf
Bank transactions are a good way to frame an example. I'd suggest that it would be better to use two identical transactions, because then it's obvious that the problem is in the timing.
Mark Bessey
I'm not sure I see how that would be more obvious?
jalf
A: 

There's a great example in Structured Concurrent Programming With Operating Systems Applications (as I recall)

In the impoverished country of Bezerkistan, two lines merge onto a single track in a tunnel. There have been collisions and the ruling junta needs a solution.

The issue is that it's mountainous and the engineers are blind. There's very little advance warning of two trains about to collide in the tunnel.

Here's the plan.

  1. Put a big bowl at the juncture.

  2. Give each engineer a little brass monkey.

When you're about to enter the tunnel, you stop your train. You pat around in the bowl to see if a brass monkey is in the bowl.

If there's a monkey, someone else is using the tunnel, so you have to wait until their train is entirely in the tunnel, at which time the conductor gets out of the caboose and grabs the monkey from the bowl.

If there's no monkey, no one else is using the tunnel. So, you can grab your monkey from the engine compartment, put it in the bowl and drive through the tunnel, knowing you have acquired exclusive access to the track. Of course, you stop briefly to allow the conductor to retrieve the brass monkey.

Guess what?

They still had collisions!

Why? What's the situation or sequence of actions that causes this to fail?


That's a race condition.

In a written document, you can explain how the race condition leads to an accident.

In a presentation, you can coach the audience through reasoning about concurrency and locking.

S.Lott
This is a solution to the race condition, but doesn't really explain what is a "race"
Pyrolistical
Thanks. You comment means my presentation wasn't clear enough. The bowl-and-brass-monkey replaces one race condition with another. It's specifically not a solution. How's this revised version?
S.Lott
A train metaphore is a great analogy because semaphores (a locking/signaling technique in code) gets its name from the railway system
bryanbcook
Yep. That's one reason why the book is so good.
S.Lott
A: 

I think it's hard to explain this in a simple way, because thinking about concurrency is inherently hard. The basic idea of a financial transaction might be a good place to start, since people will have some familiarity with them from real life.

In any kind of transaction, you need to make simultaneous entries in two places - debits and credits. If the transaction gets interrupted in the middle by someone else trying to perform another transaction, they will see the wrong balance in one or the other of the accounts.

Mark Bessey
+2  A: 

Write a program:

  1. Wait for salary.
  2. Go to shop.
  3. Buy food.
  4. Turn on the plate.
  5. Put food on the plate.
  6. Keep plate for 20 minutes.
  7. Eat.
  8. Go to bed.

Now try to have two threads (you, wife) execute it without syncronization.

  • You: Wait for salary.
  • Wife: Go to the shop without money, crash

  • You: Turn on the plate.

  • You: Keep plate for 20 minutes.
  • You: Go to bed.

  • Wife: Eat at someone else's place.

  • Wife: Go to bed.
badbadboy
By "execute it", do you mean have two threads (you, wife) execute it? The explicit list of shared resources is good.
S.Lott
@S.Lott: yep, fixed :) thanx
badbadboy
Wait a minute! Did she go to bed at someone else's place, or just eat there?
WW
A: 

Peter wants to pull out of his driveway. He checks that nothing is in the way of his car, then gets in. His son Frank then hides behind the car. Peter cannot see him and runs him over.

The important thing here is that for a computer, "inspect" and "modify" tend to be two separate actions, so an example where you can't check something when you modify it is a good one.

Artelius
A: 

i would use a shared memory bank account example of a data race condition.

explain that the computer does something like: load balance; add 1; store balance;. consider two threads that are modifying your bank account balance (you and your wife are both depositing one dollar at the same time).

if both threads get interuupted after the: load balance; and then resume, you can lose one dollar.

see: http://wasp.cs.washington.edu/atomeclipse/handouts.pdf

Ray Tayek
+1  A: 

One difficulty in explaining the general concept is that race conditions manifest themselves in a wide variety of situations. If your goal is give your non-technical audience the sense that this is a generic problem type, you should try to offer more than one example.

mseery
A: 

As you mentioned, you often need to introduce other concepts (mutual exclusion, threads of execution) to accurately describe race conditions, even in a metaphor. So try defining these terms (or at least getting the idea across) first, using metaphor.

As a simple example, let's use a 4-way intersection (set in a country where you drive on the right). Divide the intersection into 4 quadrants: North-West, North-East, South-East, and South-West. Now call each quadrant a resource, and call each car a thread of execution. These cars only respect traffic systems, and since there are no stop signs or traffic lights at this intersection, the cars barrel right on through without slowing or considering traffic.

You can easily show that simultaneous use of one of these quadrants by more than one car is bad, and results in a car crash. One obvious solution is to install a traffic system. The system ensures that no more than one car is passing through a quadrant at the same time. It can do this intricately, without tying up all the resources. For example, letting cars coming from the South make a left turn to head West (using south-east and north-west quadrants), while letting cars coming from the West make a right turn to head South (using the south-west quadrant). The traffic system is providing mutual exclusion, or preventing simultaneous use (by multiple cars) of a common resource (the quadrant of road in the intersection).

This at least provides the ideas behind these definitions, the idea that simultaneously accessing shared resources can be bad, and that mutual exclusion can solve this problem. After this is established, you'll need to map these to a more appropriate metaphor to show what a race condition is and how it's one of those bad things that results from lack of mutual exclusion for a common resource.

It takes a bit longer, but it grants some familiarity with terms and the big picture before drilling down into a more complex metaphor.

InverseFalcon
A: 

Send them to Race Condition on Wikipedia.

The first part will make some sense, and the rest (not shown below) will make you look smart since they will assume you understand it.

"A race condition or race hazard is a flaw in a system or process whereby the output and/or result of the process is unexpectedly and critically dependent on the sequence or timing of other events. The term originates with the idea of two signals racing each other to influence the output first."

I think the key point to get across is that its most frequently a timing issue that can be unpredictable because the timing something takes differs from time to time.

Simon_Weaver
A: 

A picture is worth a 1000 words. Its true. If you draw a timeline and put some entity on it, and show its state changes as time progresses you can demonstrate a race-condition pretty easily in one diagram. It may take a few redos to get the picture just right, but I've always found that drawing it out gets my point across must faster than describing it.

dviljoen