views: 252
answers: 6
After I posted a solution to my own problem regarding memory issues, nusi pointed out that my solution lacks locking.

The following pseudo-code is a very simplified representation of my solution.

#include <iostream>
#include <map>
#include <string>

// MyType1 is sketched here just so the example compiles; the real type is more complex.
struct MyType1
{
    std::string Test;
};

std::map<int, MyType1> myMap;

void firstFunctionRunFromThread1()
{
    MyType1 mt1;
    mt1.Test = "Test 1";
    myMap[0] = mt1;
}

void onlyFunctionRunFromThread2()
{
    MyType1 &mt1 = myMap[0];
    std::cout << mt1.Test << std::endl; // Prints "Test 1"
    mt1.Test = "Test 2";
}

void secondFunctionFromThread1()
{
    MyType1 mt1 = myMap[0];
    std::cout << mt1.Test << std::endl; // Prints "Test 2"
}

I'm not sure at all how to go about implementing locking, and I'm not even sure why I should do it (note the actual solution is much more complex). Could someone please explain how and why I should implement locking in this scenario?

+1  A: 

The whole idea is to prevent the program from going into an indeterminate/unsafe state when multiple threads access and/or modify the same resource, leaving its subsequent state undefined. Read up on Mutexes and Locking (with examples).
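For a flavour of what that looks like, here is a minimal sketch assuming a C++11 compiler (Boost.Thread provides equivalents for older compilers); the names are illustrative only:

#include <iostream>
#include <map>
#include <mutex>
#include <string>

std::map<int, std::string> sharedMap; // the shared resource
std::mutex sharedMapMutex;            // guards every access to sharedMap

void writerThread()
{
    std::lock_guard<std::mutex> guard(sharedMapMutex); // locked until end of scope
    sharedMap[0] = "Test 1";
}

void readerThread()
{
    std::lock_guard<std::mutex> guard(sharedMapMutex); // blocks while the writer holds the lock
    std::map<int, std::string>::iterator it = sharedMap.find(0);
    if (it != sharedMap.end())
        std::cout << it->second << std::endl;
}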

dirkgently
+2  A: 

One function (i.e. thread) modifies the map and two read it. A read could therefore be interrupted by a write, or vice versa, and in either case the map will probably be corrupted. You need locks.

anon
+1, but you should probably show how
TStamper
To do that I'd need more info than the question provides.
anon
I meant an example
TStamper
A realistic one would be hundreds of lines of code - I do have other things on my plate!
anon
+1  A: 

In general, threads might be running on different CPUs/cores, with different memory caches. They might be running on the same core, with one interrupting ("pre-empting") the other. This has two consequences:

1) You have no way of knowing whether one thread will be interrupted by another in the middle of doing something. So in your example, there's no way to be sure that thread1 won't try to read the string value before thread2 has written it, or even that when thread1 reads it, it is in a "consistent state". If it is not in a consistent state, then using it might do anything.

2) When you write to memory in one thread, there is no telling if or when code running in another thread will see that change. The change might sit in the cache of the writer thread and not get flushed to main memory. It might get flushed to main memory but not make it into the cache of the reader thread. Part of the change might make it through, and part of it not.

In general, without locks (or other synchronization mechanisms such as semaphores) you have no way of saying whether something that happens in thread A will occur "before" or "after" something that happens in thread B. You also have no way of saying whether or when changes made in thread A will be "visible" in thread B.

Correct use of locking ensures that all changes are flushed through the caches, so that code sees memory in the state you think it should see. It also allows you to control whether particular bits of code can run simultaneously and/or interrupt each other.

In this case, looking at your code above, the minimum locking you need is to have a synchronisation primitive which is released/posted by the second thread (the writer) after it has written the string, and acquired/waited on by the first thread (the reader) before using that string. This would then guarantee that the first thread sees any changes made by the second thread.

That's assuming the second thread isn't started until after firstFunctionRunFromThread1 has been called. If that might not be the case, then you need the same deal with thread1 writing and thread2 reading.
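As a rough sketch of that release-after-writing / acquire-before-reading idea, using C++20's std::binary_semaphore purely for illustration (a POSIX semaphore or a condition variable would play the same role); the names are made up:

#include <iostream>
#include <semaphore>
#include <string>
#include <thread>

std::string sharedString;               // stands in for myMap[0].Test
std::binary_semaphore stringWritten(0); // starts un-posted

void writer()                           // the second thread
{
    sharedString = "Test 2";
    stringWritten.release();            // post: "I've finished writing"
}

void reader()                           // the first thread
{
    stringWritten.acquire();            // wait until the writer has posted
    std::cout << sharedString << std::endl;
}

int main()
{
    std::thread t2(writer);
    std::thread t1(reader);
    t1.join();
    t2.join();
}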

The simplest way to actually do this is to have a mutex which "protects" your data. You decide what data you're protecting, and any code which reads or writes the data must be holding the mutex while it does so. So first you lock, then read and/or write the data, then unlock. This ensures consistent state, but on its own it does not ensure that thread2 will get a chance to do anything at all in between thread1's two different functions.
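Applied to the functions from the question, that lock / read-or-write / unlock pattern might look something like the sketch below, assuming a C++11 std::mutex (std::lock_guard takes the lock in its constructor and releases it when it goes out of scope):

#include <iostream>
#include <map>
#include <mutex>
#include <string>

struct MyType1 { std::string Test; };

std::map<int, MyType1> myMap;
std::mutex myMapMutex;                            // protects every access to myMap

void firstFunctionRunFromThread1()
{
    std::lock_guard<std::mutex> lock(myMapMutex); // lock...
    MyType1 mt1;
    mt1.Test = "Test 1";
    myMap[0] = mt1;
}                                                 // ...unlock, when lock goes out of scope

void onlyFunctionRunFromThread2()
{
    std::lock_guard<std::mutex> lock(myMapMutex);
    MyType1 &mt1 = myMap[0];
    std::cout << mt1.Test << std::endl;
    mt1.Test = "Test 2";
}

void secondFunctionFromThread1()
{
    std::lock_guard<std::mutex> lock(myMapMutex);
    std::cout << myMap[0].Test << std::endl;
}

As the paragraph above says, this keeps the map in a consistent state, but it does not by itself guarantee that thread2 gets a chance to run in between thread1's two functions.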

Any kind of message-passing mechanism will also include the necessary memory barriers, so if you send a message from the writer thread to the reader thread, meaning "I've finished writing, you can read now", then that will be true.

There can be more efficient ways of doing certain things, if those prove too slow.

Steve Jessop
+2  A: 

Actually, it's not even just locking that is the issue...

If you really want thread two to ALWAYS print "Test 1", then you need a condition variable.

The reason is that there is a race condition: regardless of whether you create thread 1 before thread 2, it is possible for thread 2's code to execute before thread 1's, and so the map will not be initialized properly. To ensure that no one reads from the map until it has been initialized, you need to use a condition variable that thread 1 signals.

You also should use a lock with the map, as others have mentioned, because you want threads to access the map as though they are the only ones using it, and the map needs to be in a consistent state.
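A rough sketch of that combination, assuming C++11's std::mutex and std::condition_variable (the condition variable, the flag, and their names are made up for illustration):

#include <condition_variable>
#include <iostream>
#include <map>
#include <mutex>
#include <string>

struct MyType1 { std::string Test; };

std::map<int, MyType1> myMap;
std::mutex myMapMutex;
std::condition_variable mapInitialized;
bool initialized = false;

// Thread 1: fill the map under the lock, then signal that it is ready.
void firstFunctionRunFromThread1()
{
    std::lock_guard<std::mutex> lock(myMapMutex);
    myMap[0].Test = "Test 1";
    initialized = true;
    mapInitialized.notify_all();
}

// Thread 2: wait until the map is ready before touching it.
void onlyFunctionRunFromThread2()
{
    std::unique_lock<std::mutex> lock(myMapMutex);
    mapInitialized.wait(lock, []{ return initialized; });
    std::cout << myMap[0].Test << std::endl; // now always prints "Test 1"
    myMap[0].Test = "Test 2";
}

The predicate passed to wait() covers both spurious wakeups and the case where thread 1 has already notified before thread 2 starts waiting.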

Here is a conceptual example to help you think about it:

Suppose you have a linked list that 2 threads are accessing. In thread 1, you ask to remove the first element from the list (at the head of the list), In thread 2, you try to read the second element of the list.

Suppose that the delete method is implemented in the following way: make a temporary ptr to point at the second element in the list, make the head point at null, then make the head the temporary ptr...

What if the following sequence of events occurs:

- T1 removes the head's next ptr to the second element
- T2 tries to read the second element, BUT there is no second element because the head's next ptr was modified
- T1 completes removing the head and sets the 2nd element as the head

The read by T2 failed because T1 didn't use a lock to make the delete from the linked list atomic!

That is a contrived example, and isn't necessarily how you would even implement the delete operation; however, it shows why locking is necessary: so that operations performed on data are atomic. You do not want other threads using something that is in an inconsistent state.
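To make the contrived example concrete, here is a small sketch along the same lines (using std::list as a stand-in for the hand-rolled linked list), where holding a mutex makes each operation atomic with respect to the other thread:

#include <iostream>
#include <iterator>
#include <list>
#include <mutex>

std::list<int> sharedList = {1, 2, 3};
std::mutex listMutex;

// T1: remove the first element.
void removeHead()
{
    std::lock_guard<std::mutex> lock(listMutex); // T2 cannot see the half-finished removal
    if (!sharedList.empty())
        sharedList.pop_front();
}

// T2: read the second element.
void readSecond()
{
    std::lock_guard<std::mutex> lock(listMutex); // runs entirely before or entirely after removeHead
    if (sharedList.size() >= 2)
        std::cout << *std::next(sharedList.begin()) << std::endl;
}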

Hope this helps.

Tom
"you need to use a condition variable". Or, given that this is a simple situation, two semaphores would do.
Steve Jessop
A: 

The set of instructions created as a result of compiling your code can be interleaved in any order. This can yield unpredictable and undesired results. For example, if thread1 runs before thread2 is selected to run, your output may look like:

Test 1
Test 1

Worse yet, one thread may get pre-empted in the middle of an assignment - if assignment is not an atomic operation. In this context, think of atomic as the smallest unit of work which cannot be split any further.

The way to create a logically atomic set of instructions -- even if they yield multiple machine code instructions in reality -- is to use a lock or mutex. Mutex stands for "mutual exclusion" because that's exactly what it does. It ensures exclusive access to certain objects or critical sections of code.

One of the major challenges in dealing with multiprogramming is identifying critical sections. In this case, you have two critical sections: where you assign to myMap, and where you change myMap[0]. Since you also don't want myMap read before it has been written, the read is a critical section as well.

FreeMemory
A: 

The simplest answer is: you have to lock whenever there is access to shared resources that are not atomic. In your case myMap is a shared resource, so you have to lock all reading and writing operations on it.

oo_olo_oo