I'm looking for real-world examples of needing read and write access to the same value in concurrent systems.

In my opinion, many semaphores or locks are present because there's no known alternative (to the implementer), but do you know of any patterns where mutexes seem to be a requirement?

In a way I'm asking for candidates for the standard set of HARD problems for concurrent software in the real world.

+2  A: 

Most real-world concurrent software has some requirement for synchronization at some level. Often, better-written software will take great pains to reduce the amount of locking required, but it is still required at some point.

For example, I often do simulations where we have some form of aggregation operation occurring. Typically, there are ways to prevent locking during the simulation phase itself (e.g., use of thread-local state data), but the actual aggregation portion typically requires some form of lock at the end.

Luckily, this becomes a lock per thread, not per unit of work. In my case, this is significant, since I'm typically doing operations on hundreds of thousands or millions of units of work, but most of the time it's occurring on systems with 4-16 PEs, which means I'm usually restricting myself to a similar number of units of execution. By using this type of mechanism, you're still locking, but you're locking across tens of elements instead of potentially millions.
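A minimal sketch of this pattern in Java, assuming a simple sum aggregation (the work array, chunking scheme, and globalTotal field are all illustrative): each thread accumulates into a thread-local variable with no locking, then takes the merge lock exactly once.

    import java.util.stream.LongStream;

    public class PartialAggregation {
        private static final Object mergeLock = new Object();
        private static long globalTotal = 0; // guarded by mergeLock

        public static void main(String[] args) throws InterruptedException {
            long[] work = LongStream.rangeClosed(1, 1_000_000).toArray();
            int threads = Runtime.getRuntime().availableProcessors();
            Thread[] pool = new Thread[threads];
            int chunk = work.length / threads;

            for (int t = 0; t < threads; t++) {
                final int lo = t * chunk;
                final int hi = (t == threads - 1) ? work.length : lo + chunk;
                pool[t] = new Thread(() -> {
                    long local = 0;                // thread-local partial result: no locking here
                    for (int i = lo; i < hi; i++) local += work[i];
                    synchronized (mergeLock) {     // one lock acquisition per thread, not per unit of work
                        globalTotal += local;
                    }
                });
                pool[t].start();
            }
            for (Thread t : pool) t.join();
            System.out.println(globalTotal); // 500000500000
        }
    }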

Reed Copsey
That sounds reasonable. Can I ask why you can't have multiple queues going into your aggregators to take out the last of these locks?
Richard Fabian
@Richard: You potentially could - but the overhead of pumping data into a shared, lockless queue and then processing the queue serially afterwards is higher than the reduced locking required to synchronize.
Reed Copsey
@Richard: Locking isn't necessarily bad, in and of itself, if it's used appropriately. There are times when locking is a better option than the alternatives, but only by profiling can you be sure.
Reed Copsey
@Reed: Current lockless, thread-safe queue implementations (for argument's sake, the one written in Java) offer less overhead than acquiring and releasing a lock. The context switching itself is pretty substantial overhead.
John V.
@John: This really depends. The issue isn't the queue implementation, but rather the fact that you're adding an extra wait state after you put everything into the queue before you can start processing the queue. This is sometimes beneficial, but sometimes not - it really depends on a lot of other issues.
Reed Copsey
@Reed: Can you please elaborate more on the wait state? I may be looking at the problem differently than you are and am not following.
John V.
@John: If you're doing an aggregation, you can either lock and do it directly, or queue the elements, wait for every UE to finish, and then process the queue to do the final work. There's an implicit extra step happening after the concurrent process there. It's not really an extra wait - but typically, as UEs finish, taking the lock doesn't block other threads, since they don't finish at the same time. A lot of this depends on the algorithm in question, but locking, in many cases, can be faster than doing the processing after the fact, provided you're locking once per UE, and not once per task.
Reed Copsey
@Reed: I see what your argument is. I was interpreting your response as 'blocking queues offer less overhead than their non-blocking counterparts.' Sorry for the confusion.
John V.
+4  A: 

What kind of locks are used depends on how the data is being accessed by multiple threads. If you can fine-tune the use case, you can sometimes eliminate the need for exclusive locks completely.

An exclusive lock is needed only if your use case requires that the shared data must be 100% exact all the time. This is the default that most developers start with because that's how we think about data normally.

However, if what you are using the data for can tolerate some "looseness", there are several techniques to share data between threads without the use of exclusive locks on every access.

For example, if you have a linked list of data and if your use of that linked list would not be upset by seeing the same node multiple times in a list traversal and would not be upset if it did not see an insert immediately after the insert (or similar artifacts), you can perform list inserts and deletes using atomic pointer exchange without the need for a full-stop mutex lock around the insert or delete operation.
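A minimal sketch of the insert side of this idea in Java, using AtomicReference as the atomic pointer exchange (the class and method names are illustrative; lock-free deletion at arbitrary positions is considerably more involved and is omitted here):

    import java.util.concurrent.atomic.AtomicReference;

    // Lock-free insert at the head of a singly linked list.
    // Traversals may observe the list mid-update - exactly the kind of
    // "looseness" the use case has to tolerate.
    class LockFreeList<T> {
        private static final class Node<T> {
            final T value;
            final Node<T> next;
            Node(T value, Node<T> next) { this.value = value; this.next = next; }
        }

        private final AtomicReference<Node<T>> head = new AtomicReference<>();

        public void insertAtHead(T value) {
            Node<T> oldHead, newHead;
            do {
                oldHead = head.get();
                newHead = new Node<>(value, oldHead);
                // compareAndSet is the atomic pointer exchange: it installs
                // newHead only if no other thread changed head in the meantime.
            } while (!head.compareAndSet(oldHead, newHead));
        }

        public void forEach(java.util.function.Consumer<T> action) {
            for (Node<T> n = head.get(); n != null; n = n.next) action.accept(n.value);
        }
    }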

Another example: if you have an array or list object that is mostly read from by threads and only occasionally updated by a master thread, you could implement lock-free updates by maintaining two copies of the list: one that is "live" that other threads can read from, and another that is "offline" that you can write to in the privacy of your own thread. To perform an update, you copy the contents of the "live" list into the "offline" list, perform the update to the offline list, and then swap the offline list pointer into the live list pointer using an atomic pointer exchange.

You will then need some mechanism to let the readers "drain" from the now-offline list. In a garbage-collected system, you can just release the reference to the offline list - when the last consumer is finished with it, it will be GC'd. In a non-GC system, you could use reference counting to keep track of how many readers are still using the list.

For this example, having only one thread designated as the list updater would be ideal. If multiple updaters are needed, you will need to put a lock around the update operation, but only to serialize updaters - no lock and no performance impact on readers of the list.
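A minimal sketch of the two-copy scheme in Java, assuming a single designated updater thread and a garbage collector to drain old copies (the class and method names are illustrative):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicReference;

    // Read-mostly list: readers never lock; the single updater copies,
    // mutates offline, then publishes with an atomic pointer swap. The GC
    // "drains" the old copy once the last reader drops its reference.
    class ReadMostlyList<T> {
        private final AtomicReference<List<T>> live =
            new AtomicReference<>(Collections.emptyList());

        public List<T> snapshot() {    // readers: a plain atomic read, no lock
            return live.get();
        }

        public void add(T item) {      // single designated updater thread
            List<T> offline = new ArrayList<>(live.get()); // copy the live list
            offline.add(item);                             // update offline
            live.set(offline);                             // atomic publish
        }
    }

Java's standard CopyOnWriteArrayList packages essentially this pattern, serializing multiple updaters with an internal lock while keeping readers lock-free, just as described above.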

All the lock-free resource-sharing techniques I'm aware of require the use of atomic swaps (a.k.a. InterlockedExchange). This usually translates into a specific instruction in the CPU and/or a hardware bus lock (a lock prefix on a read or write opcode in x86 assembler) for a very brief period of time. On multiprocessor systems, atomic swaps may force a cache invalidation on the other processors (this was the case on dual-proc Pentium IIs), but I don't think this is as much of a problem on current multicore chips. Even with these performance caveats, lock-free runs much faster than taking a full-stop kernel event object. Just making a call into a kernel API function takes several hundred clock cycles (to switch to kernel mode).
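To make "atomic swap" concrete, here is a toy test-and-set spinlock in Java built on AtomicBoolean.getAndSet, which compiles down to the kind of locked CPU instruction described above (a sketch for illustration, not production-quality code):

    import java.util.concurrent.atomic.AtomicBoolean;

    // A toy test-and-set spinlock built from an atomic swap.
    // getAndSet maps to a single locked instruction on x86 (e.g., XCHG),
    // so lock() and unlock() never transition into the kernel.
    class SpinLock {
        private final AtomicBoolean locked = new AtomicBoolean(false);

        public void lock() {
            while (locked.getAndSet(true)) {
                Thread.onSpinWait(); // busy-wait hint (Java 9+); no kernel call
            }
        }

        public void unlock() {
            locked.set(false);
        }
    }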

Examples of real-world scenarios:

  1. Producer/consumer workflows. A web service receives HTTP requests for data and places each request into an internal queue; a worker thread pulls the work item from the queue and performs the work. The queue is read/write and has to be thread safe (a sketch follows this list).
  2. Data shared between threads with change of ownership. Thread 1 allocates an object, tosses it to thread 2 for processing, and never wants to see it again. Thread 2 is responsible for disposing the object. The memory management system (malloc/free) must be thread safe.
  3. File system. This is almost always an OS service and already fully thread safe, but it's worth including in the list.
  4. Reference counting. Release the resource when the number of references drops to zero. The increment/decrement/test operations must be thread safe. These can usually be implemented using atomic primitives instead of full-stop kernel mutex locks (see the second sketch below).
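A minimal sketch of scenario 1 in Java, using the standard library's thread-safe ArrayBlockingQueue (the "STOP" shutdown token and request strings are illustrative):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Scenario 1: the request queue between the receiving thread and the
    // worker is the shared, thread-safe structure.
    public class ProducerConsumer {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> requests = new ArrayBlockingQueue<>(100);

            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String req = requests.take();   // blocks until work arrives
                        if (req.equals("STOP")) return; // illustrative shutdown token
                        System.out.println("handled " + req);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();

            for (int i = 0; i < 5; i++) requests.put("request-" + i); // producer side
            requests.put("STOP");
            worker.join();
        }
    }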
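And a minimal sketch of scenario 4 in Java, using AtomicInteger so the decrement-and-test is a single atomic step (the class name and dispose hook are illustrative):

    import java.util.concurrent.atomic.AtomicInteger;

    // Scenario 4: reference counting with atomic primitives instead of a mutex.
    class RefCounted {
        private final AtomicInteger refs = new AtomicInteger(1); // creator holds one reference

        public void addRef() {
            refs.incrementAndGet(); // atomic increment, no lock
        }

        public void release() {
            // decrementAndGet combines the decrement and the test in one
            // atomic step: exactly one thread sees the count reach zero.
            if (refs.decrementAndGet() == 0) {
                dispose();
            }
        }

        protected void dispose() {
            System.out.println("resource freed");
        }
    }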
dthorpe