views:

389

answers:

11

I have a rather large, dynamic sparse matrix object class to write, and I want to make the following happen: one thread to handle placing elements into the matrix, and one to handle reading from the matrix.

The only time when these two would conflict would be when they would both want to access the same row/column at the same time. As such, I've decided that a simple mutex lock for each row/column is sufficient.

Now this is the first time I've actually done threading in C/C++, and I'd like to do it by the books, so to speak. I have two concerns.

  1. How do I spawn these threads? This is a language question more than anything.
  2. How do I implement the locking itself as efficiently as possible? I figure if there is a conflict, then the requesting thread would place itself in line and wait until the resource is freed. However, how do I implement that waking? I can have a loop poll the memory location, but that's not elegant. Ideally, I figure an interrupt-based approach would be best.
+6  A: 

C++ itself does not offer any threading. On Windows, you can use CreateThread. On UNIX, you can use POSIX threads (pthreads).

There should be no need to implement your own concurrency primitives. On Windows, for example, you can create a mutex object using CreateMutex, and use WaitForSingleObject to wait until it is released.

Thomas
This is true, but suggesting he go directly to the system-specific functions is not likely to yield good results.
Steven Sudit
The Boost answer has way more upvotes anyway. I'll let that fact speak for itself :)
Thomas
@Steven Sudit: *pthreads* are quite portable, what with the *P* standing for the *P* in *POSIX*, which in turn means Portable Operating System Interface. Boost.Thread seemed to suffer from the LCD (Lowest Common Denominator) syndrome when I looked at it (but that obviously depends on one's needs, if it gives you enough you're golden).
just somebody
There's a pretty good pthread implementation for windows (with mingw it's true) in the form of pthreads-win32
laura
@just somebody: While there *are* Windows ports of pthreads, I still consider it system-specific, in that it's associated with Unix; I wouldn't expect non-Unix programmers to even know them. Of course, if portability is defined in terms of supporting many Unix variants, then pthreads are probably the way to go. Boost, on the other hand, is genuinely portable, so if it meets your needs, I'd recommend it.
Steven Sudit
@Steven Sudit: That's probably our backgrounds betraying both of us; mine surely is. I have worked mostly with {Free,Open}BSD and various operating systems based on the GNU toolchain and the Linux kernel (mostly Red Hat) in various positions (package maintenance, OS-specific utilities, network traffic interception, ...). I have many reasons to consider any two so-called "Linux distros" distinct operating systems. *Unix* is not a single OS; the meaning has long ago widened/diluted to cover the majority (wild guess) of general-purpose OSs used today. POSIX (subsets) have even wider reach.
just somebody
@just somebody: Well, hopefully we can cancel out each other's biases so that Alex can get a useful answer out of this.
Steven Sudit
For example: with Apple's move to "unix" with OS X, is *pthreads* portability wider because it's now applicable to the OS used in Mac computers, or is it no more portable than before because there's no MacOS anymore, just another "unix"?
just somebody
@just somebody: While, as you guessed, my experience is heavy on the Windows side, when I've written cross-platform code that was supposed to run on "Unix", I quickly found that there are only Unix *variants*, each with their own peculiarities. Apple's OS X is a stranger variant than some, but I'd say that the move made pthreads no more or less portable than before.
Steven Sudit
+1  A: 

I can answer part one fairly simply - it depends on your platform. If you're using the Win32 API, take a look at the CreateThread function (http://msdn.microsoft.com/en-us/library/ms682453%28VS.85%29.aspx) and read up on examples. The book I read on multi-threading on Windows was *Windows via C/C++* (http://www.amazon.co.uk/Windows-PRO-Developer-Jeffrey-Wintellect-Christophe/dp/0735624240), which covers not just threading with CreateThread and the alternative _beginthreadex, but also locks, semaphores, etc.

If you're using Linux, you'll want POSIX threads via the pthread library; see http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html for an example.

A code example for pthreads looks like this - be careful, I've left in the functionality for creating several threads from the same function, i.e. calloc-ing an array of pthread_t variables. You might not need this.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void *thread_function(void *arg)
{
    /* DO SOME STUFF */

    /* exit the thread; the argument is the thread's return value */
    pthread_exit(NULL);
}

int main(int argc, char **argv)
{
    int retval = 0;
    /* array of thread handles */
    pthread_t *thread_handle = (pthread_t *) calloc(1, sizeof(pthread_t));

    /* create function - fork() for threads */
    retval = pthread_create(&thread_handle[0], NULL, thread_function, NULL);
    if (retval != 0)
        return EXIT_FAILURE;

    /* DO SOME STUFF */

    /* join - wait for the thread to finish */
    pthread_join(thread_handle[0], NULL);

    free(thread_handle);
    return EXIT_SUCCESS;
}

Compile with `gcc filename.c -o fileexe -lpthread`

Ninefingers
+1  A: 

Are you sure you want a matrix? What you describe sounds like a queue. The locking for a queue is fairly simple to do: when anybody reads or writes, they take an exclusive lock (via a pthread_mutex). (Don't get into read-write locks etc. unless you really know you are in perf trouble.)

Certainly no interrupts are needed.
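A minimal sketch of such a queue (class and member names here are illustrative, not from the answer), using std::mutex as the standard equivalent of a pthread_mutex:

```cpp
#include <mutex>
#include <queue>

// Every access takes one exclusive lock, as suggested above.
template <typename T>
class LockedQueue {
public:
    void push(const T& value) {
        std::lock_guard<std::mutex> guard(mutex_);
        queue_.push(value);
    }

    // Returns false instead of blocking when the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (queue_.empty())
            return false;
        out = queue_.front();
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<T> queue_;
};
```

The std::lock_guard releases the mutex automatically when it goes out of scope, so the lock is held only for the duration of each operation.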

pm100
It's a sparse matrix (http://en.wikipedia.org/wiki/Sparse_matrix). It works like a 2-D linked list for performance reasons.
Mike DeSimone
+18  A: 

If this is your first time doing multi-threading, use the Boost.Threads library. Its semantics (including its synchronization mechanisms) are very straightforward, and your implementation will be portable.

http://www.boost.org/doc/libs/1_42_0/doc/html/thread.html
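A minimal sketch of the style Boost.Thread uses (shown here with the standard-library names that were later modeled on it, so it compiles without Boost; the Boost equivalents are boost::thread, boost::mutex, and boost::mutex::scoped_lock):

```cpp
#include <mutex>
#include <thread>

std::mutex counter_mutex;
int shared_counter = 0;

// Each worker increments the shared counter under the mutex.
void worker() {
    for (int i = 0; i < 1000; ++i) {
        std::lock_guard<std::mutex> guard(counter_mutex);  // scoped lock
        ++shared_counter;
    }
}

// Spawn two threads, wait for both, and return the total.
int count_with_two_threads() {
    shared_counter = 0;
    std::thread a(worker);
    std::thread b(worker);
    a.join();
    b.join();
    return shared_counter;
}
```

Without the lock, the two threads would race on the increment and the total would be unpredictable; with it, the result is always exactly 2000.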

Ben Collins
+1  A: 

Start with boost threads.

And as for your design, it sounds like you've decided to allow random access to the whole matrix from any thread.

If at all possible, it would be better to figure out how to partition responsibility for parts of the matrix to specific threads. Taking out a lock for every cell access is going to be quite a bottleneck.

Daniel Earwicker
Well, it's a linked list implementation, so that's not possible.
Alex
It's the nature of the problem that determines what solution you pick... unless this is homework, or something like that?
Daniel Earwicker
+3  A: 

First of all, you don't need a mutex per column *and* one per row. If you acquire a mutex for a row, you've locked all cells in that row, so it doesn't matter which column you access. Likewise, if you acquire a lock per column, you've locked all cells in that column, no matter which row. So you can have a mutex per table, one per cell, one per row, or one per column - but one per row *and* one per column makes no sense.

Most synchronization primitives will block your threads and the thread will simply resume when the resource becomes free, you do not need to worry about signaling and waking up. That part is exactly what a synchronization object like a mutex or a critical section does for you.

The specifics of how to construct and use the synchronization primitives are platform specific. As others have posted, there are cross platform libraries you can use, but you must specify what platform you target at least so we know what libraries are available.
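As a hypothetical sketch of "the synchronization object does the blocking and waking for you" (class and member names are mine, not from the answer), here is a per-row locking scheme using std::mutex, whose lock() puts a waiting thread to sleep until the holder unlocks - no polling or signaling code needed on your side:

```cpp
#include <mutex>
#include <thread>
#include <vector>

// A dense stand-in for the matrix, with one mutex per row.
class RowLockedMatrix {
public:
    RowLockedMatrix(int rows, int cols)
        : data_(rows, std::vector<int>(cols, 0)), row_locks_(rows) {}

    void set(int r, int c, int v) {
        // Blocks (sleeps) if another thread holds row r's lock.
        std::lock_guard<std::mutex> guard(row_locks_[r]);
        data_[r][c] = v;
    }

    int get(int r, int c) {
        std::lock_guard<std::mutex> guard(row_locks_[r]);
        return data_[r][c];
    }

private:
    std::vector<std::vector<int>> data_;
    std::vector<std::mutex> row_locks_;
};
```

Two threads touching different rows never contend; two threads touching the same row are serialized by the OS scheduler, with no busy-waiting.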

Remus Rusanu
When you say "you don't need a mutex per column and one per row", are you sure that applies to a sparse matrix? I see your point for normal matrices.
Mike DeSimone
This is a perpendicularly linked list implementation of a sparse matrix. What would happen is the adder would acquire a lock on the row and column where the desired element is located. Then, during the course of the addition, the worst that could happen would be for the reader to cross the row or column that is locked, but not both. It is locked out from reading the entire column. Should the adder modify anything, then the modification is going to lie perpendicular to the search direction of the reader.
Alex
I get the impression that a deadlock might be easy to encounter here. Imagine if one thread grabs the row and tries to acquire the column while another thread grabs the column and tries to acquire the row. Perhaps a lock acquisition order should be maintained.
Steven Sudit
@Steven: locks per row *and* per column could lead to deadlock, if order is not maintained. My point is, though, that there is no need for locks per row *and* per column. If one needs to write to cell A2, one will lock row 'A' and column '2'. If one needs to lock cell A3, one will lock row A... and stop, since the *entire* row is locked already. So what is the point of the per-column lock?
Remus Rusanu
@Mike: I mean 'locks' in logical terms, i.e. I lock the row *named* 'A', and what that actually means depends on the implementation (a mutex associated with the row, or with the list head, etc.), so I don't think sparse/dense makes a difference. But if the access protocol is 'lock the row, then lock the column, then access the cell', then the protocol doesn't need both locks, regardless of the implementation.
Remus Rusanu
@Remus: @Alex is (attempting) to create a virtual array[N*M] of mutexes by mapping it onto an array[N+M] (of something) and allowing aliasing errors (of some kind). On the face of it, shouldn't be impossible, but also row+column addressing works with AND operations, not OR.
Potatoswatter
`mutex_lock`ing on both row and column will block if one OR the other is locked. But another operation would allow you to query the row and synthesize an OR operation. For example, atomic-increment the row. If the row counter was 0, atomic increment the column. If the row counter was >0, block until the column counter is 0, then set the column to 1. This should be doable with semaphores.
Potatoswatter
@Remus: please see my answer below…
Potatoswatter
@Remus: OK, I see your point now. The only reason you would need separate locks for rows or columns is if you had cases where you want to lock a row *or* a column, but not both, such as with matrix multiplication.
Mike DeSimone
@Remus Rusanu: Good point, but I had assumed that there are use cases where you really do need to lock an entire row or an entire column. I think Potatoswatter's answer here is interesting.
Steven Sudit
+1  A: 

Thomas covered the threading library. The only reason you'd want to mess with interrupt handlers is if you don't have an OS. Otherwise, use what the OS gives you for thread control.

The thing I'd warn you about with the locking is to be careful not to deadlock. You want to lock by rows and columns, so each row and each column needs its own mutex. (Actually, a reader/writer lock would be far better for performance, but you have to be even more careful about deadlocks and race conditions.)

Make sure you always acquire and release the locks in a consistent order; you don't want a thread to lock row N and then block locking column K, and then the thread which has locked column K decides to lock row N and blocks, giving you two blocked threads and nothing happening (cue John Woo pistol standoff).
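A minimal sketch of a consistent acquisition order (names are illustrative): every thread takes the row lock before the column lock, so the circular wait described above cannot form. (With C++11, std::lock can also acquire several mutexes deadlock-free in one call.)

```cpp
#include <mutex>

std::mutex row_lock;
std::mutex col_lock;
int cell = 0;

// Both functions acquire row first, then column -- always the same order.
void write_cell(int value) {
    std::lock_guard<std::mutex> r(row_lock);
    std::lock_guard<std::mutex> c(col_lock);
    cell = value;
}

int read_cell() {
    std::lock_guard<std::mutex> r(row_lock);  // same order as the writer
    std::lock_guard<std::mutex> c(col_lock);
    return cell;
}
```

If read_cell took the column lock first, a reader and a writer could each grab one lock and wait forever for the other - exactly the standoff described above.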

Mike DeSimone
I had the same thought myself, although you beat me to it.
Steven Sudit
A: 

If you have a mutex for each row/column, you'll have a race condition when a row-writing thread and a column-reading thread accesses the element at the row/column intersection. If you have a large matrix, you'll need many mutexes. I'm not sure every operating system can provide thousands of mutexes.

It sounds like a thread-safe producer-consumer queue would be a better solution to your problem. Check out Intel's Thread Building Blocks (TBB) library. I find that multi-threading becomes much easier when you model your program using the data flow paradigm.

If you want to reduce the memory footprint of large, sparse matrices, check out Boost.uBlas. It has a sparse_matrix template class that uses a map to store elements associatively by indices.

For efficiency reasons, you will not want to pass entire matrices by copy through your producer-consumer queue. Instead, you'll want to pass around proxies of your sparse matrices. If TBB doesn't already have some kind of proxy facility, you can simply use boost::shared_ptr. You may also want a pre-allocated pool of matrices that are recycled in a data-flow "circuit".

Emile Cormier
+1  A: 

Quick idea: choose one of the available implementations of std::thread, and then consider std::async, std::future, and related tools.
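A minimal sketch of that combination (function names are mine, chosen for illustration): std::async runs the work on another thread, and future::get blocks until the result is ready - the waiting and waking are handled for you.

```cpp
#include <future>

int square(int x) { return x * x; }

int demo() {
    // Launch square(7) asynchronously; std::launch::async forces a new thread.
    std::future<int> f = std::async(std::launch::async, square, 7);
    // get() blocks until the worker thread has produced the result.
    return f.get();
}
```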

mloskot
+1  A: 

As for your second question, on how to do the locking: putting the thread to sleep and waking it up will be done by the OS, that's not your worry. But there is a problem with your scheme.

You want to block access to a cell only if its row AND its column are locked. Which is to say, allow access if the row OR the column are UNlocked. That is not typically the way locks work. Furthermore, if the row was locked but you allowed access anyway (because the column was unlocked), you still want to lock it "more." This means you'll need more than a mutex.

The best implementation I can think of uses one atomic counter per row and a condition-variable-guarded counter per column. Upon access:

  • Increment the row's atomic counter.
  • If the previous value of the counter was zero, the row was unlocked; access is OK.
    • Increment the column's condition-variable counter. Ideally this should not block.
  • If the row's counter was nonzero, it was locked; you may have to block on the column:
    • Wait until the column's counter is zero. Set it to one before releasing the mutex.
  • When you're done with the access:
    • Decrement the column's counter (don't forget to lock it), and signal it to wake other accesses if it became zero.
    • Atomically decrement the row counter.

This involves a little fancy footwork, but the total number of locking resources is overall pretty low. Can anyone else do better (or find a counterexample to my scheme)?

Also note, the partitioning of the matrix into rows and columns is somewhat arbitrary. If too much contention occurs under this scheme, you should probably subdivide the rows into halves (for example).

Potatoswatter
There's a problem with this: if three threads attempt to access the same column and two block, the second will always pass the write lock directly to the third despite the critical section. Thread 2 can't release the mutex and thread 3 can't release the writelock. Mutex acquisition can't be moved before the writelock, either. We want rwlocks where writes aren't prioritized.
Potatoswatter
… so `trylock` the mutex. If it fails, release the write lock, (blocking) acquire and release the mutex (to ensure the other thread gets the read lock), loop back to acquiring the write lock. There goes (some) performance :v( . Smells like a proper implementation would need a signal for each column.
Potatoswatter
… fixed. Interesting problem…
Potatoswatter
A: 

Rather than having separate threads for reading and writing (which will always require locking), you can restrict each thread to particular elements of the matrix: say, one thread for the first half of the rows and the other thread for the last half (or one thread for even rows and one for odd rows). This way you can ensure no thread is ever blocked.
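A hypothetical sketch of the even/odd-row split (the function names and the dense std::vector representation are mine, chosen just for illustration): each thread writes only rows of its own parity, so the threads never touch the same data and no locks are needed.

```cpp
#include <thread>
#include <vector>

// Fill only the rows whose index has the given parity (0 = even, 1 = odd).
void fill_rows(std::vector<std::vector<int>>& m, int parity) {
    for (std::size_t r = static_cast<std::size_t>(parity); r < m.size(); r += 2)
        for (std::size_t c = 0; c < m[r].size(); ++c)
            m[r][c] = static_cast<int>(r * 10 + c);
}

// One thread owns the even rows, the other the odd rows: disjoint data,
// so there is no shared state to protect.
void fill_in_parallel(std::vector<std::vector<int>>& m) {
    std::thread even(fill_rows, std::ref(m), 0);
    std::thread odd(fill_rows, std::ref(m), 1);
    even.join();
    odd.join();
}
```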

Abhi