I'd like to minimize synchronization and write lock-free code when possible in a project of mine. When absolutely necessary I'd love to substitute light-weight spinlocks built from atomic operations for pthread and win32 mutex locks. My understanding is that these are system calls underneath and could cause a context switch (which may be unnecessary for very quick critical sections where simply spinning a few times would be preferable).

The atomic operations I'm referring to are well documented here: http://gcc.gnu.org/onlinedocs/gcc-4.4.1/gcc/Atomic-Builtins.html

Here is an example to illustrate what I'm talking about. Imagine a RB-tree with multiple readers and writers possible. RBTree::exists() is read-only and thread safe, RBTree::insert() would require exclusive access by a single writer (and no readers) to be safe. Some code:

class IntSetTest
{
private:
    unsigned short lock;
    RBTree<int>* myset;

public:
    // ...

    void add_number(int n)
    {
        // Acquire: spin until lock == 0, then set it to 0xffff (atomic)
        while (__sync_bool_compare_and_swap(&lock, 0, 0xffff) == false);

        // Perform a thread-unsafe operation on the set
        myset->insert(n);

        // Unlock (atomic)
        __sync_bool_compare_and_swap(&lock, 0xffff, 0);
    }

    bool check_number(int n)
    {
        // Increment once the lock is below 0xffff
        unsigned short savedlock = lock;
        while (savedlock == 0xffff || __sync_bool_compare_and_swap(&lock, savedlock, savedlock+1) == false)
            savedlock = lock;

        // Perform read-only operation    
        bool exists = myset->exists(n);

        // Decrement
        savedlock = lock;
        while (__sync_bool_compare_and_swap(&lock, savedlock, savedlock-1) == false)
            savedlock = lock;

        return exists;
    }
};

(let's assume it need not be exception-safe)

Is this code indeed thread-safe? Are there any pros/cons to this idea? Any advice? Is the use of spinlocks like this a bad idea if the threads are not truly concurrent?

Thanks in advance. ;)

+1  A: 

It may be worth noting that if you're using Win32 mutexes, a thread pool is provided for you from Vista onwards. Depending on what you use the RB-tree for, you could replace it with that.

Also, remember that atomic operations are not particularly fast. Microsoft has quoted a couple hundred cycles each.

Rather than trying to "protect" the function in this way, it would likely be much more efficient to simply synchronize the threads, either changing to a SIMD/thread pool approach, or to just use a mutex.
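For the reader/writer pattern in the question, the mutex route might look like the sketch below. This is an illustration only: `pthread_rwlock_t` is the closest POSIX analogue to the hand-rolled read/write spinlock, and `std::set<int>` stands in for the RB-tree.

```cpp
#include <pthread.h>
#include <set>

// Sketch only: pthread_rwlock_t replaces the hand-rolled spinlock,
// and std::set<int> stands in for the question's RB-tree.
class IntSetRW
{
private:
    pthread_rwlock_t rw;
    std::set<int> myset;

public:
    IntSetRW()  { pthread_rwlock_init(&rw, NULL); }
    ~IntSetRW() { pthread_rwlock_destroy(&rw); }

    void add_number(int n)
    {
        pthread_rwlock_wrlock(&rw);   // exclusive: one writer, no readers
        myset.insert(n);
        pthread_rwlock_unlock(&rw);
    }

    bool check_number(int n)
    {
        pthread_rwlock_rdlock(&rw);   // shared: any number of readers
        bool exists = myset.count(n) != 0;
        pthread_rwlock_unlock(&rw);
        return exists;
    }
};
```

Unlike the spinlock, a thread blocked here sleeps instead of burning CPU, at the cost of a possible context switch.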

But, of course, without seeing your code, I can't really make any more comments. The trouble with multithreading is that you have to see someone's whole model to understand it.

DeadMG
Well, another important point is the whole "lightweight" aspect of this. This is just an example, but in my actual code there could in some cases be millions of these objects, and I don't think it'd be practical to create millions of pthread or Win32 mutexes. An unsigned 16-bit int, on the other hand, would add no overhead at all (because of alignment).
Thomas
Actually the thread pool (http://msdn.microsoft.com/en-us/library/ms684957(VS.85).aspx) has been available since Windows 2000.
Billy ONeal
It's not practical to use millions of interlocked operations either. I'm still thinking that you need to redesign your threading model. You seem to want to design a class that's high-performance and totally threading-ignorant. @Billy ONeal - you're right. I hadn't noticed that function before.
DeadMG
Well, I do need to design classes which are both high-performance and thread-safe in general. It's because I have no idea what they will be used for; it's all up to the user. No doubt if I had one specific use, there would be far more efficient ways of designing this.
Thomas
Ah - that makes sense and I hadn't considered that. You can't write a threading-model-appropriate tree, because you don't know the threading model. In that case, I don't really have much to help you, except that you could consider making it the user's problem.
DeadMG
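One way to "make it the user's problem", sketched below with hypothetical names: parameterize the container on a locking policy, so the caller chooses between no locking, an atomic-builtin spinlock, or a full mutex.

```cpp
#include <set>

// Hypothetical sketch: the user supplies the locking policy.
struct NoLock              // single-threaded use: locking compiles away
{
    void lock()   {}
    void unlock() {}
};

struct SpinLock            // built from the GCC atomic builtins
{
    SpinLock() : l(0) {}
    void lock()   { while (!__sync_bool_compare_and_swap(&l, 0, 1)); }
    void unlock() { __sync_lock_release(&l); }
private:
    volatile int l;
};

template <typename T, typename Lock = NoLock>
class LockedSet
{
    Lock guard;
    std::set<T> data;      // stand-in for the RB-tree
public:
    void insert(const T& v)
    {
        guard.lock();
        data.insert(v);
        guard.unlock();
    }
    bool exists(const T& v)
    {
        guard.lock();
        bool found = data.count(v) != 0;
        guard.unlock();
        return found;
    }
};
```

The container stays threading-ignorant; the user who knows the threading model pays only for the synchronization they ask for.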
+1  A: 

You need a volatile qualifier on lock, and I would also make it a sig_atomic_t. Without the volatile qualifier, this code:

    unsigned short savedlock = lock;
    while (savedlock == 0xffff || __sync_bool_compare_and_swap(&lock, savedlock, savedlock+1) == false)
        savedlock = lock;

may not re-read lock when updating savedlock in the body of the while-loop. Consider the case that lock is 0xffff. Then, savedlock will be 0xffff prior to checking the loop condition, so the while condition will short-circuit prior to calling __sync_bool_compare_and_swap. Since __sync_bool_compare_and_swap wasn't called, the compiler doesn't encounter a memory barrier, so it might reasonably assume that the value of lock hasn't changed underneath you, and avoid re-loading it in savedlock.
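Concretely, the fix this answer suggests could look like the following sketch (reader entry only; `0xffff` still marks "write-locked"):

```cpp
// Sketch of the suggested fix: the volatile qualifier forces a fresh
// load of `lock` on every iteration, so the short-circuited path
// cannot reuse a stale cached value.
volatile unsigned short lock = 0;

void reader_enter()
{
    unsigned short savedlock = lock;            // volatile load
    while (savedlock == 0xffff ||
           !__sync_bool_compare_and_swap(&lock, savedlock, savedlock + 1))
        savedlock = lock;                       // re-read on every retry
}
```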

Re: sig_atomic_t, there's a decent discussion here. The same considerations that apply to signal handlers would also apply to threads.

With these changes, I'd guess that your code would be thread-safe. I would still recommend using mutexes, though, since you really don't know how long your RB-tree insert will take in the general case (per my previous comments under the question).

Aidan Cully
This is interesting. I have read many articles explaining why volatile is a multi-threaded program's best friend, and many explaining why volatile has nothing to do with threading and making everything volatile will simply slow the program down. In my application, more than half of the data could be accessed by any thread at any time. Should it really all be volatile? Or is this case the exception, because it's a tight loop that the compiler might optimize to only check the lock once?
Thomas
i.e. Picture a function (which isn't inlined) that is called, checks a variable, then returns, and is quickly called again. In that case, would volatile not be necessary, because the compiler couldn't optimize the load away across separate calls? But in the loop above it might decide lock could never change and optimize the re-read out? So volatile has nothing to do with CPU caching; it simply tells the compiler not to optimize accesses to that memory? I think this just made sense to me. Please confirm or clarify! :)
Thomas
I spent some time looking up how volatile works... In a nutshell, what it does is to prevent optimizing memory accesses, and also prevent re-ordering of memory operations involving volatile variables. (Memory operations involving non-volatile-qualified variables may be re-ordered around those involving volatile. Further, even if the writes occur in order, a different CPU may notice the new values in a different order.) This should be enough for multi-threaded synchronization _in this case_, because you also have the `__sync` routines providing a memory barrier.
Aidan Cully
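A minimal illustration of the optimization hazard described above (the flag name is hypothetical): without `volatile`, the compiler may load `done` once and spin on the cached value forever; with it, every iteration performs a fresh load.

```cpp
// `volatile` tells the compiler that each read of `done` must hit
// memory, so the loop cannot be collapsed into an infinite spin on
// a value loaded once before the loop.
volatile bool done = false;

void wait_for_done()
{
    while (!done)
        ;   // fresh load of `done` on every test
}
```

Note that volatile alone gives no ordering guarantees between other memory operations across CPUs; that is what the `__sync` barriers add.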