views:

236

answers:

4

In theory, it should be possible to at least brute force a verification of a lock-free algorithm (there are only so many combinations of function calls intersecting). Are there any tools or formal reasoning processes available to actually prove that a lock-free algorithm is correct (ideally it should also be able to check for race conditions and the ABA problem as well)?

Note: If you know a way to just prove one point (e.g. only prove that it is safe from the ABA problem) or a problem I haven't mentioned then post the solution anyway. In the worst case scenario, each method can be done in turn to fully verify it.

+3  A: 

I don't know what platform or language you're using - but on the .Net platform there is a Microsoft Research project called Chess which is looking very promising at helping those of us doing multithreaded components - including lock-free.

I've not used it a huge amount, mind.

It works (crude explanation) by explicitly interleaving threads in the tightest possible ways to actually force your bugs out into the wild. It also analyses code to find common mistakes and bad patterns - similar to code analysis.

In the past, I've also built special versions of the code in question (through #if blocks etc) that add extra state-tracking information; counts, versions etc that I can then dip into within a unit test.

The problem with that is that it takes a lot more time to write your code, and you can't always add this kind of stuff without changing the underlying structure of the code that's already there.

Andras Zoltan
+1, very interesting, especially how non-intrusive it is. But this comment: http://channel9.msdn.com/shows/Going+Deep/CHESS-An-Automated-Concurrency-Testing-Tool/?CommentID=448235 suggests that it doesn't (yet) consider pre-emptions between arbitrary instructions. So the fewer locks/mutexes etc. you have in place, the worse Chess will do in finding bugs -- and if your multithreaded coded contains no locks at all Chess won't find anything.
j_random_hacker
yes it is an ambitious project with some way to go - I love the current latest version number: v0.1.30626.0.They clearly have quite a way to go before a v1.0!
Andras Zoltan
I found this page: http://www.projectbentley.com/work/chess/lockfreequeue.php#test1 by somebody who's been playing CHESS (sorry, couldn't resist!) to try and detect lock-free errors. He discovered that marking things volatile caused it to pick up lock-free errors with the test written correctly. Doesn't mean it would work for all lock-free stuff, though.But there we see it - in this case you'd probably have to change the declaration of some of your variables to 'volatile' with an #if...#endif for a 'TESTING' build.
Andras Zoltan
Good that they implemented the `volatile` "marker" for .NET programs. Although I don't use .NET, I think it would be nice if there was a way to convince the compiler that all your variables were volatile without writing it everywhere.
j_random_hacker
I agree - volatility in .Net is frowned upon by the framework designers. Our interlocked operations, which require a 'ref' to an object, throw compiler warnings because a 'ref' to a volatile value is considered dodgy.I wish there was a lower level API to interlocked read/write operations and memory barriers in .Net (don't even get me started on SIMD!) - I'm not convinced that our Interlocked operations get turned into the associated CPU instructions, I think they utilise Windows' API for it.
Andras Zoltan
+1  A: 

Data race detection is an NP hard problem [Netzer&Miller 1990]

I heard about the tools Lockset, and DJit+ (they teach it in the CDP course). Try reading the slides, and googling what they reference to. It contains some interesting information.

Anna
Looks interesting, but from skimming the Powerpoint it seems that neither tool can make any guarantees, since they only examine whatever instruction sequences the scheduler gives them, not all possible sequences as Spin and Chess do. Also, got any links for the tools?
j_random_hacker
In Uni they only teach you the theory, haha :). I did find the original lockset paper, though: http://www.cs.washington.edu/homes/tom/pubs/eraser.pdf . I don't remember playing around with any of these tools in the course, though... hmm... They might be only uni-level implementations for paper purposes. In this case, I think that maybe I should remove my comment altogether.
Anna
I realized this would be an NP problem, thats why I made the comment about brute force. Its still possible to do thorough testing, although the more complex the problem (more operations/input combinations), the longer it will take to test all possible sequences.
Grant Peters
@Anna: Don't remove this useful answer! :)
j_random_hacker
+7  A: 

You should definitely try the Spin model checker.

You write a program-like model in a simple C-like language called Promela, which Spin internally translates into a state machine. A model can contain multiple parallel processes.

What Spin then does is check every possible interleaving of instructions from each process for whatever conditions you want to test -- typically, absence of race conditions, freedom from deadlocks etc. Most of these tests can be easily written using assert() statements. If there is any possible execution sequence that violates an assertion, the sequence is printed out, otherwise you are given the "all-clear".

(Well, in actual fact it uses a much fancier and faster algorithm to accomplish this, but that is the effect. By default, all reachable program states are checked.)

This is an incredible program, it won the 2001 ACM System Software Award (other winners include Unix, Postscript, Apache, TeX). I got started using it very quickly, and in a couple of days was able to implement models of the MPI functions MPI_Isend() and MPI_Irecv() in Promela. Spin found a couple of tricky race conditions in one segment of parallel code I converted across to Promela for testing.

j_random_hacker
Spin works well. Note, however, that although the learning curve is much less steep than using, say, PVS or Isabelle (theorem provers), it's still pretty tricky. To really use SPIN you often need to reduce the state space; and that means adding assumptions that limit verification, and you also need to know what to look for in the first place. Still, spin's a very solid and relatively easy tool, as long as you don't expect any magic.
Eamon Nerbonne
+1 very cool. That's going in my downloads folder :)
Andras Zoltan
Oh, and don't forget memory barriers; spin (AFAIK) assumes all writes are atomic and instantly visible, you'll need to reason about memory barriers separately (which is generally necessary for low-or-no-lock techniques)
Eamon Nerbonne
@Eamon: Absolutely, there is a learning curve and I found I needed to seriously abstract/downsize some things to come up with feasible models. Re memory barriers: you might be able to use channels to model this -- instead of a global var, have a process to which you "send" read and write requests via a channel and "receive" all reads via another channel.
j_random_hacker
This looks exactly like what I've been looking for. Been doing lock-free programing for almost a year now, just haven't had a decent way to prove my algorithms work (though we haven't had any problems with the ones I've been confident to place into production code).
Grant Peters
@Grant: Another thing to bear in mind is that Spin can only prove correctness for finite problem sizes (e.g. 1 producer, 3 consumers). But OTOH, it's sometimes possible to prove mathematically that e.g. "if it works for 3 it will work for any number". The original Spin paper mentions a case where reasoning like this, plus Spin, enabled proof of correctness of a network protocol.
j_random_hacker
@ j_random_hacker: Yeah, I assumed as much (NP problem and all), luckily I currently work on game consoles only, so I know the maximum number of physical cores (i.e. maximum number of consumers/produces == number of cores, we never have more than 1 thread per core except on single core systems in which case there are 2).
Grant Peters
+1  A: 

Spin is indeed excellent, but also consider Relacy Race Detector by Dmitriy V'jukov. It is purpose-built for verifying concurrent algorithms including non-blocking (wait-/lock-free) algorithms. It's open source and liberally licensed.

Relacy provides POSIX and Windows synchronization primitives (mutexes, condition variables, semaphores, CriticalSections, win32 events, Interlocked*, etc), so your actual C++ implementation can be fed to Relacy for verification. No need to develop a separate model of your algorithm as with Promela and Spin.

Relacy provides C++0x std::atomic (explicit memory ordering for the win!) so you can use pre-processor #defines to select between Relacy's implementation and your own platform specific atomic implementation (tbb::atomic, boost::atomic, etc).

Scheduling is controllable: random, context-bound, and full search (all possible interleavings) available.

Here's an example Relacy program. A few things to note:

  • The $ is a Relacy macro that records execution information.
  • rl::var<T> flags "normal" (non-atomic) variables that also need to be considered as part of the verification.

The code:

#include <relacy/relacy_std.hpp>

// template parameter '2' is number of threads
struct race_test : rl::test_suite<race_test, 2>
{
    std::atomic<int> a;
    rl::var<int> x;

    // executed in single thread before main thread function
    void before()
    {
        a($) = 0;
        x($) = 0;
    }

    // main thread function
    void thread(unsigned thread_index)
    {
        if (0 == thread_index)
        {
            x($) = 1;
            a($).store(1, rl::memory_order_relaxed);
        }
        else
        {
            if (1 == a($).load(rl::memory_order_relaxed))
                x($) = 2;
        }
    }

    // executed in single thread after main thread function
    void after()
    {
    }

    // executed in single thread after every 'visible' action in main threads
    // disallowed to modify any state
    void invariant()
    {
    }
};

int main()
{
    rl::simulate<race_test>();
}

Compile with your normal compiler (Relacy is header-only) and run the executable:

struct race_test
DATA RACE
iteration: 8

execution history:
[0] 0:  atomic store, value=0, (prev value=0), order=seq_cst, in race_test::before, test.cpp(14)
[1] 0:  store, value=0, in race_test::before, test.cpp(15)
[2] 0:  store, value=1, in race_test::thread, test.cpp(23)
[3] 0:  atomic store, value=1, (prev value=0), order=relaxed, in race_test::thread, test.cpp(24)
[4] 1:  atomic load, value=1, order=relaxed, in race_test::thread, test.cpp(28)
[5] 1:  store, value=0, in race_test::thread, test.cpp(29)
[6] 1: data race detected, in race_test::thread, test.cpp(29)

thread 0:
[0] 0:  atomic store, value=0, (prev value=0), order=seq_cst, in race_test::before, test.cpp(14)
[1] 0:  store, value=0, in race_test::before, test.cpp(15)
[2] 0:  store, value=1, in race_test::thread, test.cpp(23)
[3] 0:  atomic store, value=1, (prev value=0), order=relaxed, in race_test::thread, test.cpp(24)

thread 1:
[4] 1:  atomic load, value=1, order=relaxed, in race_test::thread, test.cpp(28)
[5] 1:  store, value=0, in race_test::thread, test.cpp(29)
[6] 1: data race detected, in race_test::thread, test.cpp(29)

Recent versions of Relacy also provide Java and CLI memory models if you're into that sort of thing.

sstock
that looks great, I'll have to give it a try as well
Grant Peters