I have a piece of mature geospatial software that has recently had areas rewritten to take better advantage of the multiple processors available in modern PCs. Specifically, display, GUI, spatial searching, and main processing have all been hived off to separate threads. The software has a pretty sizeable GUI automation suite for functional regression, and another smaller one for performance regression. While all automated tests are passing, I'm not convinced that they provide nearly enough coverage in terms of finding bugs relating to race conditions, deadlocks, and other nasties associated with multi-threading. What techniques would you use to see if such bugs exist? What techniques would you advocate for rooting them out, assuming there are some in there to root out?

What I'm doing so far is running the GUI functional automation on the app under a debugger, such that I can break out of deadlocks and catch crashes, and I plan to make a BoundsChecker build and repeat the tests against that version. I've also carried out a static analysis of the source via PC-Lint in the hope of locating potential deadlocks, but have not had any worthwhile results.

The application is C++, MFC, multiple document/view, with a number of threads per doc. The locking mechanism I'm using is based on an object that holds a pointer to a CMutex, which is locked in the ctor and freed in the dtor. I use local variables of this object to lock various bits of code as required, and my mutex has a time-out that fires a warning if it is reached. I avoid locking where possible, using resource copies instead.
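For reference, the scoped-lock object works roughly along the lines of the sketch below (this uses raw Win32 handles rather than CMutex, and the class and names are illustrative, not the actual implementation):

#include <windows.h>

// Illustrative sketch only: a scoped lock with a time-out, similar in
// spirit to the CMutex-based object described above.
class ScopedTimedLock {
public:
    ScopedTimedLock(HANDLE mutex, DWORD timeoutMs)
        : m_mutex(mutex), m_owned(false)
    {
        DWORD result = WaitForSingleObject(m_mutex, timeoutMs);
        if (result == WAIT_TIMEOUT) {
            // Time-out reached: fire a warning (a real implementation
            // might log, assert, or raise a debug exception here).
            OutputDebugString(TEXT("ScopedTimedLock: mutex time-out\n"));
        }
        m_owned = (result == WAIT_OBJECT_0 || result == WAIT_ABANDONED);
    }

    ~ScopedTimedLock()
    {
        if (m_owned) {
            ReleaseMutex(m_mutex);   // freed in the dtor, as described
        }
    }

private:
    HANDLE m_mutex;
    bool   m_owned;
};

A local variable of this type then locks a section of code for its own scope, e.g. { ScopedTimedLock lock(docMutex, 5000); /* touch shared document state */ }.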

What other tests would you carry out?

Edit: I have cross-posted this question on a number of different testing and programming forums, as I'm keen to see how the different mind-sets and schools of thought would approach this issue. So apologies if you see it cross-posted elsewhere. I'll provide summary links to the responses after a week or so.

+2  A: 

Not really an answer:

Testing for multithreaded bugs is very difficult. Most such bugs only show up if two (or more) threads reach specific places in the code in a specific order. Whether and when this condition is met may depend on the timing of the running process. This timing may change due to any of the following pre-conditions:

  • Type of processor
  • Processor speed
  • Number of processors/cores
  • Optimization level
  • Running inside or outside the debugger
  • Operating system

There are surely more pre-conditions that I have forgotten.

Because MT bugs depend so heavily on the exact timing of the running code, Heisenberg's "uncertainty principle" comes in here: if you want to test for MT bugs, your "measurements" change the timing, which may prevent the bug from occurring...

The timing issue is what makes MT bugs so highly non-deterministic. In other words: you may have software that runs for months, then crashes one day, and after that runs fine for years. If you don't have debug logs, core dumps, etc., you may never know why it crashed.

So my conclusion is: there is no really good way to unit-test for thread safety. You always have to keep your eyes open when programming.

To make this clear I will give you a (simplified) example from real life (I encountered this when changing employers and looking at the existing code there):

Imagine you have a class. You want instances of that class to be deleted automatically when no one uses them anymore, so you build a reference counter into the class. (I know it is bad style to delete an instance of a class in one of its methods; this is a simplification of the real code, which uses a Ref class to manage counted references.)

class A {
  private:
    int refcount;          // NOT protected against concurrent access
  public:
    A() : refcount(0) {
    }
    void Ref() {
      refcount++;          // read-modify-write, not atomic
    }
    void Release() {
      refcount--;          // read-modify-write, not atomic
      if (refcount == 0) {
        delete this;
      }
    }
};

This seems pretty simple and nothing to worry about. But it is not thread-safe! That's because "refcount++" and "refcount--" are not atomic operations; each consists of three operations:

  • read refcount from memory to register
  • increment/decrement register
  • write refcount from register to memory

Each of these operations can be interrupted, and another thread may manipulate the same refcount at the same time. So if, for example, two threads want to increment refcount, the following COULD happen:

  • Thread A: read refcount from memory to register (refcount: 8)
  • Thread A: increment register
    • CONTEXT CHANGE -
  • Thread B: read refcount from memory to register (refcount: 8)
  • Thread B: increment register
  • Thread B: write refcount from register to memory (refcount: 9)
    • CONTEXT CHANGE -
  • Thread A: write refcount from register to memory (refcount: 9)

So the result is: refcount = 9 but it should have been 10!

This can only be solved by using atomic operations (e.g. InterlockedIncrement() and InterlockedDecrement() on Windows).
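For illustration, here is the same class reworked with the interlocked functions (still a simplified sketch, and as the comments below point out, the Ref()/Release() protocol itself remains fragile):

#include <windows.h>

class A {
  private:
    volatile LONG refcount;   // LONG, as required by the Interlocked* functions
  public:
    A() : refcount(0) {
    }
    void Ref() {
      InterlockedIncrement(&refcount);           // atomic increment
    }
    void Release() {
      // Atomic decrement; it returns the new value, so the zero test
      // does not race with other threads.
      if (InterlockedDecrement(&refcount) == 0) {
        delete this;
      }
    }
};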

This bug is simply untestable! The reason is that it is highly unlikely that two threads modify the refcount of the same instance at the same time, with a context switch in the middle of that code.

But it can happen! (The probability increases on a multi-processor or multi-core system, because no context switch is needed to make it happen.) It will happen, in days, weeks, or months!

rstevens
Even with interlocked functions, this would still not be thread-safe. Consider:
Thread 1: a->Ref(); /* work with a */ a->Release();
Thread 2: a->Release();
Assume thread 1 calls Ref() while thread 2 is in the Release() call, between the InterlockedDecrement and the delete this...
oefe
The concurrency and instruction reordering that happens in the CPU makes concurrency bugs even more unpredictable than the example in that post. See http://www.infoq.com/presentations/click-crash-course-modern-hardware
Esko Luontola
@Esko Luontola: That's right. But if my simple example already causes problems, then it surely doesn't get better when the compiler or CPU reorders the instructions :-)
rstevens
@oefe: You're right, but that's a weakness in the architecture of that class, outside the scope I wanted to look at. The weakness is that it is possible to call Ref(), then Release(), and then a second Release(), so the calls are not symmetric. Normally you have a class encapsulating the Ref() and Release() calls in its constructor/destructor to ensure they are called symmetrically.
rstevens
+3  A: 

Whilst I agree with @rstevens' answer in that there's currently no way to unit test threading issues with 100% certainty, there are some things that I've found useful.

Firstly, whatever tests you have, make sure you run them on lots of different-spec boxes. I have several build machines, all different: multi-core, single core, fast, slow, etc. The good thing about their diversity is that different machines will throw up different threading issues. I've regularly been surprised to add a new build machine to my farm and suddenly have a new threading bug exposed; and I'm talking about a new bug being exposed in code that has run tens of thousands of times on the other build machines and which then shows up one time in ten on the new one...

Secondly, most of the unit testing that you do on your code needn't involve threading at all. The threading is, generally, orthogonal. So step one is to tease the code apart so that you can test the actual code that does the work without worrying too much about its threaded nature. This usually means creating an interface that the threading code uses to drive the real code. You can then test the real code in isolation.
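For example (a purely hypothetical sketch; the interface and names are made up for illustration), the worker thread might drive the real code only through a narrow interface, and the real code behind it can then be unit-tested single-threaded:

// Hypothetical interface separating the real work from the threading.
class ISpatialSearch {
public:
    virtual ~ISpatialSearch() {}
    virtual void FindFeaturesInRect(double left, double top,
                                    double right, double bottom) = 0;
};

// The real implementation holds the actual search logic and is tested
// in isolation, with no threads or locks involved.
class SpatialSearch : public ISpatialSearch {
public:
    virtual void FindFeaturesInRect(double left, double top,
                                    double right, double bottom) {
        // ... real search code, exercised directly by ordinary unit tests ...
    }
};

// The thread function knows only about the interface.
unsigned __stdcall SearchThreadProc(void* param) {
    ISpatialSearch* search = static_cast<ISpatialSearch*>(param);
    search->FindFeaturesInRect(0.0, 0.0, 100.0, 100.0);
    return 0;
}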

Thirdly, you can test where the threaded code interacts with the main body of code. This means writing a mock for the interface that you developed to separate the two blocks of code. By now the threading code is likely much simpler, and you can often place synchronisation objects in the mock so that you can control the code under test. So, you'd spin up your thread and wait for it to set an event by calling into your mock, and then have it block on another event which your test code controls. The test code can then step the threaded code from one point in your interface to the next.
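Using the hypothetical ISpatialSearch interface from the previous sketch, a mock with a pair of events might look something like this (again illustrative only; error handling omitted):

#include <windows.h>

// Hypothetical mock: the worker thread signals that it has reached the
// interface call, then blocks until the test code lets it continue.
class MockSpatialSearch : public ISpatialSearch {
public:
    MockSpatialSearch()
        : m_reached(CreateEvent(NULL, FALSE, FALSE, NULL)),
          m_proceed(CreateEvent(NULL, FALSE, FALSE, NULL)) {
    }
    ~MockSpatialSearch() {
        CloseHandle(m_reached);
        CloseHandle(m_proceed);
    }
    virtual void FindFeaturesInRect(double, double, double, double) {
        SetEvent(m_reached);                       // tell the test we got here
        WaitForSingleObject(m_proceed, INFINITE);  // wait for the test to step us on
    }
    // Called from the test code:
    bool WaitForThreadToArrive(DWORD timeoutMs) {
        return WaitForSingleObject(m_reached, timeoutMs) == WAIT_OBJECT_0;
    }
    void LetThreadContinue() {
        SetEvent(m_proceed);
    }
private:
    HANDLE m_reached;
    HANDLE m_proceed;
};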

Finally (if you've decoupled things enough that you can do the earlier stuff, then this is easy) you can run larger pieces of the multi-threaded parts of the app under test and make sure you get the results that you expect; you can play with the priority of the threads and maybe even add a couple of test threads that simply eat CPU to stir things up a bit.
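As a small illustration of that last point, a CPU-eating test thread needs very little code (sketch only; a real harness would give it a stop flag):

#include <windows.h>
#include <process.h>

// Hypothetical "CPU eater" used to stir up the scheduler during a test run.
unsigned __stdcall BurnCpu(void*)
{
    for (;;) {
        // Spin deliberately; a real test harness would poll a stop flag here.
    }
    return 0;   // never reached
}

// In the test set-up (illustrative):
//   HANDLE burner = (HANDLE)_beginthreadex(NULL, 0, BurnCpu, NULL, 0, NULL);
//   SetThreadPriority(burner, THREAD_PRIORITY_ABOVE_NORMAL);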

Now you run all of these tests many many times on different hardware...

I've also found that running the tests (or the app) under something like DevPartner BoundsChecker can help a lot, as it messes with the thread scheduling in a way that sometimes shakes out hard-to-find bugs. I also wrote a deadlock detection tool which checks for lock inversions during program execution, but I only use it rarely.

You can see an example of how I test multi-threaded C++ code here: http://www.lenholgate.com/archives/000306.html

Len Holgate
+2  A: 

Some suggestions:

  • Utilize the law of large numbers and perform the operation under test not only once, but many times.
  • Stress-test your code by exaggerating the scenarios. E.g. to test your mutex-holding class, use scenarios where the mutex-protected code:
    • is very short and fast (a single instruction)
    • is time-consuming (Sleep with a large value)
    • contains explicit context switches (Sleep (0))
  • Run your test on various different architectures. (Even if your software is Windows-only, test it on single- and multicore processors with and without hyperthreading, and a wide range of clock speeds)
  • Try to design your code such that most of it is not exposed to multithreading issues. E.g. instead of accessing shared data (which requires locking or very carefully designed lock-avoidance techniques), let your worker threads operate on copies of the data and communicate with them using queues. Then you only have to test your queue class for thread-safety (see the sketch after this list).
  • Run your tests when the system is idle as well as when it is under load from other tasks (e.g. our build server frequently runs multiple builds in parallel. This alone revealed many multithreading bugs that happened when the system was under load.)
  • Avoid asserting on timeouts. If such an assert fails, you don't know whether the code is broken or whether the timeout was too short. Instead, use a very generous timeout (just to ensure that the test eventually fails). If you want to test that an operation doesn't take longer than a certain time, measure the duration, but don't use a timeout for this.
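To illustrate the queue suggestion above, a minimal thread-safe queue might look like this (hypothetical sketch; it uses a Win32 critical section and makes no attempt at blocking pops or copy-safety):

#include <windows.h>
#include <deque>

template <typename T>
class LockedQueue {
public:
    LockedQueue() { InitializeCriticalSection(&m_cs); }
    ~LockedQueue() { DeleteCriticalSection(&m_cs); }

    void Push(const T& item) {
        EnterCriticalSection(&m_cs);
        m_items.push_back(item);
        LeaveCriticalSection(&m_cs);
    }

    // Returns false if the queue was empty.
    bool TryPop(T& item) {
        EnterCriticalSection(&m_cs);
        bool hasItem = !m_items.empty();
        if (hasItem) {
            item = m_items.front();
            m_items.pop_front();
        }
        LeaveCriticalSection(&m_cs);
        return hasItem;
    }

private:
    CRITICAL_SECTION m_cs;
    std::deque<T>    m_items;
};

This is then the only class whose thread-safety needs dedicated multi-threaded tests.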
oefe
The first technique you mention works very well for me in practice. I increase the concurrency by a factor of 20 to 100 and run my apps. Combined with Design by Contract programming, I manage to smoke out most of the concurrency issues. I also do a full inspection of the multi-threaded code, since there are subtle bugs that surface very, very rarely.
trshiv
+1  A: 

As Len Holgate mentions, I would suggest refactoring (if needed) and creating interfaces for the parts of the code where different threads interact with objects carrying state. These parts of the code can then be tested separately from the code containing the actual functionality. To verify such a unit test, I would consider using a code coverage tool (I use gcov and lcov for this) to verify that everything in the thread-safe interface is covered.

I think this is a pretty convenient way of verifying that new code is covered by the tests. The next step is then to follow the advice in the other answers on how to run the tests.

Greget
+1  A: 

Looks like you are using Microsoft tools. There's a group at Microsoft Research that has been working on a tool specifically designed to shake out concurrency bugz. Check out CHESS. Other research projects, in their early stages, are Cuzz and Featherlite.

VS2010 includes a very good-looking concurrency profiler; a video is available here.

Hans Passant
+1  A: 

Firstly, many thanks for the responses. For the responses posted across the different forums, see:

http://www.sqaforums.com/showflat.php?Cat=0&Number=617621&an=0&page=0#Post617621

http://stackoverflow.com/questions/2437763/testing-approach-for-multi-threaded-software

http://www.softwaretestingclub.com/forum/topics/testing-approach-for?xg_source=activity

and the following mailing list: [email protected]

The testing took significantly longer than expected, hence this late reply. It leads me to the conclusion that adding multi-threading to existing apps is liable to be very expensive in terms of testing, even if the coding itself is quite straightforward. This could prove interesting for the SQA community, as there is ever more multi-threaded development going on out there.

As per Joe Strazzere's advice, I found the most effective way of hitting bugs was via automation with varied input. I ended up doing this on three PCs, which ran a bank of tests repeatedly with varied input over about six weeks. Initially, I was seeing crashes once or twice per PC per day. As I tracked these down, it dropped to one or two per week across the three PCs, and we haven't had any further problems for the last two weeks. For the last two weeks we have also had a version out with beta-testing users, and have been using the software in-house.

In addition to varying the input under automation, I also got good results from the following:

  • Adding a test option that allowed mutex time-outs to be read from a configuration file, which in turn could be controlled by my automation.

  • Extending mutex time-outs beyond the typical time expected to execute a section of thread code, and firing a debug exception on time-out (see the sketch after this list).

  • Running the automation in conjunction with a debugger (VS2008) such that when a problem occurred there was a better chance of tracking it down.

  • Running without a debugger to ensure that the debugger was not hiding other timing related bugs.

  • Running the automation against normal release, debug, and fully optimised builds. FWIW, the optimised build threw up errors not reproducible in the other builds.
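The time-out handling mentioned above works roughly as sketched below (illustrative only; g_mutexTimeoutMs stands in for the value loaded from the test configuration file):

#include <windows.h>

extern DWORD g_mutexTimeoutMs;   // hypothetical: loaded from the test configuration file

bool AcquireWithDiagnostics(HANDLE mutex)
{
    DWORD result = WaitForSingleObject(mutex, g_mutexTimeoutMs);
    if (result == WAIT_TIMEOUT) {
        // The time-out is set well beyond the expected execution time of the
        // locked code, so reaching it almost certainly means a deadlock.
        if (IsDebuggerPresent()) {
            DebugBreak();        // stop here so the hang can be examined
        }
        return false;
    }
    return (result == WAIT_OBJECT_0 || result == WAIT_ABANDONED);
}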

The type of bugs uncovered tended to be serious in nature, e.g. dereferencing invalid pointers, and even under the debugger took quite a bit of tracking down. As has been discussed elsewhere, the SuspendThread and ResumeThread functions ended up being major culprits, and all uses of these functions were replaced by mutexes. Similarly, all critical sections were removed due to their lack of time-outs. Closing documents and exiting the program were also a source of bugs, where in one instance a document was destroyed with a worker thread still active. To overcome this, a single mutex was added per thread to control the life of the thread, and acquired by the document destructor to ensure the thread had terminated as expected.
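The per-thread lifetime mutex works roughly as sketched below (hypothetical names; the start-up handshake that guarantees the worker has acquired the mutex before the document can be closed is omitted):

#include <windows.h>

struct WorkerContext {
    HANDLE lifetimeMutex;   // created by the document, e.g. CreateMutex(NULL, FALSE, NULL)
    // ... other data shared with the worker ...
};

unsigned __stdcall WorkerThreadProc(void* param)
{
    WorkerContext* ctx = static_cast<WorkerContext*>(param);
    WaitForSingleObject(ctx->lifetimeMutex, INFINITE);   // held for the life of the thread
    // ... worker processing ...
    ReleaseMutex(ctx->lifetimeMutex);                    // released just before the thread exits
    return 0;
}

// In the document destructor (illustrative):
//   WaitForSingleObject(ctx->lifetimeMutex, INFINITE);  // blocks until the worker has finished
//   ReleaseMutex(ctx->lifetimeMutex);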

Once again, many thanks for the all the detailed and varied responses. Next time I take on this type of activity, I'll be better prepared.

Shane MacLaughlin