NOTE: The next couple of paragraphs are just background. If you want the TL;DR, feel free to skip down to the numbered questions, as they are only indirectly related to this info.

I'm currently writing a Python script that does some stuff with POSIX dates (among other things). Unit testing these seems a little difficult, though, since there's such a wide range of dates and times that can be encountered.

Of course, it's impractical for me to test every possible date/time combination, so I think I'm going to write a unit test that randomizes the inputs and then reports what those inputs were if the test fails. Statistically speaking, I figure I can achieve a bit more test completeness this way than I could by trying to think of all potential problem areas (where I'd risk missing things) or by testing every case (which is simply infeasible), assuming that I run it enough times.
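As a sketch of what I have in mind (`format_posix_date` is a made-up stand-in for my real code):

```python
import random
import time
import unittest

# Hypothetical function under test -- stands in for the script's real logic.
def format_posix_date(ts):
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(ts))

class RandomizedDateTest(unittest.TestCase):
    def test_random_timestamps(self):
        for _ in range(1000):
            ts = random.randint(0, 2**31 - 1)  # any 32-bit POSIX timestamp
            try:
                result = format_posix_date(ts)
            except Exception:
                self.fail("failed for input ts=%d" % ts)  # report the input
            self.assertEqual(len(result), 19)  # sanity-check the output shape
```

The point is the `self.fail(...)` message: when a random input breaks something, the test output tells me exactly which input it was.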

So here are a few questions (mainly indirectly related to the above):

  1. What types of code are good candidates for randomized testing? What types of code aren't?
  2. How do I go about determining the number of times to run the code with randomized inputs? I ask this because I want to have a large enough sample to determine any bugs, but don't want to wait a week to get my results.
  3. Are these kinds of tests well suited for unit tests, or is there another kind of test that it works well with?
  4. Are there any other best practices for doing this kind of thing?

+4  A: 

With respect to the 3rd question, in my opinion random tests are not well suited for unit testing. If applied to the same piece of code, a unit test should always succeed or always fail (i.e., wrong behavior due to bugs should be reproducible). You could, however, use random techniques to generate a large data set, then use that data set within your unit tests; there's nothing wrong with that.

Federico Ramponi
+2  A: 

Wow, great question! Some thoughts:

  • Random testing is always a good confidence building activity, though as you mentioned, it's best suited to certain types of code.
  • It's an excellent way to stress-test any code whose performance may be related to the number of times it's been executed, or to the sequence of inputs.
  • For fairly simple code, or code that expects a limited type of input, I'd prefer systematic tests that explicitly cover all of the likely cases, samples of each unlikely or pathological case, and all the boundary conditions.
Adam Liss
Good points. +1. You should indeed draw a clear distinction between random values and limit values: they are not the same.
VonC
Thanks. Are you stalking me? Or am I stalking you? Or is it just random chance? :-)
Adam Liss
Errr... no stalking involved ;). We simply seem to have common programming interests. For that to happen on StackOverflow should violate your "principle of least astonishment" :)
VonC
+8  A: 

I agree with Federico - randomised testing is counterproductive. If a test won't reliably pass or fail, it's very hard to fix it and know it's fixed. (This is also a problem when you introduce an unreliable dependency, of course.)

Instead, however, you might like to make sure you've got good data coverage in other ways. For instance:

  • Make sure you have tests for the start, middle and end of every month of every year between 1900 and 2100 (if those are suitable for your code, of course).
  • Use a variety of cultures, or "all of them" if that's known.
  • Try "day 0" and "one day after the end of each month" etc.

In short, still try a lot of values, but do so programmatically and repeatably. You don't need every value you try to be a literal in a test - it's fine to loop round all known values for one axis of your testing, etc.
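A repeatable generator along those lines might look like this (a sketch in Python, since the question concerns a Python script):

```python
import calendar
import datetime

def boundary_dates(first_year=1900, last_year=2100):
    """Yield the start, middle, and end of every month in the range,
    plus the day after each month's end -- the same dates on every run."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            last_day = calendar.monthrange(year, month)[1]
            yield datetime.date(year, month, 1)           # start of month
            yield datetime.date(year, month, 15)          # middle of month
            yield datetime.date(year, month, last_day)    # end of month
            # "one day after the end of each month"
            yield (datetime.date(year, month, last_day)
                   + datetime.timedelta(days=1))
```

You'd loop over `boundary_dates()` in a test and assert whatever property your date-handling code should satisfy; every run covers exactly the same values.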

You'll never get complete coverage, but it will at least be repeatable.

EDIT: I'm sure there are places where random tests are useful, although probably not for unit tests. However, in this case I'd like to suggest something: use one RNG to create a random but known seed, and then seed a new RNG with that value - and log it. That way if something interesting happens you will be able to reproduce it by starting an RNG with the logged seed.
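That seed-logging idea might be sketched in Python like so (the names here are made up for illustration):

```python
import random

def make_logged_rng(seed=None):
    """Pick a random but known seed (unless one is supplied), log it,
    and return an RNG seeded with that value."""
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)  # random but known
    print("test RNG seed: %d" % seed)  # log it so failures can be replayed
    return seed, random.Random(seed)
```

If something interesting happens, rerunning `make_logged_rng(logged_seed)` reproduces the exact same input sequence.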

Jon Skeet
You make some good points. One thing though: do you mean that randomized UNIT testing is counterproductive, or that it's counterproductive for all tests period? Because I refuse to buy the last one. I would however buy that it's counterproductive for ALMOST all tests. :-)
Jason Baker
I think we agree. It's a very rare case that does well through randomization, IMO. Editing for one particular point though...
Jon Skeet
That's excellent; I hadn't thought of that approach.
Jason Baker
I've used the 'random' test approach but with a known seed to make issues repeatable. Worked well.
mavnn
+1  A: 

Q1) I found that distributed systems with lots of concurrency are good candidates for randomized testing. It is hard to create all possible scenarios for such applications, but random testing can expose problems that you never thought about.

Q2) I guess you could try to use statistics to build a confidence interval around having discovered all "bugs". But the practical answer is: run your randomized tests as many times as you can afford.

Q3) I have found that randomized testing is useful, but only after you have written the normal battery of unit, integration and regression tests. You should integrate your randomized tests as part of the normal test suite, though probably as a small run. If nothing else, you avoid bit rot in the tests themselves, and get a modicum of extra coverage as the team runs the tests with different random inputs.

Q4) When writing randomized tests, make sure you save the random seed with the results of the tests. There is nothing more frustrating than finding that your random tests caught a bug and not being able to run the test again with the same input. Make sure your test can also be executed with the saved seed.

coryan
+1  A: 

A few things:

  • With random testing, you can't really tell how good a piece of code is, but you can tell how bad it is.
  • Random testing is better suited for things that have random inputs -- a prime example is anything that's exposed to users. So, for example, something that randomly clicks & types all over your app (or OS) is a good test of general robustness.
  • Similarly, developers count as users. So something that randomly assembles a GUI from your framework is another good candidate.
  • Again, you're not going to find all the bugs this way -- what you're looking for is "if I do a million whacky things, do ANY of them result in system corruption?" If not, you can feel some level of confidence that your app/OS/SDK/whatever might hold up to a few days' exposure to users.
  • ...But, more importantly, if your random-beater-upper test app can crash your app/OS/SDK in about 5 minutes, that's about how long you'll have until the first fire-drill if you try to ship that sucker.

Also note: REPRODUCIBILITY IS IMPORTANT IN TESTING! Hence, have your test-tool log the random-seed that it used, and have a parameter to start with the same seed. In addition, have it either start from a known "base state" (i.e., reinstall everything from an image on a server & start there) or some recreatable base-state (i.e., reinstall from that image, then alter it according to some random-seed that the test tool takes as a parameter.)

Of course, the developers will appreciate if the tool has nice things like "save state every 20,000 events" and "stop right before event #" and "step forward 1/10/100 events." This will greatly aid them in reproducing the problem, finding and fixing it.

As someone else pointed out, servers are another thing exposed to users. Get yourself a list of 1,000,000 URLs (grep from server logs), then feed them to your random number generator.

And remember: "the system survived 24 hours of random pounding without errors" does not mean it's ready to ship; it just means it's stable enough to start some serious testing. Before it can do that, QA should feel free to say "look, your POS can't even last 24 hours under life-like random user simulation -- you fix that, I'm going to spend some time writing better tools."

Oh yeah, one last thing: in addition to the "pound it as fast & hard as you can" tests, have the ability to do "exactly what a real user [who was perhaps deranged, or a baby pounding the keyboard/mouse] would do." That is, if you're doing random user-events, do them at the speed that a very fast typist or very fast mouse-user could do (with occasional delays, to simulate a SLOW person), in addition to "as fast as my program can spit out events." These are two *very* different types of tests, and will get very different reactions when bugs are found.

Olie
A: 

To make tests reproducible, simply use a fixed seed start value. That ensures the same data is used whenever the test runs. Tests will reliably pass or fail.

  • Good / bad candidates? Randomized tests are good at finding edge cases (exceptions). The hard part is defining the correct result for a randomized input.
  • Determining the number of times to run the code: Simply try it out, if it takes too long reduce the iteration count. You may want to use a code coverage tool to find out what part of your application is actually tested.
  • Are these kinds of tests well suited for unit tests? Yes.
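A minimal sketch of the fixed-seed idea in Python (`sorted()` stands in for the code under test):

```python
import random
import unittest

class FixedSeedTest(unittest.TestCase):
    SEED = 42  # fixed seed: the same "random" data on every run

    def setUp(self):
        self.rng = random.Random(self.SEED)

    def test_sort_is_idempotent(self):
        # 100 "random" but fully reproducible inputs
        data = [self.rng.randint(-1000, 1000) for _ in range(100)]
        once = sorted(data)
        # the property under test: sorting twice changes nothing
        self.assertEqual(sorted(once), once)
```

Because the seed is fixed, this test reliably passes or fails, exactly like any other unit test.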
A: 

This might be slightly off-topic, but if you're using .net, there is Pex, which does something similar to randomized testing, but more intelligently: it attempts to generate "random" test cases that exercise all of the paths through your code.

FryGuy
A: 

Here is my answer to a similar question: "Is it a bad practice to randomly generate test data?" The other answers there may be useful as well.

Random testing is a bad practice as long as you don't have a solution for the oracle problem, i.e., determining the expected outcome of your software given its input.

If you have solved the oracle problem, you can go one step further than simple random input generation. You can choose input distributions such that specific parts of your software get exercised more than they would with simple random inputs.

You then switch from random testing to statistical testing.

if (a > 0)
    // Do Foo
else if (b < 0)
    // Do Bar
else
    // Do Foobar

If you select a and b uniformly at random over the int range, you exercise Foo 50% of the time, Bar 25% of the time and Foobar 25% of the time. It is likely that you will find more bugs in Foo than in Bar or Foobar.

If you instead select a such that it is negative 66.66% of the time, Bar and Foobar get exercised more than with the first distribution. Indeed, each of the three branches gets exercised 33.33% of the time.
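That weighting might be sketched in Python like this (`branch` stands in for the snippet above; the exact ranges are arbitrary):

```python
import random

rng = random.Random(0)  # seeded so the sampling itself is repeatable

def branch(a, b):
    """The three-way branch from the snippet above."""
    if a > 0:
        return "Foo"
    elif b < 0:
        return "Bar"
    else:
        return "Foobar"

def biased_a():
    # a is negative ~66.66% of the time, positive otherwise
    if rng.random() < 2.0 / 3.0:
        return rng.randint(-2**31, -1)
    return rng.randint(1, 2**31 - 1)

def uniform_b():
    # b uniform over the int range: negative exactly 50% of the time
    return rng.randint(-2**31, 2**31 - 1)
```

Sampling `branch(biased_a(), uniform_b())` many times hits each of the three branches roughly a third of the time.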

Of course, if your observed outcome is different than your expected outcome, you have to log everything that can be useful to reproduce the bug.

mouviciel