Since I started using RSpec, I've had a problem with the notion of fixtures. My primary concerns are these:

  1. I use testing to reveal surprising behavior. I'm not always clever enough to enumerate every possible edge case for the examples I'm testing. Using hard-coded fixtures seems limiting because it only tests my code with the very specific cases that I've imagined. (Admittedly, my imagination is also limited with respect to which cases I test.)
  2. I use testing as a form of documentation for the code. If I have hard-coded fixture values, it's hard to reveal what a particular test is trying to demonstrate. For example:
    describe Item do
      describe '#most_expensive' do
        it 'should return the most expensive item' do
          Item.most_expensive.price.should == 100
          # OR
          #Item.most_expensive.price.should == Item.find(:expensive).price
          # OR
          #Item.most_expensive.id.should == Item.find(:expensive).id
        end
      end
    end
    
    Using the first method gives the reader no indication of what the most expensive item is, only that its price is 100. All three methods ask the reader to take it on faith that the fixture :expensive is the most expensive one listed in fixtures/items.yml. A careless programmer could break tests by creating an Item in before(:all), or by inserting another fixture into fixtures/items.yml. If that is a large file, it could take a long time to figure out what the problem is.

One thing I've started to do is add a '#generate_random' method to all of my models. This method is only available when I am running my specs. For example:


class Item
  def self.generate_random(params={})
    Item.create(
      :name => params[:name] || String.generate_random,
      :price => params[:price] || rand(100)
    )
  end
end
(The specific details of how I do this are actually a bit cleaner. I have a class that handles the generation and cleanup of all models, but this code is clear enough for my example.) So in the above example, I might test as follows. A warning for the faint of heart: my code relies heavily on 'before(:all)':
describe Item do
  describe '#most_expensive' do
    before(:all) do
      @items = []
      3.times { @items << Item.generate_random }
      @items << Item.generate_random(:price => 50)
    end

    it 'should return the most expensive item' do
      sorted = @items.sort { |a, b| b.price <=> a.price }
      expensive = Item.most_expensive
      expensive.should be(sorted[0])
      expensive.price.should >= 50      
    end
  end
end

This way, my tests better reveal surprising behavior. When I generate data this way, I occasionally stumble upon an edge case where my code does not behave as expected, but which I wouldn't have caught if I were only using fixtures. For example, in the case of '#most_expensive', if I forgot to handle the special case where multiple items share the most expensive price, my test would occasionally fail at the first 'should.' Seeing the non-deterministic failures in AutoSpec would clue me in that something was wrong. If I were only using fixtures, it might take much longer to discover such a bug.

My tests also do a slightly better job of demonstrating in code what the expected behavior is. My test makes it clear that sorted is an array of items sorted in descending order by price. Since I expect '#most_expensive' to be equal to the first element of that array, it's even more obvious what the expected behavior of most_expensive is.

So, is this a bad practice? Is my fear of fixtures an irrational one? Is writing a generate_random method for each Model too much work? Or does this work?

A: 

One problem with randomly generated test cases is that the expected answer has to be computed by code as well, and you can't be sure that code doesn't have bugs :)

Mehrdad Afshari
A: 

You might also see this topic: Testing with random inputs best practices.

Jason Baker
+5  A: 

We thought about this a lot on a recent project of mine. In the end, we settled on two points:

  • Repeatability of test cases is of paramount importance. If you must write a random test, be prepared to document it extensively, because if/when it fails, you will need to know exactly why.
  • Using randomness as a crutch for code coverage means you either don't have good coverage or you don't understand the domain enough to know what constitutes representative test cases. Figure out which is true and fix it accordingly.

In sum, randomness can often be more trouble than it's worth. Consider carefully whether you're going to be using it correctly before you pull the trigger. We ultimately decided that random test cases were a bad idea in general and to be used sparingly, if at all.

John Feminella
+9  A: 

I'm surprised no one in this topic or in the one Jason Baker linked to mentioned Monte Carlo Testing. That's the only time I've extensively used randomized test inputs. However, it was very important to make the test reproducible, by having a constant seed for the random number generator for each test case.
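
For example, in the thread's own RSpec idiom (a minimal sketch, not from the original answer, reusing the question's hypothetical Item.generate_random helper), fixing the seed before each example keeps the "random" data identical from run to run:

before(:each) do
  srand(42)  # constant seed: every run generates the same sequence of "random" items
  @items = Array.new(100) { Item.generate_random }
end

it 'should never return an item cheaper than any other item' do
  most = Item.most_expensive
  @items.each { |item| most.price.should >= item.price }
end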

KeyserSoze
+1 for the reproducible comment. Controlling the random generator's initial state is very important. If you find a weird behavior, you're going to want to try it again.
Jason S
another +1 for the reproducible.
peterchen
+1  A: 

Lots of good information has already been posted, but see also: Fuzz Testing. Word on the street is that Microsoft uses this approach on a lot of their projects.

MattK
I'm glad someone brought this up. Fuzz Testing is hugely useful, but note that random testing should be *in addition* to repeatable tests.
vasi
A: 

My experience with testing is mostly with simple programs written in C/Python/Java, so I'm not sure if this is entirely applicable. But whenever I have a program that can accept any sort of user input, I always include a test with random input data, or at least input data generated by the computer in an unpredictable way, because you can never make assumptions about what users will enter. Or, well, you can, but if you do, some hacker who doesn't make that assumption may well find a bug that you totally overlooked. Machine-generated input is the best (only?) way I know of to keep human bias completely out of the testing procedures. Of course, in order to reproduce a failed test you have to do something like saving the test input to a file or printing it out (if it's text) before running the test.
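
A minimal sketch of that last point (the helper names below are hypothetical, not from the answer): write the generated input somewhere durable before exercising the system, so a failing run can be replayed later.

require 'json'

input = generate_random_input    # hypothetical generator for the system's input
File.open('last_test_input.json', 'w') { |f| f.write(input.to_json) }  # persist the input before the run
run_system_under_test(input)     # hypothetical entry point for the code under test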

David Zaslavsky
+1  A: 

Random testing is a bad practice as long as you don't have a solution for the oracle problem, i.e., determining the expected outcome of your software for a given input.

If you have solved the oracle problem, you can go one step further than simple random input generation. You can choose input distributions such that specific parts of your software get exercised more than with simple random input.

You then switch from random testing to statistical testing.

if (a > 0)
    // Do Foo
else if (b < 0)
    // Do Bar
else
    // Do Foobar

If you select a and b uniformly at random over the int range, you exercise Foo 50% of the time, Bar 25% of the time, and Foobar 25% of the time. It is likely that you will find more bugs in Foo than in Bar or Foobar.

If you select a such that it is negative 66.66% of the time, Bar and Foobar get exercised more than with your first distribution. Indeed, each of the three branches then gets exercised 33.33% of the time.
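
A sketch of such a skewed generator in Ruby (to match the rest of the thread; the int bounds and method names are illustrative, not from the answer):

INT_MAX = 2**31 - 1

# a is positive one third of the time, so the Foo branch runs ~33% of the time
def sample_a
  rand(3).zero? ? rand(INT_MAX) + 1 : -rand(INT_MAX + 1)
end

# b is negative half of the time, splitting the remaining ~66% evenly
# between the Bar and Foobar branches
def sample_b
  rand(2).zero? ? -(rand(INT_MAX) + 1) : rand(INT_MAX + 1)
end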

Of course, if your observed outcome is different than your expected outcome, you have to log everything that can be useful to reproduce the bug.

mouviciel
A: 

I would suggest having a look at Machinist:

http://github.com/notahat/machinist/tree/master

Machinist will generate data for you, but it is repeatable, so each test-run has the same random data.

You could do something similar by seeding the random number generator consistently.
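
For illustration, a blueprint for the question's Item model might look roughly like this (a sketch against the Machinist 1.x blueprint/Sham API as I recall it; the attribute values are assumptions):

# spec/blueprints.rb
require 'machinist/active_record'
require 'sham'

Sham.item_name { |index| "item #{index}" }  # Sham values repeat identically from one test run to the next

Item.blueprint do
  name  { Sham.item_name }
  price { rand(100) }
end

# in a spec:
cheap     = Item.make(:price => 10)
expensive = Item.make(:price => 100)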

Toby Hede
Do you need to have ActiveRecord/Rails in order to use machinist?
Andrew Grimm
I believe it does depend on ActiveRecord, but you can use it outside of Rails.
Toby Hede
+1  A: 

This is an answer to your second point:

(2) I use testing as a form of documentation for the code. If I have hard-coded fixture values, it's hard to reveal what a particular test is trying to demonstrate.

I agree. Ideally spec examples should be understandable by themselves. Using fixtures is problematic, because it splits the pre-conditions of the example from its expected results.

Because of this, many RSpec users have stopped using fixtures altogether. Instead, they construct the needed objects in the spec example itself:

describe Item, "#most_expensive" do
  it 'should return the most expensive item' do
    items = [
      Item.create!(:price => 100),
      Item.create!(:price => 50)
    ]

    Item.most_expensive.price.should == 100
  end
end

If you end up with lots of boilerplate code for object creation, you should take a look at some of the many test object factory libraries, such as factory_girl, Machinist, or FixtureReplacement.
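
For example, with factory_girl's classic Factory.define API, the spec above might build its data like this (a sketch; the factory definition is an assumption, not something from the answer):

# spec/factories.rb
Factory.define :item do |f|
  f.name  'an item'
  f.price 10
end

# in the spec:
Factory(:item, :price => 100)  # Factory(...) is shorthand for Factory.create(...)
Factory(:item, :price => 50)

Item.most_expensive.price.should == 100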

Antti Tarvainen
Is the FixtureReplacement link broken?
Andrew Grimm
Lots of excellent answers, but this one cut to the chase -- there's a better way to do what I want to do, and my test data doesn't have to be 'random' anymore.
bobocopy
bobocopy: It seems so. Odd, I think it was working yesterday. It's fixed now.
Antti Tarvainen
A: 

The effectiveness of such testing largely depends on the quality of the random number generator you use and on how correct the code is that translates the RNG's output into test data.

If the RNG never produces values that drive your code into some edge-case condition, that case will not be covered. And if the code that translates the RNG's output into input for the code under test is defective, it may happen that even with a good generator you still don't hit all the edge cases.

How will you test for that?

sharptooth
A: 

The problem with randomness in test cases is that the output is, well, random.

The idea behind tests (especially regression tests) is to check that nothing is broken.

If you find something that is broken, you need to include that test every time from then on, otherwise you won't have a consistent set of tests. Also, if you run a random test that works, then you need to include that test, because it's possible that you may later break the code so that the test fails.

In other words, if you have a test which uses random data generated on the fly, I think this is a bad idea. If, however, you use a set of random data, WHICH YOU THEN STORE AND REUSE, this may be a good idea. This could take the form of a set of seeds for a random number generator.

This storing of the generated data allows you to find the 'correct' response to this data.

So, I would recommend using random data to explore your system, but use defined data in your tests (which may originally have been randomly generated).

MatthieuF