
I'm looking for a tool that will let me generate a data set with certain statistical properties. For example, suppose I want to generate 1 million integers containing a chosen number of outliers for use in testing.

Are there any tools for generating test data sets like this? I don't necessarily need anything fancy, just some basic functionality.

+3  A: 

The easiest technique, at least the easiest to understand mathematically, is the accept-reject algorithm (also called rejection sampling).
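Here is a minimal sketch of the idea. It assumes the target density is bounded and lives on a finite interval, so a uniform "envelope" works. The names `rejection_sample` and `bell_pdf` are hypothetical. You pick a random point under a flat box of height `pdf_max`, and you keep its x-coordinate only if that point also falls under the density curve.

```python
import math
import random

def rejection_sample(pdf, x_min, x_max, pdf_max, n, rng=random):
    """Draw n samples from the (possibly unnormalized) density `pdf` on [x_min, x_max].

    Assumes pdf(x) <= pdf_max for every x in the interval (uniform envelope).
    """
    samples = []
    while len(samples) < n:
        x = rng.uniform(x_min, x_max)   # candidate drawn from the uniform proposal
        u = rng.uniform(0.0, pdf_max)   # random height under the flat envelope
        if u <= pdf(x):                 # accept only if the point lies under the pdf
            samples.append(x)
    return samples

def bell_pdf(x):
    # Unnormalized standard normal density; its maximum is 1.0 at x = 0.
    # Rejection sampling does not need the normalizing constant.
    return math.exp(-0.5 * x * x)

rng = random.Random(42)
data = rejection_sample(bell_pdf, -4.0, 4.0, 1.0, 10_000, rng)
```

The cost is wasted draws: the fraction of accepted candidates equals the area under `pdf` divided by the area of the box. So a tight bound `pdf_max` matters when the density is sharply peaked.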

mjv
+1  A: 

Apache Commons Math has some tools you can use for generating data from simple probability distributions. It is also fairly easy to roll your own versions of these generators using the random() function of whatever system you're using. Assuming random() returns a uniformly distributed number between 0 and 1, you pass that value through the inverse cumulative distribution function (CDF) of the distribution you need, and the result is a random number drawn from that distribution. If you need something very fancy, you can use Markov chains.
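As an illustration of the inverse-CDF approach (inverse transform sampling), here is a small sketch for the exponential distribution. It is chosen only because its CDF inverts in closed form, and the function name `sample_exponential` is hypothetical. The exponential CDF is F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -ln(1 - u) / rate.

```python
import math
import random

def sample_exponential(rate, n, rng=random):
    """Inverse transform sampling for an Exponential(rate) distribution."""
    out = []
    for _ in range(n):
        u = rng.random()                        # uniform in [0, 1)
        # 1 - u lies in (0, 1], so the logarithm is always defined
        out.append(-math.log(1.0 - u) / rate)   # apply the inverse CDF
    return out

rng = random.Random(0)
xs = sample_exponential(2.0, 100_000, rng)
```

The sample mean should come out close to 1 / rate = 0.5. Python's standard library already provides `random.expovariate(rate)` for this distribution, which is handy as a cross-check. The hand-written version is only meant to show the general recipe, which carries over to any distribution whose inverse CDF you can compute.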

jilles de wit