ansaurus

Question

Weighting Different Outcomes when Pseudorandomly Choosing from an Arbitrarily Large Sample

Answer 1

+1 A:

If you know the probability of each event happening you need to map these probabilities to the range 0-100 (or 0 to 1 if you want to use real numbers and probabilities.)

So in the example above there are 12 Cs. C1 is 4/12 or ~33%, C2 is 2/12 of ~17%, C3 is 5/12 or ~42%, and C4 is 1/12 or ~8%.

Notice that these all add up to 100%. So if we choose a random number between 0 and 100 we can map C1 to 0-33, C2 to 33-50 (17 more than C1's value) , C3 to 50-92, and C4 to 92-100.

An if statement could make the choice:

r = rand() # between 0-100
if (r <33)
  return "C1"
elsif (r < 50)
  return "C2"
elsif (r < 92)
  return "C3"
elsif (r < 100)
  return "C4"

If you wanted more accuracy than 1 in 100 just go from 1-1000 or whatever range you want. It's probably better form to use integers and scale them rather than use floating point numbers as floating point can have odd behavior if the spread between values gets large.

If you wanted to go the binning route like you show above you could try something like so (in ruby though the idea is more general):

a = ["C1"]*4 + ["C2"]*2 + ["C3"]*5 + ["C4"]
# ["C1", "C1", "C1", "C1", "C2", "C2", 
#  "C3", "C3", "C3", "C3", "C3", "C4"]
a[rand(a.length)] # => "C1' w/ probability 4/12

Binning would be slower as you need to create the array, but easier to add alternatives as you wouldn't need to recalculate the probabilities each time.

You could also generate the above if code from the array representation so you'd just take the pre-processing hit once when the code was generated and then get a fast answer from the created code.

Paul Rubel 2010-09-14 16:49:31

I thought about this approach, but it seemed to me that it would suffer the same scaling issue as my other approach. Assume a sample size of 50,000 objects, each with a unique, non-discrete probability; How large would the range of random numbers have to be?

Andrew 2010-09-14 17:01:19

You're going to have to assign a probability to each object, you can't get around that unless there is some underlying connection between objects, which you could also leverage in your selection code. If you want to bin the items you could also use that as a selection, I edited the question. However, 50k doesn't seem that much, it's going to depend on the precision of your probabilities.

Paul Rubel 2010-09-14 18:54:30

This method works just as easily on floating point as it does on integers. The range of the random number needs to be the sum of the individual item probabilities. See http://stackoverflow.com/questions/3655430/selection-based-on-percentage-weighting/3655542#3655542

Mark Ransom 2010-09-14 19:16:04

ansaurus

tags:

views:

answers:

Weighting Different Outcomes when Pseudorandomly Choosing from an Arbitrarily Large Sample

related questions