views:

30

answers:

1

So, I was sitting in my backyard thinking about Pokemon, as we're all wont to do, and it got me thinking: When you encounter a 'random' Pokemon, some specimen appear a lot more often than others, which means that they're weighted differently than the ones that appear less.

Now, were I to approach the problem of getting the different Pokemon to appear with a certain probability, I would most likely do so by simply increasing the number of entries that certain Pokemon have in the pool of choices (like so),

Pool:
C1 C1 C1 C1
C2 C2
C3 C3 C3 C3 C3
C4

so C1 has a 1/3 chance of being pulled, C2 has a 1/6th chance, etc, but I understand that this may be a very simple and naive approach, and is unlikely to scale well with a large number of choices.

So, my question is this, S/O: Given an arbitrarily large sample size, how would you go about weighting the chance of one outcome as greater than another? And, as a follow up question, assume that you want the probability of certain options to occur in a ratio with floating-point precision as opposed to whole number ratios?

+1  A: 

If you know the probability of each event happening you need to map these probabilities to the range 0-100 (or 0 to 1 if you want to use real numbers and probabilities.)

So in the example above there are 12 Cs. C1 is 4/12 or ~33%, C2 is 2/12 of ~17%, C3 is 5/12 or ~42%, and C4 is 1/12 or ~8%.

Notice that these all add up to 100%. So if we choose a random number between 0 and 100 we can map C1 to 0-33, C2 to 33-50 (17 more than C1's value) , C3 to 50-92, and C4 to 92-100.

An if statement could make the choice:

r = rand() # between 0-100
if (r <33)
  return "C1"
elsif (r < 50)
  return "C2"
elsif (r < 92)
  return "C3"
elsif (r < 100)
  return "C4"

If you wanted more accuracy than 1 in 100 just go from 1-1000 or whatever range you want. It's probably better form to use integers and scale them rather than use floating point numbers as floating point can have odd behavior if the spread between values gets large.

If you wanted to go the binning route like you show above you could try something like so (in ruby though the idea is more general):

a = ["C1"]*4 + ["C2"]*2 + ["C3"]*5 + ["C4"]
# ["C1", "C1", "C1", "C1", "C2", "C2", 
#  "C3", "C3", "C3", "C3", "C3", "C4"]
a[rand(a.length)] # => "C1' w/ probability 4/12

Binning would be slower as you need to create the array, but easier to add alternatives as you wouldn't need to recalculate the probabilities each time.

You could also generate the above if code from the array representation so you'd just take the pre-processing hit once when the code was generated and then get a fast answer from the created code.

Paul Rubel
I thought about this approach, but it seemed to me that it would suffer the same scaling issue as my other approach. Assume a sample size of 50,000 objects, each with a unique, non-discrete probability; How large would the range of random numbers have to be?
Andrew
You're going to have to assign a probability to each object, you can't get around that unless there is some underlying connection between objects, which you could also leverage in your selection code. If you want to bin the items you could also use that as a selection, I edited the question. However, 50k doesn't seem that much, it's going to depend on the precision of your probabilities.
Paul Rubel
This method works just as easily on floating point as it does on integers. The range of the random number needs to be the sum of the individual item probabilities. See http://stackoverflow.com/questions/3655430/selection-based-on-percentage-weighting/3655542#3655542
Mark Ransom