1. Construct a CDF-like list like this:
def build_cdf(distrib):
cdf = []
val = 0
for key, freq in distrib.items():
val += freq
cdf.append((val, key))
return (val, cdf)
This function returns a tuple, the 1st value is the sum of probabilities, and 2nd value is the CDF.
2. Construct the sampler like this:
import random
def sample_from_cdf(val_and_cdf):
(val, cdf) = val_and_cdf;
rand = random.uniform(0, val)
# use bisect.bisect_left to reduce search time from O(n) to O(log n).
return [key for index, key in cdf if index > rand][0]
Usage:
x = build_cdf({"a":0.2, "b":0.3, "c":0.5});
y = [sample_from_cdf(x) for i in range(0,100000)];
print (len([t for t in y if t == "a"])) # 19864
print (len([t for t in y if t == "b"])) # 29760
print (len([t for t in y if t == "c"])) # 50376
You may want to make this into a class.