I have a model where state j among M states is chosen with probability p_j. The probabilities can be arbitrary real numbers (summing to one), so they specify a mixture model over the M states. I can access p_j for all j in constant time. I want to make a large number (N) of random samples. The most obvious algorithm is

1) Compute the cumulative probability distribution P_j = p_1 + p_2 + ... + p_j. O(M)

2) For each sample draw a random float x in [0,1]. O(N)

3) For each sample choose j such that P_{j-1} < x <= P_j, taking P_0 = 0, e.g. by binary search. O(N log(M))

So the asymptotic complexity is O(N log(M)). The factor of N is obviously unavoidable, but I am wondering about log(M). Is it possible to beat this factor in a realistic implementation?
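For reference, the baseline above can be sketched in Python (the function name is mine; it uses a 0-indexed variant of the condition in step 3 with x drawn from [0,1)):

```python
import bisect
import itertools
import random

def sample_states(p, n, rng=random):
    """Draw n states from the discrete distribution p via CDF inversion.

    p is a list of probabilities summing to 1.  Building the CDF is O(M);
    each draw is one binary search, so the total cost is O(M + N log M).
    """
    # Step 1: cumulative distribution P_j = p_1 + ... + p_j
    cdf = list(itertools.accumulate(p))
    # Steps 2-3: draw x in [0, 1) and binary-search for the first j
    # with x < P_j (equivalently, P_{j-1} <= x < P_j)
    return [bisect.bisect_right(cdf, rng.random()) for _ in range(n)]

counts = [0, 0, 0]
for j in sample_states([0.2, 0.5, 0.3], 10000):
    counts[j] += 1
```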

A: 

I think you can do better using something like the following algorithm, or any other reasonable multinomial distribution sampler:

// Normalize p_j by the total mass P_M = p_1 + ... + p_M
for j = 1 to M
   p_hat[j] = p[j] / P_M

// Place the draws from the mixture model in this array
draws = [];

// Sample until we have N iid samples
cdf = 1.0;
for ( j = 1, remaining = N; j <= M && remaining > 0; j++ )
{
   // p_hat[j] / cdf is the conditional probability of item j given
   // that none of items 1..j-1 was chosen, and there are `remaining`
   // samples left to assign.  These are just `remaining` Bernoulli
   // trials, so draw from a Binomial(remaining, p_hat[j] / cdf)
   // distribution to get the number of samples of item j.
   //
   // Dividing by the remaining tail mass `cdf` ensures that the last
   // component absorbs everything left over, because
   // p_hat[M] / cdf = p_hat[M] / p_hat[M] = 1.0
   items = Binomial.sample( remaining, p_hat[j] / cdf );
   remaining -= items;
   cdf -= p_hat[j];

   for ( k = 0; k < items; k++ )
      draws.push( sample_from_mixture_component( j ) );
}

This should take close to O(M + N) time, but it does depend on how efficient your binomial and mixture-model component samplers are.
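A runnable Python sketch of this scheme (function names are mine, and the naive O(n) binomial helper exists only to keep the sketch dependency-free; an efficient binomial sampler, e.g. NumPy's Generator.binomial, is needed to realize the O(M + N) cost):

```python
import random

def binomial(n, p, rng=random):
    # Naive O(n) binomial sampler, for illustration only; substitute an
    # efficient sampler (e.g. numpy.random.Generator.binomial) in practice.
    return sum(rng.random() < p for _ in range(n))

def multinomial_draws(p, n, rng=random):
    """Assign n draws to M states with one binomial draw per state.

    p holds (possibly unnormalized) nonnegative weights.
    Returns a list of n state indices.
    """
    total = sum(p)  # P_M, used to normalize the weights
    draws = []
    remaining = n
    tail = 1.0  # probability mass not yet consumed
    for j, pj in enumerate(p):
        if remaining == 0:
            break
        p_hat = pj / total
        # Conditional probability of state j given that no earlier state
        # was chosen; the last state must absorb everything left over,
        # and min() guards against floating-point drift above 1.
        prob = 1.0 if j == len(p) - 1 else min(1.0, p_hat / tail)
        items = binomial(remaining, prob, rng)
        draws.extend([j] * items)
        remaining -= items
        tail -= p_hat
    return draws
```

Note that the loop runs at most M times and the total number of pushes is N, which is where the O(M + N) figure comes from.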

Lucas