Given a series of incoming items, I want to assign each one to a bucket as it comes in. The bucket can either be a new one (one that has never been used before, of which there are an infinite supply) or it can be an existing bucket. If I look at the number of buckets with one item, the number with two, the number with three, etc., I want those bucket counts to follow an exponential distribution. I hope I'm saying that correctly -- if 80% of the buckets have 1 item, then 16% should have two, 3.2% should have three, etc. In general, the number of buckets of size k should be 1/p as many as the number of buckets of size k-1, and the fraction of buckets that are size 1 should be (1-p) to make the math work out right.
If I knew the number of items in advance, it would be easy: I'd know how many buckets of each size, so I can just pile the the items into as many buckets of each size as I needed. Or if I were just generating bucket sizes, it'd be easy even if I didn't know the total in advance: each new bucket has probability (1-p) of being size 1, (1-p)p of being size 2, (1-p)p^2 size 3, (1-p)p^3 size 4, etc.
But I am processing items, so as I get an item in, I need to choose a bucket: either an existing one, or a new one. If I make a new bucket, then I get one more of size 1. But if I choose an existing one of size k, then I get one more bucket of size k+1, and one fewer of size k. So what should the probability of choosing a bucket of size k be (where k can be zero to indicate making a new bucket)? And how does it relate to p?
I am doing this to generate test data where I only know the mean number of items per bucket (1.5, it turns out), don't know the number of items in advance, and think that the exponential distribution of bucket sizes is realistic for my situation. I'm probably making it more complicated than it needs to be, but I was surprised because I thought it would be easy.
And maybe it is, but I'm just missing something. (And I can't figure out how to Google it, either.)