What is the appropriate data structure for this task?
I have a set of N items. N is large. Each item has a positive weight value associated with it.
I would like to do the following, quickly:
inner loop:
Sample an item, according to its weight.
[process...]
Update the weight of K items, where K << N.
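To make the access pattern concrete, here is a rough sketch of the loop in Python. The specific numbers (N, the iteration count, K) are placeholders, and random.choices stands in for the sampling step; it rescans all the weights on every call, which is exactly the linear cost I would like to avoid.

    import random

    # Illustrative sizes only; N, the iteration count, and K are placeholders.
    N = 1_000_000
    weights = [random.random() for _ in range(N)]     # positive weights in [0, 1]

    for _ in range(10):                               # the inner loop
        # Sample an index with probability proportional to its weight.
        # random.choices rebuilds its cumulative table on every call, so each draw is O(N).
        i = random.choices(range(N), weights=weights, k=1)[0]

        # [process item i ...]

        # Update the weights of K items, K << N.
        K = 5
        for j in random.sample(range(N), K):          # stand-in for the K indices that change
            weights[j] = random.random()              # new weight, still in [0, 1]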
When I say sample by weight, I mean something different from uniform sampling: an item's likelihood of being selected is proportional to its weight. So if there are two items, one with weight .8 and one with weight .2, they are sampled with probability 80% and 20%, respectively.
The number of items N remains fixed. Weights are in a bounded range, say [0, 1]. Weights do not always sum to one.
A naive approach takes O(N) time per sample. Is there an O(log N) algorithm?
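For concreteness, the naive sampler I have in mind looks something like this: draw a uniform threshold in [0, total weight) and walk the weights until the running sum crosses it, which costs O(N) per draw (although updating any single weight is O(1)).

    import random

    def sample_naive(weights):
        """Return index i with probability weights[i] / sum(weights); O(N) per call."""
        total = sum(weights)                      # O(N) even before the scan starts
        threshold = random.uniform(0.0, total)
        running = 0.0
        for i, w in enumerate(weights):
            running += w
            if running >= threshold:
                return i
        return len(weights) - 1                   # guard against floating-point round-off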
What is the appropriate data structure for this task? I believe that red-black trees are inappropriate, since they treat each item as having equal weight.