This MSDN article proves the correctness of Reservoir Sampling algorithm as follows:
Base case is trivial. For the k+1st case, the probability a given element i with position <= k is in R is s/k.
The probability i is replaced is the probability k+1st element is chosen multiplied by i being chosen to be replaced, which is: s/(k+1) * 1/s = 1/(k+1), and prob that i is not replaced is k/k+1.
So any given element's probability of lasting after k+1 rounds is: (chosen in k steps, and not removed in k steps) = s/k * k/(k+1), which is s/(k+1).
So, when k+1 = n, any element is present with probability s/n.
about step 3:
What are the
k+1 rounds
mentioned?What is
chosen in k steps, and not removed in k steps
?Why are we only calculating this probability for elements that were already in
R
after the firsts
steps?