views:

197

answers:

3
  • A website has a database of n questions.
  • You click a button and are shown one random question per click. The probability of a particular question showing up at the click event is 1/n.

On average, how many clicks would be required to see all the questions in the database?

What is the approach required for such questions?

+13  A: 

See coupon collector's problem.

sdcvvc
+4  A: 

Why don't we run a simulation and found out?

<?php

function simulate($size) {

    $range = range(1, $size);

    $hits = 0;
    $hit = array();

    while(count($hit) != $size) {
        $rand = array_rand($range);
        $hit[$rand] = 1;
        $hits++;
    }

    return $hits;

}

for ($j=10; $j<101; $j+=10) {
    $res = array();
    for ($i=0; $i<10; $i++) {
        $res[] = simulate($j);
    }
    echo "for size=$j, avg=" . array_sum($res)/10 . "<br />";
}

Output:

for size=10, avg=35.9
for size=20, avg=68.8
for size=30, avg=123.3
for size=40, avg=176.9
for size=50, avg=205.9
for size=60, avg=276.8
for size=70, avg=304.9
for size=80, avg=401.9
for size=90, avg=371
for size=100, avg=617.7
quantumSoup
+1 for naming a PHP variable $hit
Tom Gullen
mmm Do I forget something or not? does $hit a special meaning on php?
Marcx
Marcx: consider what English letter $ looks like.+1 from me also.
Peter
+9  A: 

This is a mathematical question rather than an algorithmic one. As sdcvvc said, this is the famous coupon collector's problem.

Suppose you have n questions to "gather". Let X be a random variable denoting the required number of clicks. If we define Xi to be the number of clicks from the moment we have (i-1) questions to the moment we have i questions, then:

X = X1 + X2 + ... + Xn

Due to the linearity of the expected value:

E(X) = E(X1 + X2 + ... + Xn) = EX1 + EX2 + ... + EXn

If we inspect Xi, we see that in fact it has geometric distribution with p=(n-i+1)/n, hence a mean value of n/(n-i+1). Therefore:

EX = n * (1/n + 1/(n-1) + ... + 1/2 + 1/1) = n * Hn

Where Hn is the nth Harmonic number, which can be approximated by ln n:

EX ~= n * ln n

You can run a simple simulation and test this approximation.

Eyal Schneider