I need to explain to a client why dupes (questions that appear on both exams) are showing up between two supposedly different exams. It's been 20 years since I took Prob and Stats.
I have a generated multiple-choice exam: there are 192 questions in the database, and 100 of them are chosen at random, without repeats.
Any two exams generated this way are guaranteed to share at least 8 questions: together they list 200 questions, but only 192 distinct ones exist, so at least 200 - 192 = 8 must appear on both (pigeonhole principle).
How do I calculate the probability of there being 25 dupes? 50 dupes? 75 dupes?
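A sketch of the standard model, assuming each exam is an independent, uniformly random draw of 100 of the 192 questions (an assumption about the generator, not something stated above): the number of shared questions K follows a hypergeometric distribution. Fix exam A; exam B must then take k of its picks from A's 100 questions and the remaining 100 - k from the 92 questions A skipped.

```latex
P(K = k) = \frac{\binom{100}{k}\binom{92}{100-k}}{\binom{192}{100}}, \qquad 8 \le k \le 100

P(K \ge n) = \sum_{k=n}^{100} P(K = k), \qquad
E[K] = \frac{100 \cdot 100}{192} \approx 52.08
```

The factor \binom{92}{100-k} is zero whenever k < 8, so the model reproduces the pigeonhole minimum on its own.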
-- Edit after the fact -- I ran this through Excel, summing the probabilities from n to 100. For this particular problem, the probabilities of n or more dupes were:
n     P(n or more dupes)
40    97.5%
52    ~50%
61    ~0%
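A minimal Python sketch of the same tail-sum calculation, assuming the hypergeometric model (two independent uniform draws of 100 questions out of 192). The helpers p_exact, p_at_least and simulate_overlaps are hypothetical names chosen for illustration: p_exact gives the per-k probability, p_at_least sums it from n to 100 the way the spreadsheet does, and simulate_overlaps is a Monte Carlo cross-check that actually generates pairs of exams and counts the shared questions.

```python
import random
from math import comb

TOTAL = 192   # questions in the database
PICKED = 100  # questions drawn for each exam


def p_exact(k, total=TOTAL, picked=PICKED):
    """Hypergeometric P(exactly k questions shared by two random exams)."""
    if k < 0 or k > picked:
        return 0.0
    # Exam B takes k of exam A's questions and picked - k of the others.
    # comb() returns 0 when picked - k exceeds total - picked, which
    # encodes the pigeonhole minimum of 2 * picked - total shared questions.
    ways = comb(picked, k) * comb(total - picked, picked - k)
    return ways / comb(total, picked)


def p_at_least(n, total=TOTAL, picked=PICKED):
    """P(n or more shared questions): the tail sum from k = n to picked."""
    return sum(p_exact(k, total, picked) for k in range(max(n, 0), picked + 1))


def simulate_overlaps(trials=10_000, total=TOTAL, picked=PICKED, seed=0):
    """Monte Carlo check: draw pairs of exams and count shared questions."""
    rng = random.Random(seed)
    questions = range(total)
    overlaps = []
    for _ in range(trials):
        exam_a = set(rng.sample(questions, picked))
        exam_b = rng.sample(questions, picked)
        overlaps.append(sum(q in exam_a for q in exam_b))
    return overlaps


if __name__ == "__main__":
    sims = simulate_overlaps()
    for n in (25, 40, 50, 52, 61, 75):
        simulated = sum(x >= n for x in sims) / len(sims)
        print(f"n={n:>3}  exact={p_at_least(n):.4%}  simulated={simulated:.4%}")
```

Running the script prints the exact and simulated probabilities of n or more dupes side by side for the values asked about and the values in the table, which gives a way to cross-check the spreadsheet sums.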