ansaurus

Question

Probability problem - Duplicates when choosing from large basket.

Answer 1

A:

It's probably higher than you think. I won't attempt to duplicate this article: http://en.wikipedia.org/wiki/Birthday_paradox

Chris 2008-09-25 22:02:46

Please use [link text](URL) to create a clickable link.

cjm 2008-09-25 22:05:27

Done. I hit the code button rather than the hyperlink one :S

Chris 2008-09-25 22:06:08

I looked at that, and it's great for finding the probability of a single dupe, but it's a bit tougher to come up with the probability distribution of the number of dupes.

chris 2008-09-25 22:06:54

Answer 2

+1 A:

Once you've created the first exam, there are 92 questions that have never been used, and 100 that have. If you now generate another exam with 100 questions in it, you are choosing from a set of 92 questions that have never been used and 100 that have. Clearly you are going to get quite a few duplicates.

You would expect to get (100/192) * 100 duplicates, i.e. for any two randomly chosen exams, the expected number of duplicate questions is about 52.

If you want the probability that there are 25, or 75, or whatever, then you have two choices.

a) Work out the maths

b) Simulate a few runs on a computer
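For option (b), a minimal simulation sketch in Python might look like the following. It assumes each exam is 100 distinct questions drawn uniformly at random from the pool of 192; the names `POOL_SIZE`, `EXAM_SIZE`, `count_duplicates` and `simulate` are made up for illustration.

```python
import random
from collections import Counter

POOL_SIZE = 192   # total questions in the basket (from the question)
EXAM_SIZE = 100   # questions per exam (from the question)

def count_duplicates(rng):
    """Draw two exams independently and count the questions they share."""
    first = set(rng.sample(range(POOL_SIZE), EXAM_SIZE))
    second = rng.sample(range(POOL_SIZE), EXAM_SIZE)
    return sum(1 for q in second if q in first)

def simulate(runs, seed=0):
    """Return the duplicate count for each of `runs` simulated exam pairs."""
    rng = random.Random(seed)  # fixed seed only so runs are repeatable
    return [count_duplicates(rng) for _ in range(runs)]

if __name__ == "__main__":
    counts = simulate(10_000)
    print("average duplicates:", sum(counts) / len(counts))
    # Empirical distribution: how often each duplicate count occurred.
    for dupes, freq in sorted(Counter(counts).items()):
        print(dupes, freq / len(counts))
```

The average should land close to the expected value of about 52, and the printed frequencies give a rough estimate of the chance of seeing 25, 75, or any other count.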

Airsource Ltd 2008-09-25 22:13:08

You should say that the **expected** number of duplicates is 52.

David Nehme 2008-09-25 22:14:05

indeed. Corrected.

Airsource Ltd 2008-09-25 22:15:35

Answer 3

+2 A:

Erm, this is really really hazy for me. But there are (192 choose 100) possible exams, right?

And there are (100 choose N) ways of picking N dupes, each with (92 choose 100-N) ways of picking the rest of the questions, no?

So isn't the probability of picking N dupes just:

(100 choose N) * (92 choose 100-N) / (192 choose 100)
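As a sanity check, here is a small Python sketch of that formula. The function name `prob_exact_dupes` and its parameters are made up for illustration; it assumes, as above, that the first exam used 100 of the 192 questions.

```python
from math import comb

def prob_exact_dupes(n, pool=192, exam=100):
    """Probability that a second exam shares exactly n questions with the first.

    Hypothetical helper: 'pool' is the basket size, 'exam' the exam size.
    """
    used = exam            # questions already on the first exam
    unused = pool - exam   # questions never used (92 here)
    # Impossible counts: negative, more than the exam size, or needing
    # more never-used questions than exist.
    if n < 0 or n > used or exam - n > unused:
        return 0.0
    return comb(used, n) * comb(unused, exam - n) / comb(pool, exam)

if __name__ == "__main__":
    probs = [prob_exact_dupes(n) for n in range(101)]
    print("sum of probabilities:", sum(probs))
    print("mean:", sum(n * p for n, p in enumerate(probs)))
```

The probabilities should sum to 1, and the mean should match (100/192) * 100. Note that fewer than 8 dupes is impossible, because only 92 questions were left unused by the first exam.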

EDIT: So if you want the chances of N or more dupes instead of exactly N, you have to sum the top half of that fraction over every dupe count from N up to 100, and then divide by (192 choose 100).
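A rough Python sketch of that summation, using exact fractions so nothing is lost to rounding. The name `prob_at_least` and its parameters are made up for illustration.

```python
from fractions import Fraction
from math import comb

def prob_at_least(n_min, pool=192, exam=100):
    """Probability that two exams share at least n_min questions.

    Hypothetical helper: sums the numerator (100 choose n) * (92 choose 100-n)
    for n from n_min up to 100, then divides by (192 choose 100).
    """
    unused = pool - exam
    # With only 92 never-used questions, at least 8 dupes are forced,
    # so terms below that would be zero anyway.
    lowest = max(n_min, exam - unused, 0)
    numerator = sum(
        comb(exam, n) * comb(unused, exam - n)
        for n in range(lowest, exam + 1)
    )
    return Fraction(numerator, comb(pool, exam))

if __name__ == "__main__":
    for threshold in (25, 50, 60, 75):
        print(threshold, float(prob_at_least(threshold)))
```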

Errrr, maybe...

2008-09-25 22:18:33

That looks good! I'll wait for criticism before accepting.

chris 2008-09-25 22:20:20

Looks good to me, but that's the probability of exactly N duplicates. To get the probability of at least N duplicates, which I think is what chris is interested in, one has to sum a bit.

Maciej Hehl 2008-09-25 22:35:05

@Maciej: Doh, of course. Thanks

2008-09-25 22:48:57

Yeah, that did it. I figured out the summation on my own.

chris 2008-09-25 22:49:48
