views:

1548

answers:

7

Assume a game in which one rolls 20, 8-sided die, for a total number of 8^20 possible outcomes. To calculate the probability of a particular event occurring, we divide the number of ways that event can occur by 8^20.

One can calculate the number of ways to get exactly 5 dice of the value 3. (20 choose 5) gives us the number of orders of 3. 7^15 gives us the number of ways we can not get the value 3 for 15 rolls.

number of ways to get exactly 5, 3's = (20 choose 5)*7^15.

The answer can also be viewed as how many ways can I rearrange the string 3,3,3,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 (20 choose 5) times the total number of values we the zero's (assuming 7 legal values) 7^15 (is this correct).

  • Question 1: How can I calculate the number of ways to get exactly 5 dice of the same value(That is, for all die values). Note: if I just naively use my first answer above and multiply bt 8, I get an enormous amount of double counting?

    I understand that I could solve for each of the cases (5 1's), (5, 2's), (5, 3's), ... (5's, 8) sum them (more simply 8*(5 1's) ). Then subtract the sum of number of overlaps (5 1's) and (5 2's), (5 1's) and (5 3's)... (5 1's) and (5, 2's) and ... and (5, 8's) but this seems exceedingly messy. I would a generalization of this in a way that scales up to large numbers of samples and large numbers of classes.

  • How can I calculate the number of ways to get at least 5 dice of the same value?

    So 111110000000000000000 or 11110100000000000002 or 11111100000001110000 or 11011211222222223333, but not 00001111222233334444 or 000511512252363347744.

I'm looking for answers which either explain the math or point to a library which supports this (esp python modules). Extra points for detail and examples.

+2  A: 
Robert Harvey
+1 nice link, Unfortunately I am not looking for the sum of the dice, but rather number of repetitions of values in successive dice rolls. That is, I roll a die 10 times and that is the probability I will get at least 3 dice of value 5.
e5
+1  A: 

Recursive solution:

Prob_same_value(n) = Prob_same_value(n-1) * (1 - Prob_noone_rolling_that_value(N-(n-1)))
mbeckish
Recursive is neat, but often escapes me I'm not sure I understand how your algorithm works, what is n? What does Prob_same_value(2) provide? 1/8?
e5
Yeah, I just kind of threw this out there without a lot of details. Basically, I'm saying that, to get at least five of the same value, you need to get 4 of the same value, and you need to NOT have none of the other dice match those 4.
mbeckish
n is how many dice must roll the same value. So, Prob_same_value(1) = Prob_same_value(2) = Prob_same_value(3) = 1. After that, solve recursively.
mbeckish
Not counting sequences which the 5 repetitions on out of order. It doesn't count sequences which are out of order, so it counts (11111000000000000000,11111000000000000002, 11111000000000000003, ... 11111888888888888888, 22222000000000000000, 22222000000000000002, ... 22222888888888888888), but misses sequences like (00111110000000000000,00001100100110000000, 02020202020000000000).
e5
@e5 - Nope, no mention of order here. To get ANY 5 to match, you must get ANY 4 to match, and get at least 1 more die to match.
mbeckish
@mbeckish, (1/8) implies order since the probability for a set of dice rolls is not (1/8)^rolls if order doesn't count.
e5
Where is the 1/8?
mbeckish
Good point, I got this question confused with one of the other ones, let think about this some more.
e5
@mbeckish - I thought your answer some more and I get your intuition now, which I might add is really quite clever. This type of recursion lends itself to a dynamic programming solution (yay efficiency) and/or memory tradeoff. I'm not sure what is going on with Prob_noone_rolling_that_value(N-(n-1))? What is N's relationship to n? A type-o? Shouldn't (n-(n-1)) = 1, for all n? Is N the initial rather than recursive value of n? Did you mean(n!-(n-1))? Regardless, great answer, if only I had but more than one vote to give.
e5
+1  A: 

Here is what I am thinking...

If you just had 5 dice, you would only have eight ways to get what you want.

For each of those eight ways, all possible combinations of the other 15 dice work.

So - I think the answer is: (8 * 8**15) / 8**20

(The answer for at least 5 the same.)

Anon
It doesn't count sequences which are out of order, so it counts (11111000000000000000,11111000000000000002, 11111000000000000003, ... 11111888888888888888, 22222000000000000000, 22222000000000000002, ... 22222888888888888888), but misses sequences like (00111110000000000000,00001100100110000000, 02020202020000000000). Also note that as the sample size increases the the probability of a collision remains fixed (8 * (8^15)) / (8^20) = 0.000244140625, (8 * (8^25)) / (8^30) = 0.000244140625, (8 * (8^95)) / (8^100) = 0.000244140625.
e5
@e5: I agree my numerator doesn't count different orderings - but are you sure your denominator does? Consider two dice rolled instead of 20. If you wanted to count ordering, would it be 8**2 or instead (2 * 8**2) to account for [red_die, blue_die] vs. [blue_die, red_die] ?
Anon
@Anon, interesting point. I think I should be ok, tho I am identifying my dice by order, so red_die is always rolled first, blue_die second and so on.
e5
@e5: Regardless, your sample size observation disproved my answer. Rolling 33 8-sided dice would mean there has to be at least 5 of at least one number.
Anon
+3  A: 

Double counting can be solved by use of the Inclusion/Exclusion Principle

I suspect it comes out to:

Choose(8,1)*P(one set of 5 Xs) 
- Choose(8,2)*P(a set of 5 Xs and a set of 5 Ys) 
+ Choose(8,3)*P(5 Xs, 5 Ys, 5 Zs) 
- Choose(8,4)*P(5 Xs, 5 Ys, 5 Zs, 5 As)

P(set of 5 Xs) = 20 Choose 5 * 7^15 / 8^20
P(5 Xs, 5 Ys) = 20 Choose 5,5 * 6^10 / 8^20

And so on. This doesn't solve the problem directly of 'more then 5 of the same', as if you simply summed the results of this applied to 5,6,7..20; you would over count the cases where you have, say, 10 1's and 5 8's.

You could probably apply inclusion exclusion again to come up with that second answer; so, P(of at least 5)=P(one set of 20)+ ... + (P(one set of 15) - 7*P(set of 5 from 5 dice)) + ((P(one set of 14) - 7*P(one set of 5 from 6) - 7*P(one set of 6 from 6)). Coming up with the source code for that is proving itself more difficult.

CoderTao
+1  A: 

I believe you can use the formula of x occurrences in n events as:

P = probability^n * (n!/((n - x)!x!))

So the final result is going to be the sum of results from 0 to n.

I don't really see any easy way to combine it into one step that would be less messy. With this way you have the formula spelled out in the code as well. You may have to write your own factorial method though.

  float calculateProbability(int tosses, int atLeastNumber) {
    float atLeastProbability = 0;
    float eventProbability = Math.pow( 1.0/8.0, tosses);
    int nFactorial = factorial(tosses);

    for ( i = 1; i <= atLeastNumber; i++) {
      atLeastProbability += eventProbability * (nFactorial / (factorial(tosses - i) * factorial(i) );
    }
  }
JonBWalsh
+1 math, details and source code. Your solution doesn't work because only sequences like 11111000000000000 and 111000000001100 are counted. That is 11111000000000000 and 11111000000000002 are different sequences and need to each be counted. For example if you take 40 samples(rolls), you are must have atleast one repetition of 5(worst case). The formula you use has does not show this.
e5
I don't think I follow your comment. The weight of 1/8 for a positive event (rolling a particular number) covers this.In the two situations of possible outcomes A:123456and100000are functionally the same if we don't care about the 'non-1' values.Also I don't think the OP is asking for the probability of having at least X die have the same value of ANY value (which is different). My understanding is that he wants to know the probability of at least x many die having y value.
JonBWalsh
Lol nm I see that you ARE the OP.If you mean to ask, "What's the probability of having at least X die having the same number of ANY number" you may want to reword the original question as it is unclear. Since you start off talking about the # of combinations that have at least 5 3s it makes people think you are talking about the probability of x die having y value.
JonBWalsh
@JonBWalsh, good point I clarify in the question. Sorry for the confusion.
e5
+4  A: 

I suggest that you spend a little bit of time writing up a Monte Carlo simulation and let it run while you work out the math by hand. Hopefully the Monte Carlo simulation will converge before you're finished with the math and you'll be able to check your solution.

A slightly faster option might involve creating a SO clone for math questions.

David Locke
+1 I hope there are others who see the humor in your answer. :)
Robert Harvey
+1, funny, and so clone for math. The reason I trying is to double check the results a program I wrote which generating what monte carlo simulates. =)
e5
+2  A: 

This problem is really hard if you have to generalize it (get the exact formula).

But anyways, let me explain the algorithm. If you want to know

the number of ways to get exactly 5 dice of the same value

you have to rephrase your previous problem, as

calculate the number of ways to get exactly 5 dice of the value 3 AND no other value can be repeated exactly 5 times

For simplicity's sake, let's call function F(20,8,5) (5 dice, all values) the first answer, and F(20,8,5,3) (5 dice, value 3) the second. We have that F(20,8,5) = F(20,8,5,3) * 8 + (events when more than one value is repeated 5 times)

So if we can get F(20,8,5,3) it should be pretty simple isn't it? Well...not so much...

First, let us define some variables: X1,X2,X3...,Xi , where Xi=number of times we get the dice i

Then:

F(20,8,5)/20^8 = P(X1=5 or X2=5 or ... or X8=5, with R=20(rolls) and N=8(dice number))

, P(statement) being the standard way to write a probability.

we continue:

F(20,8,5,3)/20^8 = P(X3=5 and X1<>5 and ... and X8<>5, R=20, N=8) 
F(20,8,5,3)/20^8 = 1 - P(X1=5 or X2=5 or X4=5 or X5=5 or X6=5 or X7=5 or X8=5, R=15, N=7)  
F(20,8,5,3)/20^8 = 1 - F(15,7,5)/7^15

recursively:

F(15,8,5) = F(15,7,5,1) * 7  
P(X1=5 or X2=5 or X4=5 or X5=5 or X6=5 or X7=5 or X8=5, R=15, N=7) = P(X1=5 and X2<>5 and X4<>5 and .. and X8<>5. R=15, N=7) * 7


F(15,7,5,1)/7^15 = 1 - F(10,6,5)/6^10 F(10,6,5) = F(10,6,5,2) * 6


F(10,6,5,2)/6^10 = 1 - F(5,5,5)/5^5
F(5,5,5) = F(5,5,5,4) * 5

Well then... F(5,5,5,4) is the number of ways to get 5 dices of value 4 in 5 rolls, such as no other dice repeats 5 times. There is only 1 way, out of a total 5^5. The probability is then 1/5^5.

F(5,5,5) is the number of ways to get 5 dices of any value (out of 5 values) in 5 rolls. It's obviously 5. The probability is then 5/5^5 = 1/5^4.

F(10,6,5,2) is the number of ways to get 5 dices of value 2 in 10 rolls, such as no other dice repeats 5 times. F(10,6,5,2) = (1-F(5,5,5)/5^5) * 6^10 = (1-1/5^4) * 6^10

Well... I think it may be incorrect at some part, but anyway, you get the idea. I hope I could make the algorithm understandable.

edit: I did some checks, and I realized you have to add some cases when you get more than one value repeated exactly 5 times. Don't have time to solve that part thou...

Francisco
+1, for answering the first question. Any ideas on the second? What does the <> notation mean?
e5