views:

368

answers:

5

Will adding 6 random unique numbers in the range 0-32 and doing a modulus on the result favour a high number?

Example: 9 +10 +11 +18 +25 +28 +32 = 133 % 20 = 13

A: 

Counterexamples

9 +10 +11 +18 +25 +28 +32 = 133 % 2 = 1

9 +10 +11 +18 +25 +28 +32 = 133 % 200 = 133

Which perhaps suggests you could usefully clarify or sharpen your question.

Regards

Mark

High Performance Mark
Maybe the OP means "high" in relation to the modulus, i.e. sum % modulus > modulus / 2
0xA3
Maybe the OP does.
High Performance Mark
+2  A: 

The answer is: It depends. The following sample program will print the average values for various modulus values. Obviously it's not a mathematical proof but it should give you already a feeling how the average values behave:

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static Random rand;

    static void Main(string[] args)
    {
        rand = new Random();

        for (int modulus = 1; modulus < 1000; modulus++)
        {
            calculateAverage(modulus);
        }
    }

    public static void calculateAverage(int modulus)
    {
        List<int> moduloList = new List<int>(100);

        for (int i = 0; i < 100; i++)
        {
            int sum = 0;
            for (int k = 0; k < 6; k++)
            {
                sum += rand.Next(0, 33);
            }
            moduloList.Add(sum % modulus);
        }
        Console.WriteLine("Average for modulus {0}: {1}", modulus, moduloList.Average());
    }
}

Output generated:

Average for modulus 1: 0
Average for modulus 2: 0,49
Average for modulus 3: 1,03
Average for modulus 4: 1,47
Average for modulus 5: 1,96
Average for modulus 6: 2,55
Average for modulus 7: 3,03
Average for modulus 8: 3,42
Average for modulus 9: 4,15
Average for modulus 10: 5,06
Average for modulus 11: 4,62
Average for modulus 12: 5,9
Average for modulus 13: 5,82
Average for modulus 14: 6,8
Average for modulus 15: 7,28
Average for modulus 16: 7,8
Average for modulus 17: 8,15
Average for modulus 18: 9,34
Average for modulus 19: 9,2
Average for modulus 20: 10,36
Average for modulus 21: 9,74
Average for modulus 22: 9,41
Average for modulus 23: 11,5
Average for modulus 24: 11,51
Average for modulus 25: 11,45
Average for modulus 26: 13,05
Average for modulus 27: 12,59
Average for modulus 28: 14,92
Average for modulus 29: 13,1
Average for modulus 30: 14,1
Average for modulus 31: 15,5
Average for modulus 32: 16,46
Average for modulus 33: 16,54
Average for modulus 34: 16,38
Average for modulus 35: 19,61
Average for modulus 36: 17,26
Average for modulus 37: 15,96
Average for modulus 38: 19,44
Average for modulus 39: 17,07
Average for modulus 40: 17,73
0xA3
Nice. I'm only upvoting once because you didn't generate a 3D graph with the actual distribution for each possible choice of modulo :)
Pascal Cuoq
100 samples are not enough to get statistically relevant results. I.e. to show that the result is biased towards larger residues for the modulus 20 and range 0-32 one would need millions of samples.
Accipitridae
@Accipitridae: It's just a sample. Adjust the values as you like.
0xA3
Of course, you can change the number of samples. However, you don't know how many samples you need without first computing the bias. If you look at the answer by Sanjaya R, then you see that he used 1000000 samples and still got the wrong answer. I.e. the distribution of the numbers is biased, but the bias is to small to notice with 1000000 samples.
Accipitridae
+5  A: 

As an interesting aside, there is a powerful method which can be used to figure this out manually, or very fast (instead of using brute force) on a computer using the concept of Generating Functions.

(Warning: longish post)

You are working in the range 0 to 19, but getting that by generating numbers at random from 0-32.

If the chance of getting the number i is p(i) [Note, p(0) = p(1) = p(2) = ... = p(12) and p(13) = ..= p(19) and p(0) = 2p(13)).

Now we are interested in the chances of getting a particular sum by generating random numbers 6 times and adding them up.

This can be modelled by computing coefficients in the sixth power of the polynomial

P(x) = p(0) + p(1) * x + p(2) * x^2 + ... + p(r) * x^r + ... + p(19) * x^19

Thus we are looking at the coefficients of (P(x))^6.

For the given problem, we can ignore the 1/33 factor (in order to compare which sum is more likely) and have p(0) = 2, p(1) = 2, ..., p(19) =1.

Thus we are looking at P(x) = 2(1 + x + x^2 + ... + x^12) + x^13 + x^14 + .. + x^19.

We now just need to compute the coefficients of its sixth power, take the exponents modulo 20 and add them up. Fast polynomial multiplication algorithms like FFT can be used here.

In fact we could probably do it manually using some algebra with complex numbers and/or prove statements about the probability distribution with conviction.

Moron
You lost me at p(0)=2p(13). I thought the modulo was applied to the sum, not to the individual summands.
Pascal Cuoq
@Pascal: This is the case for just one number. If you take 0-32 and take mod 20, you get these probabilities. Note that (x+y) modulo 20 is same as ((x modulo 20) + (y modulo 20)) modulo 20.
Moron
Generating functions are indeed a nice tool to solve this kind of problems. The book "Concrete Mathematics" by Graham, Knuth and Patashnik contains a really good introduction into the subject.
Accipitridae
+1  A: 

Here is a small python program for computing the probability distribution

# modulus
m = 20
# range of the random numbers 0..n-1
n = 33
# number of random numbers in sum
k = 6

# distribution of one random number
# a[i] is the probability that a random number modulo m is i.
a = [0]*m
for i in range(n): a[i % m]+= 1/n

# convolution
b = a
for i in range(1,k):
    # Here b[t] is the probability that the sum of i random numbers is t.
    # Compute c[t] as the probability that the sum of i+1 random numbers is t.
    c = [0]*m
    for i in range(m):
        for j in range(m):
            c[(i+j)%m] += a[i]*b[j]
    b=c

# print the probability distribution of the result
for i in range(m): print(i, b[i])

# compute average
print("average", sum(i*b[i] for i in range(m)))

This gives the following result:

0 0.0500007971936
1 0.0499999764222
2 0.0499991633939
3 0.0499984370886
4 0.0499978679688
5 0.0499975063648
6 0.0499973824748
7 0.0499975063648
8 0.0499978679688
9 0.0499984370886
10 0.0499991633939
11 0.0499999764222
12 0.0500007971936
13 0.0500015451796
14 0.0500021452719
15 0.0500025347512
16 0.0500026702559
17 0.0500025347512
18 0.0500021452719
19 0.0500015451796
average 9.50015120662

I.e. the high numbers are indeed a little more probable, but the differences are very small.

Accipitridae
A: 

No. Its even, or at least the skew doesn't seem to be more than 0.05 %.

Even though the range of possible numbers does not evenly map to the mod ( 192 % 20 = 12 ), the range of distribution is much greater than the mod, so it works it self out. Here's my run of 1,000,000.

MOD COUNT %
0 50098 5.00980
1 49660 4.96600
2 49832 4.98320
3 50150 5.01500
4 50276 5.02760
5 49864 4.98640
6 50282 5.02820
7 49771 4.97710
8 49886 4.98860
9 49663 4.96630
10 49499 4.94990
11 49964 4.99640
12 50155 5.01550
13 50169 5.01690
14 49829 4.98290
15 50191 5.01910
16 49887 4.98870
17 50334 5.03340
18 50139 5.01390
19 50351 5.03510
Sanjaya R
.0005 still doesn't mean it's a good idea.
Joey
It's not even after a certain threshold.
0xA3