views:

75

answers:

2

I use a random function(Let's call it randomNum()) to generate random numbers(unsigned long) continuously(will generate about one million numbers in total).My question is How to determine whether the frequency of current number generated is greater than 20%(among the total numbers generated so far) efficiently?

Please write down your opinion or your c code?Thanks.

+2  A: 

A hash-table, whose entries are the count for each value?

Oli Charlesworth
The problem is that 1 million is muuch smaller than the range of unsigned long, so if the `randomNum()` is OK (have homogenous distribution), there will be nothing to count actually.
ruslik
The OP means a hash with one million elements; you can only have a maximum of one million different numbers.
Blank Xavier
I know, and gave him +1. The commentary was for schemacs.
ruslik
My classmate first thought about balanced-search-tree.But the requirement for memory(hash-table and balanced-search-tree) is huge,is this right?
schemacs
@schemacs: The storage requirement will be proportional to the number of different values that have been drawn. You can't really get around the fact that potentially (and extremely likely in practice), each value drawn could be different, so you need to set up a counter for each one.
Oli Charlesworth
Yes,each value draw may be different.Do you mean that I have to set up counters for each number generated?The memory usage will skyrocket on massive random numbers.
schemacs
@schemacs: Of course! But 1 million is a lot less than 2^32.
Oli Charlesworth
+1  A: 

If I understand your question you are asking:

If I draw 10^6 samples from an RNG which can produce any integer in the range 0..(2^32)-1 what is the probability that 0.2 x 10^6 of the samples will have the same value ?

Unless your RNG is seriously flawed the answer is 0 probability, to more decimal places than you ought to worry about in any realistic set of circumstances.

So, obviously, I have misunderstood the question ...

High Performance Mark
Oh,You misunderstood the question.Focus on DYNAMIC.That means you have TO determinate each number's frequency you just draw from the RNG(the total numbers are the numbers you have draw in total till now).
schemacs
@schemacs: In the situation you describe, the "frequency" of each value will be 0 or 1, with high probability.
Oli Charlesworth
You mean the frequency is 0 or 1 for each value?The frequency is range in [0,1] here,which equals to n(cur_val)/n(total_till_now). n(cur_val) is the number of appearance time for current value,and n(total_till_now) is the total number.
schemacs