views:

4327

answers:

14

I need to generate random numbers in the range 1 - 10000 continuously with out duplication. Any recommendations?

Description: we are building a new version for our application, which maintains records in Sqlite DB. in the last version of our application, we did not had unique key for each record. But now with new upgraded version, we need to support the import facility from the last version's DB. So what we do is, we read each and every record from the old DB and generate a random number for the unique key and store it in new DB. Here we many need to import up to 10000 records continuously.

+5  A: 

Well, eventually you'll either have to stop generating them, or you're going to star duplicating them.

On a computer your options are pretty limited to Pseudo Random Number Generators (PRNGs), and given your constraint that they never repeat then a PRNG is your best option - real random data will occasionally duplicate a number.

In your case, I'd consider using a large PRNG (32 bit or larger) to shuffle your 10,000 numbers, and then send the numbers out in the shuffled order.

Once they're used up you can shuffle again - since the PRNG is so large you'll be able to go through the 10k numbers many times before duplicating a sequence.

Give us more information about what your doing and we may come up with a better answer.

Adam Davis
+5  A: 

Mersenne Twister is the current best (though i could be a few weeks behind any really new discoveries). Source in just about every language is available somewhere out there, and MT is also provided in Boost here

DarenW
Mersenne Twister is considered a good compromise between Fast and Perfect PRNG, as far as I know.
Paul Nathan
It's only the 'best' for certain application, i.e. everything not crypto (like the OP's use case, or simulations).
Roel
+2  A: 

Boost.Random is a good choice and works fine for me. However, if you don't need many random number generators and distributions you may look for another library just not to install the whole Boost package.

Turker
+2  A: 

How random? Obviously there's rand(), there's also OS specific stuff (Windows has something in the CryptoAPI, for example). Are you writing something (not recommended), or just looking for a pre-existing function to use?

Nick
+3  A: 

TR1 has good random number support - if your compiler supports it.

Otherwise Boost

It's basically what became TR1.

As far as not getting duplicates - you want a shuffle. It can be pretty simple, but there are some pitfalls if you don't do it right. Jeff Atwood did a nice write up a while back:

http://www.codinghorror.com/blog/archives/001015.html

Michael Burr
+2  A: 

mtrand is nice.

Mr Fooz
+3  A: 

Boost probably does something that garantees no repeated numbers. But for a bit of fun here is my idea.

Note: I don't try and generate my rand in that direction lies madness.

#include <iostream>
#include <vector>
#include <algorithm>


class GaranteedNoRepeatRandom
{
    public:
        GaranteedNoRepeatRandom(int limit)
            :data(limit)
            ,index(0)
        {
            for(int loop=0;loop < limit;++loop)
            {   data[loop]  = loop;
            }
            // Note: random_shuffle optionally takes a third parameter
            // as the rand number generator.
            std::random_shuffle(&data[0],&data[0]+limit);
        }

        unsigned int rand()
        {
            unsigned int result = data[index];
            index   = (index+1) % data.size();

            // Add code to re-shuffle after index wraps around
            return result;
        }
    private:
        std::vector<unsigned int>               data;
        std::vector<unsigned int>::size_type    index;
};

int main()
{
    GaranteedNoRepeatRandom     gen(10000);

    for(int loop =0;loop < 10;++loop)
    {
        std::cout << gen.rand() << "\n";
    }
}
Martin York
A: 

Numerical Recipes in C has a whole chapter dedicated to random number generation. There are a few implementation there. From simple and straight forward to complex with good statistical properties.

shoosh
-1 for linking to torrent sites with pirated content
HMage
+2  A: 

Is it ok to question the whole idea of using a random number as the unique key for database record? I'm not familiar with sqlite, but it's worth investigating whether it supports some kind of unique column identifier internally. SQL Server has 'identity' columns, for example, and Oracle has 'sequences', both of which serve the same purpose.

Andrew
+2  A: 

Generate large random numbers. Say 128 bits. The odds of two such numbers being the same in a set of 10000 are ridiculously small (on the order of n^2/2^b, where n = number of numbers needed and b = number of bits used). Given enough bits, the odds will become smaller than the odds of your ram being corrupted by a cosmic ray such that your algorithm fails anyway. Be careful that the space you are drawing the random numbers from really has the number of bits you are looking for. It is easy to mistakenly generate 128 bit numbers from a pool of 32 bits (i.e., there are only 2^32 possibilities even though you are generating numbers 1 through 2^128). The random number generators in the boost library can do this correctly for you. BTW: if you don't like 128 bits, then use 256 bits or more until you are comfortable that there is no practical chance of a hashing collision. If you only have to do this once, then just use the shuffle method already mentioned in a previous answer. That will have the advantage of generating a perfect hash.

ejgottl
A: 

http://random.org/ if you need truly random numbers

Greg Dean
+2  A: 

While you may have a requirement to generate a sequence of values that do not repeat, you cannot call the result "random". True randomness has less to do with lack of repetition than it has to do with the distribution of values in a sequence.

Jon Trauntvein
+3  A: 

If it really must be in the range of 1 to 10,0000 without repeats, but non-sequential then it would probably be best to first create a sequential array of 10000 elements, and then shuffle them.

However, I must agree with the comments on the original question. I see no value in making them non- sequential.

Alternately, in unique & non-sequential are important, then the 1 to 10,000 range become questionable. It would probably be best to just used a GUID.

James Curran
+2  A: 

The generation of random numbers is too important to be left to chance. -- Robert R. Coveyou, Oak Ridge National Laboratory

Ronald Conco