views:

452

answers:

4

I've got a project where we'll need to generate a lot of fixed-length random codes (Read: Millions) from a set of characters (EG: 12 digit alpha-numeric or 9 digit alpha-numeric lower-case only without the l character). We're going to then store these codes in an MSSQL database (SQL Server 2008). The language we're using is C#.

We also need to be able to generate more codes and add them to an existing set of codes with them being unique against themselves and the existing codes. The quantity of random codes generated will likely vary from millions down to merely hundreds.

The two obvious approaches that come to mind is either to generate codes and just throw them at the database catching unique constraint exceptions or to pull the data down locally into a hash table then calculate all the new codes locally and put them into the database once generated.

Does anyone have any idea which of the above solutions would be more optimal or even better another solution that's more efficient that I haven't thought of?

Clarifications

The codes generated have to be non-predictable and there'll be multiple batches, each with uniqueness within themselves (EG: We'd have code set A with 100000 unique codes, code set B with 100000 unique codes, but there'd be no restriction that A intersect B is empty). They also have to be easy for a human to use (Hence the short length and potentially restricted character sets to avoid ambiguous characters).

The codes will be sent to users via various methods (Email, SMS, printed on paper, etc) and are used in a 1-use manner later (So if someone guesses someone else's code it'd be bad).

+3  A: 

It really depends on the specific problem requirements. Do the codes have to be merely unique or also unpredictable? If they just have to be unique, then you can use a linear congruential random number generator to create your codes.

Wikipedia Page on Linear Congruential Generators

Here's some sample code:

class CodeGenerator
{
    public long Seed
    {
        get { return _value; }
        set { _value = value; }
    }

    private char[] alphabet =
        {
            '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
            'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
            'k', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
            'v', 'w'
        };

    public String GetCode()
    {
        // Generate the next value in the psuedo-random sequence.
        _value = (362881L * _value + 76552897L) & 0xFFFFFFFFFFFL;

        // Create the code.  Add 2^44 to avoid small codes.
        long code = _value + (1L << 44);

        StringBuilder builder = new StringBuilder("123456789");

        // The codes are all less than 2^45, so we have 45 bits of
        // information and need 9 digits.
        for (int i = 8; i >= 0; i--)
        {
            builder[i] = alphabet[code & 0x1F];
            code = code >> 5;
        }

        return builder.ToString();
    }

    private long _value = 0;
}

The class will generate a sequence of 2^44 codes before repeating (over 17 trillion codes). To resume the sequence, simply record the current Seed value and restore it when you need more codes.

Peter Ruderman
Hey, good answer but unfortunately I wasn't clear enough - they do need to be non-predictable unfortunately.
Tim Schneider
Doh! Oh well, you could always encrypt the output. :)
Peter Ruderman
Then I lose the nice human-readable aspect, unless I shrink down the hash I create from it but at that stage I'll re-introduce the chances of collisions ;)
Tim Schneider
+3  A: 

Have you considered using GUIDs (uniqueidentifiers in SQL Server)? They are unique and mostly random. You can generate them on either the client-side or on the server.

You might also think about using a CLR function on the SQL side, to help minimize the number of DB round-trips.

To ensure uniqueness, one approach is to append a unique, non-random number (such as the value of an identity column) to your random numbers. The result isn't random at the bit-by-bit level, but it is random when taken as a whole.

Generating millions of unique random numbers won't take long. Inserting them into the DB will be the slow part....

RickNZ
GUID's aren't an option as they will have to conform to the requirements on the contents of the string (Including specific lengths and excluded characters). The reason being they're intended to be typed in by users which means a GUID might be a little cruel.
Tim Schneider
In that case, the rest of my answer still holds: to create unique, non-guessable random numbers, just create a "regular" random number, and then embed a unique number (such as a counter) in it somewhere, and convert the binary form to a string using whatever character encoding you're comfortable with. The individual random numbers can be generated by any number of means, such as the Random.Next() or RNGCryptoServiceProvider.GetBytes().
RickNZ
Interesting idea... Would probably have to do a bit of bitwise operations to obfuscate the counter a little, but that's not so bad. Would have to ensure the counter was a large enough value to allow for future expansion (Given we don't necessarily know how many codes we'll need at the start) which could have a bit of an impact on the randomness of shorter phrases... might be a matter of coming up with a limit formula that maintains randomness. I'll have to have a bit of a think about this one... but definitely given me food for thought
Tim Schneider
A: 

Generate all of them? In your first case you have a total of 35 characters per a position. Total storage is (base^positions) - 1 so your total number of combinations on the low end is 36^9 - 1 or 101,559,956,668,415 which is nearly a TB if the codes are one byte long...which they are not. And that's at the low end.

The best system is to pre-generate batches of valid numbers and feed these into the insertions. If the method of generation is semi-random then You can do this easily by dividing the random spaces using segments of the bit array. But you don't mention how random is random.

Of course if you have complete control over the randomness then you could just use UUIDs, which is what we do.

nick
When I say "all" I'm referring to "all one million codes" (Or whatever size the batch is), not "all possible codes". I'm more concerned with the implications of the "calculate in ram" approach when dealing with adding codes to a set that already has a few million (Obviously if I was trying to add 1 code it'd be faster to generate random codes and just check if they're in SQL for each code than to pull the list down in advance, but I'm not sure how the comparison would go if I wanted to generate another million)
Tim Schneider
A: 

For generating highly unpredictable random values, may I suggest using the System.Security.Cryptography.RNGCryptoServiceProvider class.

Sample code for generating abitrary length strings of random characters from a pre-defined set shown below. This has been used in a password generator.

private string GetRandomAlphanumericCharacters(int length)
{
    // Note: i, o, l, 0, and 1 have been removed to reduce 
    // chances of user typos and mis-communication of passwords.
    char[] allowedCharacters = { 'a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E', 'f', 'F', 'g', 'G', 'h', 'H', /*'i', 'I',*/ 'j', 'J', 'k', 'K', /*'l', 'L',*/ 'm', 'M', 'n', 'N', /*'o', 'O',*/ 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z', /*'0', '1',*/ '2', '3', '4', '5', '6', '7', '8', '9' };

    // Create a byte array to hold the random bytes.
    byte[] randomNumber = new byte[length];

    // Create a new instance of the RNGCryptoServiceProvider.
    RNGCryptoServiceProvider Gen = new RNGCryptoServiceProvider();

    // Fill the array with a random value.
    Gen.GetBytes(randomNumber);

    string result = "";

    foreach (byte b in randomNumber)
    {
        // Convert the byte to an integer value to make the modulus operation easier.
        int rand = Convert.ToInt32(b);

        // Return the random number mod'ed.
        // This yeilds a possible value for each character in the allowable range.
        int value = rand % allowedCharacters.Length;

        char thisChar = allowedCharacters[value];

        result += thisChar;
    }

    return result;
}
saille
Thanks, but the key part was Unique.
Tim Schneider
Here's an alternate solution: generate some random salt bytes, append/prepend your unique value to the byte array, then encrypt. The result will be a random, unique byte array.
saille