tags:

views:

511

answers:

8

I would like to generate a long UUID - something like the session key used by gmail. It should be at least 256 chars and no more than 512. It can contain all alpha-numeric chars and a few special chars (the ones below the function keys on the keyboard). Has this been done already or is there a sample out there?

C++ or C#

Update: A GUID is not enough. We already have been seeing collisions and need to remedy this ASAP. 512 is the max as of now because it will prevent us from changing stuff that was already shipped.

Update 2: For the guys who are insisting about how unique the GUID is, if someone wants to guess your next session ID, they don't have to compute the combinations for the next 1 trillion years. All they have to do is use constrain the time factor and they will be done in hours.

+9  A: 
StringBuilder sb = new StringBuilder();
for (int i = 0; i < HOW_MUCH_YOU_WANT / 32; i++)
   sb.Append(Guid.NewGuid().ToString("N"));
return sb.ToString();

but what for?

Andrey
+1 for asking the right question.
Steven Sudit
+1  A: 

There are two really easy ways (C#):

1) Generate a bunch of Guids using Guid.NewGuid().ToString("N"). each GUID will be 32 characters long, so just generate 8 of them and concatenate them to get 256 chars.

2) Create a constant string (const string sChars = "abcdef") of acceptable characters you'd like in your UID. Then in a loop, randomly pick characters from that string by randomly generating a number from 0 to the length of the string of acceptable characters (sChars), and concatenate them in a new string (use stringbuilder to make it more performant, but string will work too).

Jeremy
The latter technique is not guaranteed unique, since it's random. It's UNLIKELY to clash, but not guaranteed.
Will Hartung
Of course, the first technique is also not guaranteed unique and more likely to clash than the second one. :)
Stephen Cleary
+9  A: 

As per your update2 you are correct on Guids are predicable even the msdn references that. here is a method that uses a crptographicly strong random number generator to create the ID.

static long counter; //store and load the counter from persistent storage every time the program loads or closes.

public static string CreateRandomString(int length)
{
    long count = System.Threading.Interlocked.Increment(ref counter);
    int PasswordLength = length;
    String _allowedChars = "abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNOPQRSTUVWXYZ23456789";
    Byte[] randomBytes = new Byte[PasswordLength];
    RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
    rng.GetBytes(randomBytes);
    char[] chars = new char[PasswordLength];
    int allowedCharCount = _allowedChars.Length;
    for (int i = 0; i < PasswordLength; i++)
    {
        while(randomBytes[i] > byte.MaxValue - (byte.MaxValue % allowedCharCount))
        {
            byte[] tmp = new byte[1];
            rng.GetBytes(tmp);
            randomBytes[i] = tmp[0];
        }
        chars[i] = _allowedChars[(int)randomBytes[i] % allowedCharCount];
    }
    byte[] buf = new byte[8];
    buf[0] = (byte) count;
    buf[1] = (byte) (count >> 8);
    buf[2] = (byte) (count >> 16);
    buf[3] = (byte) (count >> 24);
    buf[4] = (byte) (count >> 32);
    buf[5] = (byte) (count >> 40);
    buf[6] = (byte) (count >> 48);
    buf[7] = (byte) (count >> 56);
    return Convert.ToBase64String(buf) + new string(chars);
}

EDIT I know there is some biasing because allowedCharCount is not evenly divisible by 255, you can get rid of the bias throwing away and getting a new random number if it lands in the no-mans-land of the remainder.

EDIT2 - This is not guaranteed to be unique, you could hold a static 64 bit(or higher if necessary) monotonic counter encode it to base46 and have that be the first 4-5 characters of the id.

UPDATE - Now guaranteed to be unique

UPDATE 2: Algorithm is now slower but removed biasing.

EDIT: I just ran a test, I wanted to let you know that ToBase64String can return non alphnumeric charaters (like 1 encodes to "AQAAAAAAAAA=") just so you are aware.

New Version:

Taking from Matt Dotson's idea from the bottom of this page, if you are no so worried about the keyspace you can do it this way and it will run a LOT faster.

public static string CreateRandomString(int length)
{
    length -= 12; //12 digits are the counter
    if (length <= 0)
        throw new ArgumentOutOfRangeException("length");
    long count = System.Threading.Interlocked.Increment(ref counter);
    int PasswordLength = length;
    Byte[] randomBytes = new Byte[length * 3 / 4];
    RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
    rng.GetBytes(randomBytes);

    byte[] buf = new byte[8];
    buf[0] = (byte)count;
    buf[1] = (byte)(count >> 8);
    buf[2] = (byte)(count >> 16);
    buf[3] = (byte)(count >> 24);
    buf[4] = (byte)(count >> 32);
    buf[5] = (byte)(count >> 40);
    buf[6] = (byte)(count >> 48);
    buf[7] = (byte)(count >> 56);
    return Convert.ToBase64String(buf) + Convert.ToBase64String(randomBytes);
}
Scott Chamberlain
Technically you're still not guaranteed to be unique. When you run out of counter bits and start cycling, you'll get the very remote possibility of collision. If you're going to keep a counter, the you can just use System.Numerics.BigInteger and count forever.
Matt Dotson
if he generates 11,574 new id's per second (1 billion per day) 24 hours a day, 356 days a year, it will take 50.5 million years for it to wrap around. I think 64 bit will be fine.
Scott Chamberlain
A: 

I would use some kind of hash of std::time() probably sha512. ex (using crypto++ for the sha hash + base64 encoding).

#include <iostream>
#include <sstream>
#include <ctime>
#include <crypto++/sha.h>
#include <crypto++/base64.h>

int main() {
    std::string digest;
    std::stringstream ss("");
    ss << std::time(NULL);

    // borrowed from http://www.cryptopp.com/fom-serve/cache/50.html
    CryptoPP::SHA512 hash;
    CryptoPP::StringSource foo(ss.str(), true,
        new CryptoPP::HashFilter(hash,
           new CryptoPP::Base64Encoder(
               new CryptoPP::StringSink(digest))));
    std::cout << digest << std::endl;

    return 0;
}
RC
I am thing along the same lines.
Ron
+19  A: 

If your GUIDs are colliding, may I ask how you're generating them?

It is astronomically improbable that GUIDs would collide as they are based on:

  • 60 bits - timestamp during generation
  • 48 bits - computer identifier
  • 14 bits - unique ID
  • 6 bits are fixed

You would have to run the GUID generation on the same machine about 50 times in the exact same instant in time in order to have a 50% chance of collision. Note that instant is measured down to nanoseconds.

Update:

As per your comment "putting GUIDs into a hashtable"... the GetHashCode() method is what is causing the collision, not the GUIDs:

public override int GetHashCode()
{
    return ((this._a ^ ((this._b << 0x10) | ((ushort) this._c))) ^ ((this._f << 0x18) | this._k));
}

You can see it returns an int, so if you have more than 2^32 "GUIDs" in the hashtable, you are 100% going to have a collision.

John Rasch
Try inserting in sql-server 2005. That's were we saw the collisions
Ron
@Ron: you could always have SQL give you the GUID.
Matthew Whited
This is a good point. The OP should modify his test program to check for equality when he gets a collision. That way you can tell the difference between 2 GUIDs with colliding hash codes and 2 identical GUIDs.
A. Levy
+1 for brilliant insight on the hash table.
Ogre Psalm33
+1  A: 
byte[] random = new Byte[384];

//RNGCryptoServiceProvider is an implementation of a random number generator.
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(random);
var sessionId = Convert.ToBase64String(random);

You can replace the "/" and "=" from the base64 encoding to be whatever special characters are acceptable to you.

Base64 encoding creates a string that is 4/3 larger than the byte array (hence the 384 bytes should give you 512 characters).

This should give you orders of magnatude more values than a base16 (hex) encoded guid. 512^16 vs 512^64

Also if you are putting these in sql server, make sure to turn OFF case insensitivity.

Matt Dotson
+4  A: 

The problem here is why, not how. A session ID bigger than a GUID is useless, because it's already big enough to thwart brute force attacks.

If you're concerned about predicting GUID's, don't be. Unlike the earlier, sequential GUID's, V4 GUID's are cryptographically secure, based on RC4. The only exploit I know about depends on having full access to the internal state of the process that's generating the values, so it can't get you anywhere if all you have is a partial sequence of GUID's.

If you're paranoid, generate a GUID, hash it with something like SHA-1, and use that value. However, this is a waste of time. If you're concerned about session hijacking, you should be looking at SSL, not this.

Steven Sudit
For the record, this was downvoted arbitrarily by someone who is unhappy with me.
Steven Sudit
+1  A: 

You may want to check out boost's Uuid Library. It supports a variety of generators, including a random generator that might suit your needs.

Alan