views:

56

answers:

3

I have a problem where I need to be able to generate identical uniformly distributed numeric hashs for a GUID in both javascript and C#. I guess that would prevent me from using Guid.GetHashCode() in C#, since I can't reproduce the behavior in JS without reverse engineering the C#.

Is there a fast way to produce hashes from guids/strings in JS? Are all digits of the string uniformly distributed in a .NET generated GUID? Should I just cast/convert the trailing chars into an int?

+1  A: 

you can create a web service to generate the hash value on the server side, use whatever language you want. on client side, a simple web service call will do the trick.

Russel Yang
I was after something fast - I need to avoid a round-trip to the server, or between server machines. It's to allow a kind of load balancing, that doesn't require hardware and that will allow my Ajax app to fail over gracefully without having to wait.
Andrew Matthews
what's the purpose of calculating the guid hash? Do you want to use it for load balancing?
Russel Yang
I need it to be able to identify (from anywhere in the server side or client side) a specific subset of the IIS servers to pass cache data to in readiness for an incoming request from the UI.
Andrew Matthews
+1  A: 

Reflector says the .NET Guid.GetHashCode() is implemented like this

public override int GetHashCode()
{
    return ((this._a ^ ((this._b << 0x10) | ((ushort) this._c))) ^ ((this._f << 0x18) | this._k));
}

_a, _b, _c and _f is defined in the constructor taking a byte[16] array

public Guid(byte[] b)
{
    if (b == null)
    {
        throw new ArgumentNullException("b");
    }
    if (b.Length != 0x10)
    {
        throw new ArgumentException(Environment.GetResourceString("Arg_GuidArrayCtor", new object[] { "16" }));
    }
    this._a = (((b[3] << 0x18) | (b[2] << 0x10)) | (b[1] << 8)) | b[0];
    this._b = (short) ((b[5] << 8) | b[4]);
    this._c = (short) ((b[7] << 8) | b[6]);
    this._d = b[8];
    this._e = b[9];
    this._f = b[10];
    this._g = b[11];
    this._h = b[12];
    this._i = b[13];
    this._j = b[14];
    this._k = b[15];
}
Albin Sunnanbo
Is there a non-string based representation for GUIDs in JS? I suspect that conversion of the string (that I'd be receiving) into a byte array, and from there into a GUID in order to generate the int would be wasteful if there was a more direct way to create a good hash from a string representation.
Andrew Matthews
To just create an index for load balancing this seems to be overkill. The conversion to the bytes is done simply by grouping the letters 2 by 2 and parse them as a hex value. I have done some tests with the distribution if the digits and it looks like some bytes are more common than other, but I have to look into more detail about where the differences are before I can post anything about that.
Albin Sunnanbo
+1  A: 

The bytes are apparently not evenly distributed.

I put together some code to sample the .NET Guids and plot the distribution:

First of all the test code, this creates one million Guids and counts the number of different values for each byte in the byte array. It outputs it all into a matrix that I plot in Scilab.

int[,] counter = new int[16, 256];
for (int i = 0; i < 1000000; i++)
{
    var g = Guid.NewGuid();
    var bytes = g.ToByteArray();
    for (int idx = 0; idx < 16; idx++)
    {
        counter[idx, bytes[idx]]++;
    }
}
StringBuilder sb = new StringBuilder();
sb.AppendLine("x = [");
for (int idx = 0; idx < 16; idx++)
{
    for (int b = 0; b < 256; b++)
    {
        sb.Append(counter[idx, b]);
        if (idx != 255)
        {
            sb.Append(" ");
        }
    }
    if (idx != 15)
    {
        sb.AppendLine(";");
    }
}
sb.AppendLine("]");

File.WriteAllText("plot.sce", sb.ToString());

Here are the distributions, the graphs plot the number of each distinct value for each of the positions in the byte array:

The value distribution for the positions 0-6 in the byte array: The value distribution for the positions 0-6 in the byte array
The value distribution for the position 7 in the byte array:
The value distribution for the position 7 in the byte array
The value distribution for the position 8 in the byte array:
The value distribution for the position 8 in the byte array
The value distribution for the positions 9-15 in the byte array: The value distribution for the positions 9-15 in the byte array

For byte positions 0-6 and 9-15 the distribution of values seems to be even, but for byte position 7 and 8 the distribution is fairly limited.

That is, for the guid (with the beginning of the byte positions below, note strange ordering)

{1369ea05-b9f9-408b-ac7c-7ebd0f35d562}
                         1 1 1 1 1 1
 3 2 1 0  5 4  7 6  8 9  0 1 2 3 4 5

The position 7 can take the values from 64 (0x40) to 79 (0x4F).
The position 8 can take the values from 128 (0x80) to 191 (0xBF).
The rest of the bytes are evenly distributed.

Note: The tests was run on .NET4 on a 32 bit Windows 7 machine.

Lesson: don't assume stuff, test.

Answer: To use the .NET Guids for calculating your load balancing, you can use any part except the positions marked 7 and 8 in the Guid above.

Question: Does anybody know WHY the distribution is not evenly spread?

Albin Sunnanbo
Andrew Matthews
Ah, some versioning information and a markers for some algorithmic choices.
Albin Sunnanbo