tags:

views:

81

answers:

2

I need to generate a resource domain prefix based on a given path and configured number of resource domains in a deterministic fashion with good distribution. For example, if you pass it the path "/script/site.js" it returns "res#", where '#' is an integer between 0 and the configured amount of resource domains.

Using C# 3.0.

So far have this:

var resourceDomainCount = 4;
var hasher = System.Security.Cryptography.SHA1.Create();
var paths = new [] {
 "/App_Themes.0.1.433.232/images/buttonsBackgrounds.jpg",
 "/App_Themes.0.1.433.232/images/blahblah.jpg",
 "/App_Themes.0.1.433.232/images/pagebg.gif",
 "/App_Themes.0.1.433.232/site.css",
 "/script/site.js",
 "/App_Themes.0.1.433.232/images/different.jpg",
 "/App_Themes.0.1.433.232/images/shadows.jpg",
 "/Handlers/ImageHandler.ashx?type=l&id=123&s=g&index=0",
 "/Handlers/ImageHandler.ashx?type=l&id=234&s=g&index=0",
 "/Handlers/ImageHandler.ashx?type=l&id=345&s=g&index=0",
 "/Handlers/ImageHandler.ashx?type=p&id=MyProduct&s=g&index=0",
 "/Handlers/ImageHandler.ashx?type=p&id=WineGreat&s=g&index=0",
 "/Handlers/ImageHandler.ashx?type=p&id=YayYay&s=g&index=0"
 };
foreach (var path in paths) {
 var pathHash = hasher.ComputeHash(Encoding.ASCII.GetBytes(path));
 var singleByteHash = pathHash.Aggregate(0, (a, b) => a ^ b);
 var random = new Random((int)singleByteHash);
 var resourceDomainIndex = random.Next(0, resourceDomainCount);
 (resourceDomainIndex + ": " + path).Dump();
}

Which produces the following:

3: /App_Themes.0.1.433.232/images/buttonsBackgrounds.jpg
0: /App_Themes.0.1.433.232/images/blahblah.jpg
1: /App_Themes.0.1.433.232/images/pagebg.gif
1: /App_Themes.0.1.433.232/site.css
3: /script/site.js
1: /App_Themes.0.1.433.232/images/different.jpg
3: /App_Themes.0.1.433.232/images/shadows.jpg
1: /Handlers/ImageHandler.ashx?type=l&id=123&s=g&index=0
1: /Handlers/ImageHandler.ashx?type=l&id=234&s=g&index=0
0: /Handlers/ImageHandler.ashx?type=l&id=345&s=g&index=0
2: /Handlers/ImageHandler.ashx?type=p&id=MyProduct&s=g&index=0
1: /Handlers/ImageHandler.ashx?type=p&id=WineGreat&s=g&index=0
0: /Handlers/ImageHandler.ashx?type=p&id=YayYay&s=g&index=0

Not getting the distribution I want (there is only one instance of '2' there).

There are thousands of input strings that change all the time and the paths will be generated at runtime, e.g.: <a href="<%= GetPath("/script/site.js") %>">Link</a>

A: 

Your sample is far, far too small. Randomness is generally like statistics. It's only borne out over a long period of time. Try rerunning your code over a 100 iterations, and see if each value comes up about the same amount of time. Then try again for 10,000 iteratons and notice that the actual number of results are closer percentage-wise. Then run it for 1 million iterations, and notice that it's even closer. It never needs to be exactly the same - otherwise it wouldn't really be random.

And if the results don't follow that pattern, then maybe you've got a problem with teh even distribution issue.

atk
I understand what you're saying, but I'd like to find an algorithm that provides good distribution even over small sets. Imaging a page that shows 25 search results, each with their own image. I'd like good distribution between 4 resource domains for those 25 image paths.
Damian Edwards
atk
+3  A: 

This is a lot simpler than hashing the string (well, presumably it's already a hash anyway) and seems to get a better distribution:

Math.Abs(path.GetHashCode()) % resourceDomainCount
Matt Hamilton
This works well, even for a small set. Cheers.
Damian Edwards
Btw, substituting that into the original snippet gets this result:0,2,1,2,3,1,3,0,1,2,1,2,2
Damian Edwards