So, in order to speed up load times, we're setting up a bunch of CDN hostnames to serve images and assets from. What's the best way to consistently use the same host for the same asset? E.g. button.gif always gets served from http://assets-15.ourserver.com.

I was thinking of coming up with some rule, where the md5 hash of the filename somehow maps to a server (can't use the filename itself, since a lot are similar: "button-home.gif", "button-about.gif", etc.). I'm not sure if this is the most efficient way, but it seems like it would work.
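
A rough sketch of the md5 idea in Python, only to make the mapping concrete. The function name `host_for_asset_md5`, the 1-based numbering, and the `assets-N.ourserver.com` pattern are assumptions for illustration; any language with an md5 routine could do the same thing.

```python
import hashlib


def host_for_asset_md5(filename: str, host_count: int) -> str:
    """Map a filename to one of `host_count` asset hostnames (hypothetical helper)."""
    # md5 of the UTF-8 bytes gives the same digest in every language.
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Treat the hex digest as a big integer and reduce it to 1..host_count.
    index = int(digest, 16) % host_count + 1
    # Hostname pattern assumed from the example in the question.
    return f"assets-{index}.ourserver.com"


if __name__ == "__main__":
    for name in ("button-home.gif", "button-about.gif", "logo.gif"):
        print(name, "->", host_for_asset_md5(name, 4))
```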

Anyone have any experience with this sort of thing? I need a language-agnostic solution, because this will be used by several different languages.

EDIT: Yahoo's explanation on how this speeds things up: http://developer.yahoo.com/performance/rules.html#split

+1  A: 

The point of a Content Delivery Network (CDN) is to load the asset from the server closest to the user.

Loading a given asset from an explicit server defeats the purpose of the CDN. My guess is that it's not supported. If you need to load an asset from an explicit location, don't put it on the CDN, put it on a central server.

Justin Niessner
Maybe I didn't explain myself well enough--they will all be served from the CDN, but we have different asset hostnames pointing at the same origin. We're trying to split the assets between the hostnames to speed things up: http://developer.yahoo.com/performance/rules.html#split
Kyle Slattery
That's fine, but when using a CDN you still shouldn't specify which server assets are coming from. You should rely on the CDN to properly route your request to maximize speed both geographically and by load balancing across multiple servers (using different domain names).
Justin Niessner
This isn't to specify which origin server they're coming from. Say I host my assets at asset-origin.ourserver.com and create 2 CDN hostnames, assets-1.ourserver.com and assets-2.ourserver.com. The CDN endpoints both point to asset-origin.ourserver.com, but I want to distribute requests between the hostnames for better loading times in the browser. So, button.gif should always be loaded from assets-1.ourserver.com and logo.gif is always loaded from assets-2.ourserver.com.
Kyle Slattery
If that's the case, then why wouldn't you just hard code the domain name for the asset in your markup?
Justin Niessner
We need some sort of ruleset to know which server to grab it from, and because the setup will be different depending on what environment (e.g. staging and local dev won't use multiple hostnames), we need a way to add it programmatically. Also, this will need to be used across several different applications, so manually setting the asset hostname in each would be cumbersome.
Kyle Slattery
+1  A: 

When I did something like this, all the relevant resources had id numbers anyway, so I just used that as the basis. Still, it's not too hard to extend to non-numbers.

There's a balance in how many hostnames you use: with too many, the host-lookup overhead outweighs the advantage of multiple hostnames, so at the outside you'll likely have about 12, probably fewer.

This in itself means that a simple hash will likely split across the given range easily enough without any need to be particularly clever.

Encoding shouldn't cause any confusion either, because either your application deals with IRIs fully (in which case UTF-8 handling is already an issue you've dealt with) or it doesn't, in which case every character in the URI-escaped form of the path (that is to say, the name used in the actual URI) is going to be in the ASCII range.

There's no need to be cryptographically secure or anything like that, as it isn't a security risk to guess the server used. It won't be the end of the world if one or two pages lean slightly to one server over another (randomness would have that happen with a perfect hash anyway).

Hence, just run through the characters in the absolute path of the URI for the image (everything after the host, from the first / onwards), add their integer values together, and then take that sum modulo the number of hostnames to get the numbered part of the hostname.
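
Something along these lines, sketched in Python (the same few lines port to pretty much any language). The name `asset_host`, the 1-based numbering, and the `assets-N.ourserver.com` pattern are assumptions for illustration, and the path is assumed to be in its URI-escaped form already.

```python
def asset_host(path: str, host_count: int) -> str:
    """Pick an asset hostname from the sum of the path's character values."""
    # For a URI-escaped path every character is ASCII, so the byte values
    # are the same as the character values.
    total = sum(path.encode("utf-8"))
    # Reduce the sum to 1..host_count (1-based numbering is an assumption).
    index = total % host_count + 1
    return f"assets-{index}.ourserver.com"


if __name__ == "__main__":
    for p in ("/images/button-home.gif", "/images/button-about.gif"):
        print(p, "->", asset_host(p, 4))
```

The same path always produces the same sum, so it always lands on the same hostname as long as the number of hostnames stays the same.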

If you want to limit the number of characters processed for speed issues, then do it from the end backwards, as that will have the greatest variation.
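
A minimal variant that only looks at the tail of the path, again in Python. The name `asset_host_tail` and the default cut-off of 16 characters are arbitrary assumptions, not recommendations.

```python
def asset_host_tail(path: str, host_count: int, max_chars: int = 16) -> str:
    """Like a plain character sum, but only over the last `max_chars` characters."""
    # Slicing from the end keeps the file name and drops the leading directories.
    tail = path[-max_chars:]
    total = sum(tail.encode("utf-8"))
    # Hostname pattern and 1-based numbering are assumptions for illustration.
    return f"assets-{total % host_count + 1}.ourserver.com"
```

One consequence worth keeping in mind: two paths that share the same last `max_chars` characters will always map to the same hostname, whatever comes before.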

That "button-home.gif" is similar to "button-about.gif" isn't an issue, as they aren't really very similar at all as seen through the eyes of a process like this.

If you ever increase the number of hostnames used, try to do it as a multiple of the previous number, as this results in the largest possible number of resources keeping their old URIs.
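
A small Python sketch to check this for your own set of paths before changing the number of hostnames. `host_index` and `fraction_unchanged` are hypothetical helper names, and the indexes here are 0-based for simplicity.

```python
def host_index(path: str, host_count: int) -> int:
    """0-based host index from the sum of the path's character values."""
    return sum(path.encode("utf-8")) % host_count


def fraction_unchanged(paths, old_count: int, new_count: int) -> float:
    """Share of paths that keep the same host index after the change."""
    same = sum(
        1 for p in paths
        if host_index(p, old_count) == host_index(p, new_count)
    )
    return same / len(paths)


if __name__ == "__main__":
    sample = ["/images/button-home.gif", "/images/button-about.gif",
              "/images/logo.gif", "/css/site.css", "/js/app.js"]
    print("4 -> 8:", fraction_unchanged(sample, 4, 8))
    print("4 -> 6:", fraction_unchanged(sample, 4, 6))
```

If the sums are spread evenly, going from 4 to 8 hostnames should leave roughly half of the resources where they were. When the new count is a multiple of the old one, the new index modulo the old count is always the old index, so the resources that do move only move to the newly added hostnames.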

Jon Hanna
Aha, that would be perfect. I didn't even think of something like summing the characters--great idea :)
Kyle Slattery
Sometimes solutions can be so simple that they're hard to see when you've spent all day dealing with complexities.
Jon Hanna