views: 941
answers: 3

What design considerations must be taken into account when writing software for content-distribution systems, such as managing the synchronisation and distribution of data, redirecting downloads to the nearest server, and so on?

I am also looking for examples of open source CDN (content delivery network) software. I can think of two projects: CoralCDN and OpenCDN.

Please note that there is more to a CDN than just hardware bandwidth. A CDN is a combination of software and hardware.

What I would like to build is software for streaming media as well as static assets. I'm having trouble figuring out how to properly sync streaming media across servers (since file access can be fairly random), while static assets seem a little easier, as each one is a one-time request.

+6  A: 

You realize that the value of a content delivery network lies purely in the number of servers it owns and the proximity of those servers to end users?

Are you sure you're going down the right path?

Chris Lively
You don't instantly have a CDN by purchasing 200 servers. There has to be both planning and (most importantly for this question) *software* in place to manage all that data, among other things!
dbr
@dbr: you're right. However, the implication is that if you are spending potentially millions on hardware, then the question of whether the software package is open source is moot, as you should have the funds to get whatever you need.
Chris Lively
Incidentally, the current question is radically different from the original one that I answered.
Chris Lively
+3  A: 

A CDN is not a piece of software. Please at least google it or look it up on Wikipedia: http://en.wikipedia.org/wiki/Content_Delivery_Network

StingyJack
+3  A: 

While I do not know of any open source projects, maybe it is worth summarizing what a CDN actually is? After all, just taking a bunch of web servers will not get you anywhere.

The key problems that CDN software has to solve:

  • Synchronization. So you have all your neat farms in the US, in Europe and in Asia, but how do you make sure that they all have the same versions of the files you're trying to serve? And if one of the farms does not have the current version, how do you tell the load balancer which farm to use instead?
  • Logging. In a CDN, you usually want to bill your customers, so you need to measure traffic and file accesses. But with multiple farms and multiple web servers in each farm, you need to somehow centralize logging.
  • Authentication. After all, a CDN is not just a web server delivering HTTP content to everyone. What if you have a CDN for video streaming that restricts access to certain users only?
  • Load balancing. While this is usually done separately, it also ties into the synchronization part. Say I am a user in South Korea trying to access some content. The load balancer finds out that the farm in Seoul is the nearest, but unfortunately Seoul's farm does not have the content yet. So the CDN and the load balancer need to figure out which is the nearest farm that does have the content. Let's see... Both Paris, France and Los Angeles, USA have the content. Which one should serve it?
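The Seoul scenario from the last point can be sketched roughly as follows. This is only an illustration, not how any particular CDN does it: the `Farm` class, the `pick_farm` function, the file path and the version numbers are all made up for this example, and great-circle distance is used as a crude stand-in for network proximity (a real system would more likely look at latency or routing data).

```python
import math
from dataclasses import dataclass, field


@dataclass
class Farm:
    """Hypothetical record of one server farm and the file versions it holds."""
    name: str
    lat: float
    lon: float
    files: dict = field(default_factory=dict)  # path -> version stored on this farm


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def pick_farm(farms, user_lat, user_lon, path, current_version):
    """Return the nearest farm that holds the current version of path, or None."""
    candidates = [f for f in farms if f.files.get(path) == current_version]
    if not candidates:
        return None  # nobody has it yet; the caller would fall back to the origin
    return min(candidates, key=lambda f: distance_km(user_lat, user_lon, f.lat, f.lon))


# Seoul has not received the file yet; Paris and Los Angeles have version 2.
farms = [
    Farm("seoul", 37.57, 126.98),
    Farm("paris", 48.86, 2.35, {"/video.mp4": 2}),
    Farm("los-angeles", 34.05, -118.24, {"/video.mp4": 2}),
]
choice = pick_farm(farms, 37.57, 126.98, "/video.mp4", current_version=2)
print(choice.name if choice else "no farm has the file")
```

Filtering on the version first and only then ranking by distance is what ties load balancing to synchronization: a farm with a stale or missing copy never becomes a candidate, however close it is.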

None of these problems is exclusive to CDNs, but CDN software is essentially a combination of these techniques. Are there any others that I forgot?

From the comments:

  • Determining which files need to be replicated where. A Japanese Windows Update may be highly popular in Japan and maybe some other Asian countries, but Europe and the US will probably see fewer requests for it, so this file may not need to be replicated to every farm in the CDN.
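One simple way to approach that decision, as a sketch only: count requests per region for each file and replicate it only to regions that account for a meaningful share of the demand. The function name, the region names, the request counts and the 5% threshold below are all invented for illustration.

```python
def regions_to_replicate(requests_by_region, min_share=0.05):
    """Return the regions whose share of a file's requests is at least min_share.

    requests_by_region maps a region name to a request count for ONE file.
    """
    total = sum(requests_by_region.values())
    if total == 0:
        return set()  # nobody asked for the file, so replicate nowhere
    return {
        region
        for region, count in requests_by_region.items()
        if count / total >= min_share
    }


# Made-up request counts for a Japanese update file.
japanese_update = {"japan": 9000, "asia-other": 800, "europe": 120, "us": 80}
print(sorted(regions_to_replicate(japanese_update)))
```

A relative threshold keeps rarely requested files off most farms, but it ignores file size and storage cost, which a real placement policy would probably weigh as well.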
Michael Stum
Level of redundancy? It's similar to the sync point, and would depend on the size of the CDN. It could range from having all data on every farm to selectively duplicating data to the countries where it's most popular. E.g., Akamai probably doesn't need to have win98.update123.japan.exe highly available in the US.
dbr