I am setting up an online video site (like YouTube). My technical challenge is to serve a large number of requests while maintaining performance.

My current solution is to set up several back-end servers, with each server caching a subset of the videos, which saves the time spent reading video files from disk.

A front-end server will hash the requested video ID to find out which server the video resides on, and then redirect the client browser to that server.

My solution is simple, and I want to know whether anyone has better ideas or sees any technical issues with it.
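A minimal sketch of the hashing step, assuming Python on the front-end server. The back-end host names and the `server_for` helper are hypothetical and not part of the setup described above; SHA-1 is used here only because it gives a stable, evenly spread value, not for security.

```python
import hashlib

# Hypothetical back-end servers; replace with the real host names.
BACKENDS = [
    "http://video1.school.example",
    "http://video2.school.example",
    "http://video3.school.example",
]


def server_for(video_id, backends=BACKENDS):
    """Map a video ID to one back-end server, always the same one for the same ID."""
    digest = hashlib.sha1(video_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it
    # to an index into the server list.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


def redirect_url(video_id, backends=BACKENDS):
    """Build the URL the front end would redirect the browser to (path layout is assumed)."""
    return server_for(video_id, backends) + "/videos/" + video_id
```

One thing to keep in mind with plain modulo hashing like this: adding or removing a back-end server changes the mapping for most video IDs, so the caches would have to warm up again.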


Please note: I want to set up the site to work locally (and not rely on providers like Akamai) as the content is for local students at my school. This will essentially be an 'intranet' solution.

A: 

Are you hosting this on Windows in IIS7? If so, there is a module for IIS that throttles the video so it is not streamed faster than the user can actually watch it.

JoshBerke
Two comments: 1. I am using IIS 6.0. Could I install IIS 7.0 on Windows Server 2003? 2. What I need is a submit/approve-to-publish/category system, not just a streaming solution. I think the IIS 7.0 solution only provides streaming and none of the other parts?
George2
Hi Josh, I thought of another issue. With IIS 7.0 on a single machine, the network capacity is very limited. Does the IIS 7.0 video solution have any built-in way to cluster several servers for load balancing?
George2
A: 

One solution for very high performance is to use a data distribution service like Akamai. They offer server space all over the world and they have solved the performance issue already. Also, since they have data centers all over the world, your data doesn't have to travel very far which is good for the Internet and for you (since Akamai can charge lower fees for the same amount of data).

Aaron Digulla
Thanks Aaron Digulla. I want to set this up myself, as the content is for local students at my school. There is no need to open the content to the outside or serve people worldwide. :-) My question is whether you have any better ideas or see any technical issues with my solution?
George2
Please keep in mind that a lot of people will search and read your question. So having an answer which doesn't fit *your* problem 100% but might help someone else is a good thing. Or: It will be hard to find an answer when there are 100 questions similar to yours.
Aaron Digulla
Thanks, I agree. My specific question is about setting this up locally and serving intranet users, like at a school. Do you have any comments on the technical solution in my original question, or any better ideas? :-)
George2
A: 

Someone recently sent me this link on YouTube scaling; it might be useful or relevant.

http://video.google.com/videoplay?docid=-6304964351441328559

Paul
Cool! I like it! But is there any solution I can build and set up locally?
George2
+1  A: 

Your solution will not perform well when all users request the same video. A better solution is to have all videos available on all servers and use a load balancing server to redirect the current request to the server which has the lowest number of feeds open.

Note that storage back ends (RAID arrays, SAN) can deliver data at a very high rate, so you often can get away with one storage system for several video servers (i.e. one storage system per N video servers and 1 load balancer (or two if you want failover)).

A good solution here is to have a "redirect" command in the protocol:

  1. Client asks load balancer (LB) for video
  2. LB tells client which video server (VS) to use. This is a simple "find VS with the lowest number of open feeds."
  3. Client connects directly to VS (to avoid all overhead)
  4. VS tells LB the current number of open feeds (don't use an incremental approach here to avoid synchronization issues)
  5. VS begins streaming the data to the client
  6. When a client disconnects, VS tells LB of the new number of feeds

[EDIT] The main reason to get the clients to connect directly to the video servers is network throughput. If all VS send their data to the LB, which passes it on to the clients, you are limiting yourself to the speed of the single (or dual) network card of the LB. If you have 5 VS, you can get five times the throughput when connecting directly. Also, you can easily scale your system when more users hammer it by simply adding another video server, plugging it into the backbone and adding one entry to the list on the LB.
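The bookkeeping behind the redirect protocol could look roughly like the following in-memory Python sketch. The `LoadBalancer` and `VideoServer` classes and their method names are made up for illustration; a real setup would send the feed counts over the network. Note that each VS reports its absolute number of open feeds rather than a +1/-1 delta, so a lost or repeated message cannot make the LB's view drift.

```python
class LoadBalancer:
    """Tracks the last reported number of open feeds per video server."""

    def __init__(self, server_names):
        self.open_feeds = {name: 0 for name in server_names}

    def pick_server(self):
        # Step 2: find the VS with the lowest number of open feeds.
        # On a tie, the server listed first wins.
        return min(self.open_feeds, key=self.open_feeds.get)

    def report(self, server_name, count):
        # Steps 4 and 6: the VS sends its absolute feed count.
        self.open_feeds[server_name] = count


class VideoServer:
    """Keeps its own set of connected clients and reports the total to the LB."""

    def __init__(self, name, lb):
        self.name = name
        self.lb = lb
        self.clients = set()

    def connect(self, client_id):
        self.clients.add(client_id)
        self.lb.report(self.name, len(self.clients))

    def disconnect(self, client_id):
        self.clients.discard(client_id)
        self.lb.report(self.name, len(self.clients))
```

Adding another video server then amounts to adding one more name to the list the LB is constructed with.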

Aaron Digulla
Cool solution! One question: what do you mean by "so you often can get away with one storage system for several video servers"? Do you mean that using one server to store the videos is enough? Sorry, I am not a native English speaker. I would appreciate it if you could put it in other words. :-)
George2
Sorry, one more question: what do you think is the simplest and most elegant way to implement the redirect on the LB side?
George2
Re Storage: Yes. You have one box with all the disks. I have a RAID array which can deliver 530 MB/s. Theoretically, this can serve about 5 clients connected via 1 Gbit links (1 Gbit/s ~ 100 MB/s).
Aaron Digulla
Re redirect: What would you gain by pumping all the data from the video servers through one box? You would limit yourself to what the network card of that box can deliver. If each client connects directly to a VS, you get N times the throughput.
Aaron Digulla
One more comment -- "I have a RAID array which can deliver 530 MB/s. Theoretically, this can serve about 5 clients connected via 1 Gbit links (1 Gbit/s ~ 100 MB/s)" -- how do you calculate the result of 5 clients via 1 Gbit links from a 530 MB/s RAID?
George2
Re implementing redirect: That depends on the protocol that you are using. If you are using HTTP, you can return a 302 (temporary move). See http://en.wikipedia.org/wiki/HTTP_302
Aaron Digulla
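For the HTTP case, a 302 redirect on the LB could be sketched with Python's standard `http.server` module as below. The video server URL and the `pick_server` placeholder are assumptions for illustration; in a real LB, `pick_server` would return the VS with the fewest open feeds.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical video server base URLs.
VIDEO_SERVERS = ["http://vs1.school.example", "http://vs2.school.example"]


def pick_server():
    # Placeholder: always the first server. A real LB would choose
    # the server with the lowest number of open feeds here.
    return VIDEO_SERVERS[0]


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer with 302 and point the client at the chosen video
        # server, keeping the requested path unchanged.
        self.send_response(302)
        self.send_header("Location", pick_server() + self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Keep the sketch quiet; drop this override to get request logs.
        pass


def make_server(port=0):
    """Create the LB's HTTP server on localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), RedirectHandler)

# To run it: make_server(8080).serve_forever()
```

The browser follows the `Location` header on its own, so the video data never passes through the LB.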
Re throughput: A 1 Gbit network card can transfer roughly 100 MB/s (1 byte takes about 10 bits on the wire, so you just divide the speed by 10 to get a rough estimate). What may be confusing you is the setup: you need a fibre link or something *faster* than ethernet between the RAID and the VS.
Aaron Digulla
The VS then talks to the client via a 1 Gbit network link.
Aaron Digulla
Hi Aaron Digulla, I am still a little confused about your calculation. Your RAID array can deliver 530 MB/s, and this can serve about 5 clients connected via 1 Gbit links. How do you calculate the number 5? :-)
George2
530 MB/s ~ 5.3 Gbit/s; divided by 1 Gbit/s per link, that is ~ 5
Aaron Digulla
"~" means "is about"
Aaron Digulla
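The same back-of-the-envelope estimate written out as a short Python sketch. The function names are made up for illustration, and the factor of 10 bits per byte is the rule of thumb from the comment above, not an exact figure.

```python
def mb_per_s_to_gbit_per_s(mb_per_s, bits_per_byte=10):
    """Rough conversion using the '1 byte takes about 10 bits' rule of thumb."""
    return mb_per_s * bits_per_byte / 1000


def links_fed(raid_mb_per_s, link_gbit_per_s=1.0):
    """How many network links of the given speed the storage can keep busy."""
    return mb_per_s_to_gbit_per_s(raid_mb_per_s) / link_gbit_per_s


raid_gbit = mb_per_s_to_gbit_per_s(530)  # 530 MB/s is about 5.3 Gbit/s
links = int(links_fed(530))              # about 5 links at 1 Gbit/s each
```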
Cool, Aaron Digulla! I have another related question. Normally, Ethernet is 1 Gbit. How can we make full use of the RAID's 5 Gbit speed? Is it wasted? Thanks!
George2
Hi Aaron Digulla, I thought of another issue. I think the RAID's 5 Gbit speed includes both input (upload) and output (download), correct? Normally, there is much less upload than download traffic for video. How can the bandwidth be split more sensibly in practice? :-)
George2
When you buy a RAID system, buy optical network cards to connect the RAID with the video servers. Optical ethernet can do 10 Gbit/s.
Aaron Digulla
Up/download: Assign points to each type of connection, say 10 points per download and 1 point per upload. And how about giving me some appreciation for all the work I'm doing for you?
Aaron Digulla
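The points idea could be sketched like this in Python. The weights come from the comment above; the function names and the shape of the `servers` mapping are assumptions for illustration.

```python
DOWNLOAD_POINTS = 10  # a download (streaming to a client) is the expensive case
UPLOAD_POINTS = 1     # an upload costs far less


def load_score(downloads, uploads):
    """Weighted load of one video server."""
    return downloads * DOWNLOAD_POINTS + uploads * UPLOAD_POINTS


def least_loaded(servers):
    """servers maps a server name to (open downloads, open uploads)."""
    return min(servers, key=lambda name: load_score(*servers[name]))
```

With these weights, a server with 2 downloads and 5 uploads (25 points) is preferred over one with 3 downloads and no uploads (30 points).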
A: 

Try Amazon's CloudFront (a CDN) if your users are distributed all over the world. If your users are localized (US/Europe), you can use S3.

Additionally, you could also try using Nginx (a web server) which is extremely efficient at serving large files.

This way, you don't have to deal with unnecessary architectural complexity within your application.

Nikhil Gupte