I'm trying to build a web application that is similar to YouTube (it's not a knock-off), but I don't know much about how video is served on the internet.

I know how to build regular database-driven web applications, but nothing at the scale of YouTube. All of the applications I have built so far have run on a single server, with the files stored on the same box as the web server.

How does one decouple the application server, the file storage, and the media server from one another?

I would more or less want four tiers of machines (each one a cluster):

1.) Application servers -- Present the web page, handle user uploads, link the user's Flash player to the correct media server, etc.

2.) Database shards -- Store user information, check favorites, etc.

3.) File storage -- Store the media files

4.) Media servers -- Serve the media files
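A minimal sketch of how the application tier could route requests to the other tiers, assuming plain modulo sharding for users and a stable hash of the video ID for media servers. The host names, the `/videos/<id>.flv` path layout, and both helper functions are hypothetical, for illustration only:

```python
import hashlib

# Hypothetical tier layout; these host names are placeholders, not real services.
DB_SHARDS = ["db-shard-0.internal", "db-shard-1.internal", "db-shard-2.internal"]
MEDIA_SERVERS = ["media-0.example.com", "media-1.example.com"]


def shard_for_user(user_id: int) -> str:
    """Pick the database shard that owns a user's rows (simple modulo sharding)."""
    return DB_SHARDS[user_id % len(DB_SHARDS)]


def media_url_for_video(video_id: str) -> str:
    """Map a video ID to one media server with a stable hash and build its URL.

    The application server only hands this URL to the player; the video bytes
    are then fetched by the player directly from the media tier.
    """
    digest = hashlib.sha256(video_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(MEDIA_SERVERS)
    return f"http://{MEDIA_SERVERS[index]}/videos/{video_id}.flv"


if __name__ == "__main__":
    print(shard_for_user(42))
    print(media_url_for_video("abc123"))
```

Modulo sharding is the simplest scheme, but adding a shard remaps most keys; consistent hashing or an explicit key-to-shard lookup table avoids that, at the cost of more machinery.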

How do I hook all of this together? Which technologies should I leverage? Where do I go to learn more about architecting this?

How does YouTube's embeddable Flash player work? I want to embed my Flash player on other websites and have it tie into my architecture.
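One common approach (not necessarily YouTube's) is to host the player SWF on your own domain and give other sites an HTML snippet to paste in. The snippet passes a video ID to the player, here via the `flashvars` parameter, and the player then asks your application servers where to fetch the media. A sketch of generating such a snippet, assuming a hypothetical player URL and a `video_id` variable name of your own choosing:

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical location of your player; other sites load it from your domain.
PLAYER_SWF = "http://www.example.com/player.swf"


def embed_code(video_id: str, width: int = 480, height: int = 360) -> str:
    """Build an <object> snippet that loads the player with the given video ID."""
    # urlencode percent-escapes the ID; escape() makes it safe inside an attribute.
    flashvars = escape(urlencode({"video_id": video_id}), quote=True)
    src = escape(PLAYER_SWF, quote=True)
    return (
        f'<object width="{width}" height="{height}" '
        f'type="application/x-shockwave-flash" data="{src}">'
        f'<param name="movie" value="{src}">'
        f'<param name="flashvars" value="{flashvars}">'
        f'<param name="allowfullscreen" value="true">'
        f"</object>"
    )


if __name__ == "__main__":
    print(embed_code("abc123"))
```

Because the snippet only carries an ID, you can move videos between media servers later without breaking pages that embed them.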

Note: I have looked at http://highscalability.com/youtube-architecture

But I still don't get the overall picture of how it all ties together.

Could someone explain, in high-level terms, how all of this works?

Are there dedicated internal client servers that shuffle all of this data between the application servers, the file storage, etc.? Is it all over HTTP using JSON? What is going on here?
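Plain HTTP is a perfectly reasonable transport between internal tiers. The toy below is a sketch, not a production storage system: a storage node that accepts `PUT` and `GET` over HTTP and answers uploads with a small JSON body, plus the client calls an application server might make after receiving an upload. The paths, the in-memory `BLOBS` dictionary, and the JSON fields are assumptions for illustration:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory stand-in for a storage node's disk (illustration only).
BLOBS: dict[str, bytes] = {}


class StorageHandler(BaseHTTPRequestHandler):
    """Toy storage node: PUT /<key> stores bytes, GET /<key> returns them."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        BLOBS[self.path] = self.rfile.read(length)
        body = json.dumps({"stored": self.path, "bytes": length}).encode()
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        data = BLOBS.get(self.path)
        if data is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "video/x-flv")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the demo quiet


def put_file(base_url: str, key: str, data: bytes) -> dict:
    """What an application server might do after receiving a user upload."""
    req = urllib.request.Request(f"{base_url}/{key}", data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def get_file(base_url: str, key: str) -> bytes:
    """What a media server (or the player itself) might do to fetch a file."""
    with urllib.request.urlopen(f"{base_url}/{key}") as resp:
        return resp.read()


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), StorageHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    print(put_file(base, "videos/abc.flv", b"FLV..."))
    print(get_file(base, "videos/abc.flv"))
    server.shutdown()
    server.server_close()
```

A real storage tier would also need persistence, replication across nodes, and support for seeking within large files; this only shows the shape of the HTTP traffic between tiers.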

Thanks

A:

Two books I'd recommend are:

The latter is by the director of engineering at Flickr. It's not about YouTube, but I think you'll find it enlightening.

Beyond that, the High Scalability blog is a good source of case studies and collected wisdom, all of which provide a good starting point for further exploration.

ars
A: 

Start by hiring the right people; if you hire smart people, they'll be able to answer these questions, and the others that will crop up.

Also, start at the scale you initially plan to operate at. Don't plan for scalability you don't need. You aren't going to be building another YouTube, even if you're very successful within your field.

Scalability is expensive, very expensive, to develop and maintain. If you don't need it, it will drain your resources and restrict your developers needlessly. Just building a credible test environment for a high-performance system tends to be a big job, and such a system would require several such environments.
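A cheap way to follow this advice without painting yourself into a corner (offered as a sketch, not something MarkR proposed) is to start on one box but route all file access through a small interface. The `MediaStore` and `LocalDiskStore` names are hypothetical; a later remote implementation, for example one that talks HTTP to a separate storage tier, could replace `LocalDiskStore` without touching the calling code:

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class MediaStore(ABC):
    """The only storage operations the rest of the application may use."""

    @abstractmethod
    def save(self, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def load(self, key: str) -> bytes:
        ...


class LocalDiskStore(MediaStore):
    """Single-server implementation: files live under one directory on this box."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def _path(self, key: str) -> Path:
        path = (self.root / key).resolve()
        # Reject keys such as "../x" that would escape the storage root.
        if self.root not in path.parents:
            raise ValueError(f"invalid key: {key!r}")
        return path

    def save(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def load(self, key: str) -> bytes:
        return self._path(key).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store: MediaStore = LocalDiskStore(root)
        store.save("videos/abc.flv", b"FLV...")
        print(store.load("videos/abc.flv"))
```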

MarkR
OK, well, I guess my first question is this: I have the database and application server side down, but if I need a large repository of files that Flash has to access, how do I do that?