This is a very difficult question to answer.
As an introductory read, I recommend you check out YouTube Architecture on High Scalability. YouTube is a very good real-life example of how a media-centric website works.
Surprising as it may be, serving the actual media files is not the bottleneck. The harder part is keeping all the media metadata in sync, generating thumbnails, and so on. Media files can always be served from a cluster, or from a CDN in the case of an extremely popular video.
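To make the cluster-versus-CDN split concrete, here is a minimal sketch of how a site might decide where a player fetches a file from. The hostnames, the popularity threshold, and the `Video` and `media_url` names are all hypothetical, made up for illustration; this is not how YouTube actually does it.

```python
from dataclasses import dataclass

# Hypothetical hostnames and threshold, chosen only for illustration.
CDN_HOST = "cdn.example.com"
CLUSTER_HOST = "media.example.com"
POPULAR_VIEWS_PER_DAY = 100_000


@dataclass
class Video:
    video_id: str
    views_per_day: int


def media_url(video: Video) -> str:
    """Return the URL a player should fetch the media file from."""
    # Extremely popular videos go to the CDN; everything else is
    # served from the site's own media cluster.
    if video.views_per_day >= POPULAR_VIEWS_PER_DAY:
        host = CDN_HOST
    else:
        host = CLUSTER_HOST
    return f"https://{host}/videos/{video.video_id}.mp4"
```

The point of the sketch is that the routing decision itself is trivial. The popularity figure it depends on is metadata, and keeping that kind of metadata current is where the real work tends to be.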
Read the link for more in-depth info.