Can anyone share how YouTube stores video-related information in its tables? What would the table structure be, what columns would the tables have, and how would the tables relate to each other?

Thanks in advance

+1  A: 

There are only two ways to find out:

1) Hack into YouTube and find out yourself

2) Get a job at Google and hope you get to be on the YouTube-design team.

As for getting others to do this and tell you: I doubt that anyone has been able to pull (1) off. (Sure, there have been smaller hacks, but AFAIK nothing that would reveal what you ask for.) And anyone doing (2) is probably not allowed to tell you.

What I do wonder is why you want to know. Even if you knew, I doubt there would be a way to make use of it on YouTube itself, so the only use I can think of is rebuilding it on your own website. If that is indeed your goal, you will have to think of a database design yourself, I am afraid.

Jasper
Sites like highscalability.com share some architectural information about big sites to help developers learn from it. I am sure that YouTube is making use of the best design pattern. I am not going to rebuild YouTube (how can a beginner programmer compete with a giant?). I'm creating a page where I am uploading files and just wanted to have a bare idea of their design. Anyway, hats off to the Google engineers.
Shyju
They are not using the best design pattern. They are using what is a good design technique in their case. I say good because best is very much relative: what's considered the best by one may not be so by another, and if something new is invented, that may become the 'best'. And they use what's best in their situation, which does not have to be the best in yours. Actually, I think it would do you little good to know how they are doing things. You are probably not going to spread files over hundreds of servers, you're not using Google's DBMS, and it's not on GoogleFS.
Jasper
+7  A: 

I believe they store files themselves on disk and keep track of them in a group of Excel files. So when a user requests a page, the appropriate Excel file is parsed, links to files are extracted and displayed along with the properties to the user.

Using Excel offers advanced functions like getting reports on load count and other usage statistics on files. For example, you can build graphics to show you the geographical distribution of daily reach for particular videos based on IP addresses of those requests.

Excel files themselves have a relatively simple structure and can be processed at a very high speed, compared to the ubiquitous database solutions, where processing every single request requires many services to interoperate. That causes response lag and a reduced processing rate, which can be noticeable on high-load sites with millions of requests coming in.

User
Genius... I honestly can't tell if you're joking or not.
skaffman
Hahaha +1 from me :)
the_drow
didn't they migrate to OOo Calc recently?
devio
I once heard they were using a Lotus Notes database
n002213f
Sure. Aren't they also storing their videos in the ion modulations in raw eggs and using eggstirrers hooked up to usb to read them?
Jasper
A: 

Mastermind clearly has the more correct answer (I've voted him up), but for interest...

YouTube has an interesting database architecture that in an odd way reflected their eventually being taken over by Google.

As everyone knows, Google has an odd take on making reliable servers: instead of buying expensive high-reliability servers (with features like redundant power supplies), they use many cheap commodity machines and combine them with a software and storage architecture designed for failure.

YouTube mirrors that with their database system. They use MySQL and its infamous MyISAM tables for speed. On its own this would be a recipe for disaster, as YouTube would end up posting the "sorry the database got corrupted you have to recreate your account" message more frequently than any other PHP-powered forum out there.

Instead, YouTube created a layer to duplicate data across several databases. This is not mirroring in the traditional sense, but more like a kind of redundant load balancing or striped RAID setup, where a record is redundantly stored in some but not all of the databases. This not only allows individual MySQL databases to crash (it's trivial to automate the deletion and recreation of those databases with MySQL), but also allows them to scale their database backend in ways monolithic databases cannot: they can simply add extra machines and let the system populate them with the excess data.
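
To make the "stored in some but not all of the databases" idea concrete, here is a toy in-memory sketch in Python. It is not YouTube's actual layer, whose details are not public; the class name `PartialReplicaStore`, the CRC-based placement rule, and the choice of two copies across four nodes are all assumptions made purely for illustration.

```python
import zlib


class PartialReplicaStore:
    """Toy model: each record lives on `copies` of the nodes, not all of them."""

    def __init__(self, node_count, copies):
        # Each dict stands in for one independent database (hypothetical).
        self.nodes = [dict() for _ in range(node_count)]
        self.alive = [True] * node_count
        self.copies = copies

    def replica_nodes(self, key):
        # Assumed placement rule: hash the key to a starting node, then
        # use the next `copies - 1` nodes as well, wrapping around.
        start = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.copies)]

    def put(self, key, value):
        for i in self.replica_nodes(key):
            if self.alive[i]:
                self.nodes[i][key] = value

    def get(self, key):
        # Read from the first live replica that still has the record.
        for i in self.replica_nodes(key):
            if self.alive[i] and key in self.nodes[i]:
                return self.nodes[i][key]
        raise KeyError(key)

    def crash(self, i):
        # A crashed database loses everything it held.
        self.alive[i] = False
        self.nodes[i] = {}

    def rebuild(self, i):
        # Recreate the node empty, then repopulate it from surviving copies.
        self.alive[i] = True
        for j, node in enumerate(self.nodes):
            if j == i or not self.alive[j]:
                continue
            for key, value in node.items():
                if i in self.replica_nodes(key):
                    self.nodes[i][key] = value


store = PartialReplicaStore(node_count=4, copies=2)
store.put("GHa93n0GjBU", "Avatar clip")
first = store.replica_nodes("GHa93n0GjBU")[0]
store.crash(first)                      # one database dies
survivor = store.get("GHa93n0GjBU")     # still served by the other copy
store.rebuild(first)                    # node is recreated and refilled
```

The trade-off in this sketch is that a record is lost only if every node holding a copy fails before a rebuild, while adding nodes spreads the data thinner instead of copying everything everywhere.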

  • All information above may be incorrect and simply an attempt to throw you off the scent. Or it may be perfectly accurate up to the Google purchase, at which time nobody knows how it might have been changed to make it mesh with Google's unique backend.
David
A: 

The videos are stored on disk and given unique IDs ('GHa93n0GjBU' is an Avatar clip). Then they'll have a simple table that contains details such as:

  • Unique video ID
  • User ID
  • Date uploaded
  • Length
  • Description
  • Maybe a list of related video IDs, stored in some sort of comma-separated list

Probably then a second table for comments, which links to the video by its unique ID.
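
As a rough sketch of the two tables guessed at above, here is a runnable version using Python's built-in `sqlite3` module. The table names, column names, and types are all assumptions for illustration, as are the sample upload date, length, and comment; nothing here is known to match YouTube's real schema.

```python
import sqlite3

# Hypothetical schema: one row per video, one row per comment.
SCHEMA = """
CREATE TABLE videos (
    video_id          TEXT PRIMARY KEY,   -- e.g. 'GHa93n0GjBU'
    user_id           INTEGER NOT NULL,   -- uploader
    uploaded_at       TEXT NOT NULL,      -- ISO date string
    length_seconds    INTEGER NOT NULL,
    description       TEXT,
    related_video_ids TEXT                -- comma-separated list, as guessed above
);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    video_id   TEXT NOT NULL REFERENCES videos(video_id),
    user_id    INTEGER NOT NULL,
    body       TEXT NOT NULL,
    posted_at  TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample rows, only to show how the tables link together.
conn.execute(
    "INSERT INTO videos VALUES (?, ?, ?, ?, ?, ?)",
    ("GHa93n0GjBU", 1, "2010-01-15", 150, "Avatar clip", ""),
)
conn.execute(
    "INSERT INTO comments (video_id, user_id, body, posted_at) VALUES (?, ?, ?, ?)",
    ("GHa93n0GjBU", 2, "Nice clip", "2010-01-16"),
)

# Fetch a video's description together with its comments via the shared ID.
rows = conn.execute(
    "SELECT v.description, c.body FROM videos v "
    "JOIN comments c ON c.video_id = v.video_id WHERE v.video_id = ?",
    ("GHa93n0GjBU",),
).fetchall()
```

A comma-separated column is the simplest way to hold related IDs, but it cannot be indexed or joined; a separate link table with one row per (video, related video) pair is the more usual relational choice.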

mark