YouTube URL algorithm?

tags:

algorithm

views:

311

answers:

+7 Q:

YouTube URL algorithm?

How would you go about generating the unique video URL's that YouTube uses?

Example:

http://www.youtube.com/watch?v=CvUN8qg9lsk

+10 A:

Using some non-trivial hashing function. The probability of collision is very low, depending on the function, the parameters and the input domain. Keep in mind that cryptographic hashes were specifically designed to have very low collision rates for non-random input (i.e. completely different hashes for two close-but-unequal inputs).

This post by Jeff Attwood is a nice overview of the topic.

And here is an online hash calculator you can play with.

Eli Bendersky 2010-06-14 03:47:46

Are there any references that YouTube uses a hashing function?

cherouvim 2010-06-14 04:43:43

Why even allow the possibility of a collision when you don't need to? Just start the ID at 0000000000 and increment for each new page. If that doesn't "look" random enough for you, then just take a couple of large prime numbers, P and Q, and use the sequence hash(n)=Pn mod Q

Peter Alexander 2010-06-14 08:20:14

@cherovium: I don't have a reference. But my answer was more general "idea throwing" than a specific reference to YouTube. After all, the OP's question started with "How would you go about"... :-)

Eli Bendersky 2010-06-14 14:59:43

+1 A:

You could generate a GUID and have that as the ID for the video. Guids are very unlikely to collide.

Patrick 2010-06-14 04:16:35

+1 A:

I don't think that the URL v parameter has anything to do with the content (video properties, title, description etc).

It's a randomly generated string of fixed length and contains a very specific set of characters. No duplicates are allowed.

cherouvim 2010-06-14 04:22:08

+4 A:

There is no need to use a hash. It is probably just a quasi-random 64 bit value passed through base64 or some equivalent.

By quasi-random, I mean it is just a one-to-one mapping with the counting integers, just shuffled.

For example, you could take a monotonically increasing database id and multiply it by some prime near 2^64, then base64 the result. If you did not want people to be able to guess, you might choose a more complex mapping or just pick a random number that is not in the database yet.

Normal base64 would add an equals at the end, but in this case it is implied because the size is known. The character mapping could easily be something besides the standard.

drawnonward 2010-06-14 04:29:58

nice, thanks. Got any examples of where this is used?

BenB 2010-06-14 05:57:52

Your best bet is probably to simply generate random strings, and simply keep track (in a DB for example) of which strings you've already used so you don't duplicate. This is very easy to implement, and it cannot fail if properly implemented (no duplicates, etc).

Cam 2010-06-14 04:41:52

Just pick random values until you have one never seen before.

Randomly picking and exhausting all values form a set runs in expected time O(nlogn): http://stackoverflow.com/questions/1293939/what-is-o-value-for-naive-random-selection-from-finite-set/2169615#2169615

In your case you wouldn't exhaust the set, so you should get constant time picks. Just use a fast data structure to do the duplication lookups.

Thomas Ahle 2010-06-14 13:49:14

ansaurus

tags:

views:

answers:

YouTube URL algorithm?

related questions