tags:

views:

311

answers:

6

How would you go about generating the unique video URL's that YouTube uses?

Example:

+10  A: 

Using some non-trivial hashing function. The probability of collision is very low, depending on the function, the parameters and the input domain. Keep in mind that cryptographic hashes were specifically designed to have very low collision rates for non-random input (i.e. completely different hashes for two close-but-unequal inputs).

This post by Jeff Attwood is a nice overview of the topic.

And here is an online hash calculator you can play with.

Eli Bendersky
Are there any references that YouTube uses a hashing function?
cherouvim
Why even allow the possibility of a collision when you don't need to? Just start the ID at 0000000000 and increment for each new page. If that doesn't "look" random enough for you, then just take a couple of large prime numbers, P and Q, and use the sequence hash(n)=Pn mod Q
Peter Alexander
@cherovium: I don't have a reference. But my answer was more general "idea throwing" than a specific reference to YouTube. After all, the OP's question started with "How would you go about"... :-)
Eli Bendersky
+1  A: 

You could generate a GUID and have that as the ID for the video. Guids are very unlikely to collide.

Patrick
+1  A: 

I don't think that the URL v parameter has anything to do with the content (video properties, title, description etc).

It's a randomly generated string of fixed length and contains a very specific set of characters. No duplicates are allowed.

cherouvim
+4  A: 

There is no need to use a hash. It is probably just a quasi-random 64 bit value passed through base64 or some equivalent.

By quasi-random, I mean it is just a one-to-one mapping with the counting integers, just shuffled.

For example, you could take a monotonically increasing database id and multiply it by some prime near 2^64, then base64 the result. If you did not want people to be able to guess, you might choose a more complex mapping or just pick a random number that is not in the database yet.

Normal base64 would add an equals at the end, but in this case it is implied because the size is known. The character mapping could easily be something besides the standard.

drawnonward
nice, thanks. Got any examples of where this is used?
BenB
A: 

Your best bet is probably to simply generate random strings, and simply keep track (in a DB for example) of which strings you've already used so you don't duplicate. This is very easy to implement, and it cannot fail if properly implemented (no duplicates, etc).

Cam
A: 

Just pick random values until you have one never seen before.

Randomly picking and exhausting all values form a set runs in expected time O(nlogn): http://stackoverflow.com/questions/1293939/what-is-o-value-for-naive-random-selection-from-finite-set/2169615#2169615

In your case you wouldn't exhaust the set, so you should get constant time picks. Just use a fast data structure to do the duplication lookups.

Thomas Ahle