How would you go about generating the unique video URL's that YouTube uses?
Example:
How would you go about generating the unique video URL's that YouTube uses?
Example:
Using some non-trivial hashing function. The probability of collision is very low, depending on the function, the parameters and the input domain. Keep in mind that cryptographic hashes were specifically designed to have very low collision rates for non-random input (i.e. completely different hashes for two close-but-unequal inputs).
This post by Jeff Attwood is a nice overview of the topic.
And here is an online hash calculator you can play with.
You could generate a GUID and have that as the ID for the video. Guids are very unlikely to collide.
I don't think that the URL v parameter has anything to do with the content (video properties, title, description etc).
It's a randomly generated string of fixed length and contains a very specific set of characters. No duplicates are allowed.
There is no need to use a hash. It is probably just a quasi-random 64 bit value passed through base64 or some equivalent.
By quasi-random, I mean it is just a one-to-one mapping with the counting integers, just shuffled.
For example, you could take a monotonically increasing database id and multiply it by some prime near 2^64, then base64 the result. If you did not want people to be able to guess, you might choose a more complex mapping or just pick a random number that is not in the database yet.
Normal base64 would add an equals at the end, but in this case it is implied because the size is known. The character mapping could easily be something besides the standard.
Your best bet is probably to simply generate random strings, and simply keep track (in a DB for example) of which strings you've already used so you don't duplicate. This is very easy to implement, and it cannot fail if properly implemented (no duplicates, etc).
Just pick random values until you have one never seen before.
Randomly picking and exhausting all values form a set runs in expected time O(nlogn)
: http://stackoverflow.com/questions/1293939/what-is-o-value-for-naive-random-selection-from-finite-set/2169615#2169615
In your case you wouldn't exhaust the set, so you should get constant time picks. Just use a fast data structure to do the duplication lookups.