views:

300

answers:

7

Why do many sites (YouTube is a good example) generate a string of random numbers and letters instead of using, for example, the row id?

Usually it's something like this:

bla?v=wli4l73Chc0

instead of something like:

bla?id=83934

Is it just to keep the value short when you have many rows? Or are there other advantages? After all, I can imagine that bla?id=23934234234 doesn't look so nice.

Thanks and cheers

+3  A: 

I would guess it's to obfuscate information and to add/increase the amount of information that can be passed via that parameter.

Alex
+4  A: 

In a distributed environment it is simpler to generate random numbers as identifiers than sequential ones, because no central counter has to be coordinated between servers.
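
As a minimal sketch using only the Python standard library, each server can mint identifiers on its own, with no shared counter. The 11-character length and the URL-safe 64-symbol alphabet are assumptions chosen to mirror the ID in the question, not YouTube's documented scheme:

```python
import secrets
import string
import uuid

# Hypothetical URL-safe alphabet: 52 letters + 10 digits + '-' and '_' (64 symbols).
ALPHABET = string.ascii_letters + string.digits + "-_"

def new_short_id(length: int = 11) -> str:
    """Return a random ID drawn from a cryptographically secure RNG.

    No database round-trip or cross-server coordination is needed.
    Uniqueness is only probabilistic, so a real system would still keep
    a unique constraint on the column and retry on the rare collision.
    """
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def new_uuid() -> str:
    """Return a random (version 4) UUID as described in RFC 4122."""
    return str(uuid.uuid4())

print(new_short_id())
print(new_uuid())
```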

Yaroslav
If they are random, how do you stop collisions? Especially in a distributed environment.
Felix
@Felix, good question. Maybe millisecond-resolution timestamps?
Earlz
Well, how do UUIDs work? (See http://en.wikipedia.org/wiki/Universally_Unique_Identifier.)
SamB
128 bits are enough to prevent collisions in practical applications. http://tools.ietf.org/html/rfc4122
Yaroslav
@Earlz could be, but wouldn't you have to store the timestamp **alongside** the actual random ID? Also, adding to the answer: in (most?) non-relational DBs there's no such thing as `AUTOINCREMENT`, so you have to give it an ID yourself (and it would be suboptimal to perform a query to find the largest ID every time you are inserting a new row).
Felix
@Felix - random generation over a space as large as 128 bits is much more likely to yield unique values than an incrementing sequence, especially in a distributed environment. Concurrency issues could easily lead two writers computing "MAX + 1" to get the same result, whereas with random generation even concurrent requests will almost certainly produce different results.
Michael Shimmins
In the example given, an eleven-character string of case-sensitive letters and digits, the code space is 62^11 ≈ 5.2E19. By the birthday bound, we could generate over 9 billion codes, on average, before the first collision (the odds of a collision reach 50% at roughly 8.5 billion).
tloflin
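
The collision estimate above can be checked with the standard birthday approximations: for N equally likely codes, the expected number of random draws before the first collision is about sqrt(πN/2), and a collision becomes a 50/50 proposition after about sqrt(2N ln 2) draws. A quick sketch, assuming a 62-symbol alphabet (a-z, A-Z, 0-9) as the comment describes:

```python
import math

def code_space(alphabet_size: int, length: int) -> int:
    """Number of distinct codes of the given length."""
    return alphabet_size ** length

def expected_draws_until_collision(n: int) -> float:
    """Birthday approximation: mean number of draws before the first repeat."""
    return math.sqrt(math.pi * n / 2)

def draws_for_even_collision_odds(n: int) -> float:
    """Birthday approximation: draws needed for a ~50% chance of any repeat."""
    return math.sqrt(2 * n * math.log(2))

n = code_space(62, 11)
print(f"{n:.2e}")                                    # 5.20e+19
print(f"{expected_draws_until_collision(n):.2e}")    # 9.04e+09
print(f"{draws_for_even_collision_odds(n):.2e}")     # 8.49e+09
```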
+1  A: 

Having raw row ids, or other unmodified database parameters in urls, is bad security practice. Far better to have hashes into some large domain.
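
One way to read "hashes into some large domain" is a keyed hash (HMAC) of the row id, stored with the row and looked up when a URL arrives. A minimal sketch; the secret key, the 11-character truncation and the in-memory dict standing in for an indexed database column are all assumptions for illustration:

```python
import base64
import hashlib
import hmac
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; keep real keys out of source

def public_token(row_id: int) -> str:
    """Map a row id to an opaque, URL-safe token via HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, str(row_id).encode(), hashlib.sha256).digest()
    # Base64url without padding, truncated to 11 characters (66 bits).
    return base64.urlsafe_b64encode(digest).decode().rstrip("=")[:11]

# Hypothetical stand-in for a unique, indexed "token" column in the database.
token_to_row = {public_token(i): i for i in (1, 2, 83934)}

def lookup(token: str) -> Optional[int]:
    """Resolve a token from a URL back to a row id, or None if unknown."""
    return token_to_row.get(token)

t = public_token(83934)
print(t, lookup(t))
```

Because tokens for neighbouring ids are unrelated, seeing one URL tells an attacker nothing about the next. Truncation makes collisions possible in principle, so the token column would still need a uniqueness check.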

Rob Lachlan
+2  A: 

I honestly don't see why they wouldn't use the unique ID (or ObjectID, or whatever the database provides). Have you considered that, rather than representing the ID in base 10, they might represent it in a higher base (such as 64, or whatever is usable in URLs) so that the ID is more compact in the query string? (That is, wli4l73Chc0 may simply be some number written in a base other than 10.)
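
To illustrate the idea, here is a sketch that writes an integer id in base 64 using a URL-safe alphabet. The alphabet and its ordering are assumptions for the example, not a known site's actual scheme:

```python
import string

# Hypothetical URL-safe base-64 alphabet; a digit's value is its index here.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"
BASE = len(ALPHABET)  # 64

def encode(n: int) -> str:
    """Write a non-negative integer in base 64 using ALPHABET."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, r = divmod(n, BASE)
        digits.append(ALPHABET[r])
        if n == 0:
            break
    return "".join(reversed(digits))

def decode(s: str) -> int:
    """Inverse of encode(); raises ValueError on characters not in ALPHABET."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n

print(encode(83934))                # Ufe
print(len(encode(23934234234)))     # 6
```

With this encoding, the 11-digit id from the question (23934234234) fits in six characters.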

Earlz
wli4l73Chc0 in base 36 would be 119180968748356032 in base 10. Sounds plausible for a YouTube video ID :)
Strelok
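
This conversion can be checked with Python's built-in int(), which accepts bases up to 36. Base 36 has no upper/lower case distinction, so int() reads the "C" in the ID as "c":

```python
video_id = "wli4l73Chc0"
# int() treats letters case-insensitively: 'C' and 'c' both have digit value 12.
value = int(video_id, 36)
print(value)  # 119180968748356032
```
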
A: 

Some environments also use this to carry session state. For example, if you have an ASP.NET app that uses cookieless sessions, you'll find a similar code in the URL.

Dillie-O
+2  A: 

I upvoted Rob's answer, but I'll also elaborate a bit on one of the risks.

If you publish a link like http://stackoverflow.com/questions/2581510, where 2581510 is a database id, someone trying to hack your site is going to try connecting to http://stackoverflow.com/questions/2581511.

With stackoverflow, this may not be a database id, and the questions on stackoverflow are not supposed to be private, so it's not a big deal even if it is.

But if this were a site where restricting data access to owners of the data were important, this potentially risks letting people see data they shouldn't.

There are of course things you can and should do to make the site refuse to show the data if they don't own it, but it's still better to make the URL not identify a database id. It's better, as Rob noted, to have a hash into some much larger domain, or a session-based index into a set of data already identified as appropriate to show the user and available only within a logged-in session.
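
A minimal sketch of both defences: an ownership check on every lookup, and a per-session index so the URL carries only a position in the user's own list. The `Record` type, the in-memory `DB` and the session dict are hypothetical stand-ins for a real data layer and session store:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    id: int
    owner: str
    body: str

# Hypothetical in-memory "database" keyed by row id.
DB = {
    2581510: Record(2581510, "alice", "alice's data"),
    2581511: Record(2581511, "bob", "bob's data"),
}

def get_record(record_id: int, user: str) -> Optional[Record]:
    """Ownership check: refuse rows the current user does not own."""
    rec = DB.get(record_id)
    return rec if rec is not None and rec.owner == user else None

def start_session(user: str) -> dict:
    """Session-based index: remember which row ids this user may see."""
    allowed = sorted(r.id for r in DB.values() if r.owner == user)
    return {"user": user, "visible": allowed}

def get_by_index(session: dict, index: int) -> Optional[Record]:
    """Resolve a URL such as /items/0 through the session, never a raw row id."""
    visible = session["visible"]
    if not 0 <= index < len(visible):
        return None
    return get_record(visible[index], session["user"])

s = start_session("alice")
print(get_by_index(s, 0))              # alice's record
print(get_record(2581511, "alice"))    # None: incrementing the id doesn't help
```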

Don Roby
+1  A: 

They are actually not random strings. Normally they are numbers (usually row IDs) encoded in Base-36 (not always, of course, but many sites do this).

Why do they use it? Because a Base-36 encoded number string is shorter than the original.

For example, 1234567890 in Base-36 is kf12oi: six characters instead of ten, or 40% shorter.

See this Wikipedia article. Check the "Uses in practice" section to see who is using it.
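
The 1234567890 example can be reproduced in Python. The built-in int(s, 36) parses base 36, but there is no built-in formatter for it, so a small helper is needed for the encoding direction. A sketch using the lowercase 0-9a-z digit set:

```python
import string

DIGITS36 = string.digits + string.ascii_lowercase  # "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n: int) -> str:
    """Encode a non-negative integer in base 36."""
    if n < 0:
        raise ValueError("n must be non-negative")
    out = ""
    while True:
        n, r = divmod(n, 36)
        out = DIGITS36[r] + out
        if n == 0:
            return out

encoded = to_base36(1234567890)
print(encoded, len(encoded))  # kf12oi 6
print(int(encoded, 36))       # 1234567890, decoded by the built-in parser
```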

Strelok