When creating a web application that somehow displays a unique identifier for a recurring entity (videos on YouTube, or book sections on a site like mine), would it be better to use a uniform-length identifier like a hash, or the unique key of the item in the database (1, 2, 3, etc.)?

Besides revealing a little information about the internals of your app (which I think is immaterial), why would using a hash be better than just using the unique ID?

In short: Which is better to use as a publicly displayed unique identifier - a hash value, or a unique key from the database?

Edit: I'm opening up this question again because Dmitriy brought up the good point of not tying the naming to a DB-specific property. Will this sort of coupling prevent me from optimizing/normalizing the database in the future?

The platform uses PHP/Python with MySQL (ISAM).

A: 

Hashes aren't guaranteed to be unique, nor, I believe, consistent.

James Curran
They are consistent, but you're right in saying they're not unique (there are collisions by definition). However, collisions can be avoided to a large extent by appending some sort of random or vague salt (microtime or random number) before hashing.
Karan
Adding a salt will have no bearing on the number of collisions.
Xenph Yan
Good point. Scratch that.
Karan
.. but if you add a totally random number every time, they are indeed not consistent! :-)
SquareCog
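The consistency point in the comments above can be checked directly. This is a minimal sketch, not anyone's actual implementation: `public_id`, the choice of SHA-256, the 16-character truncation, and the key `"book-section-42"` are all assumptions made for illustration.

```python
import hashlib
import os


def public_id(key: str, salt: bytes = b"") -> str:
    """Return the first 16 hex characters of SHA-256(salt + key)."""
    return hashlib.sha256(salt + key.encode("utf-8")).hexdigest()[:16]


# Same input, same output: a hash function is deterministic.
same_a = public_id("book-section-42")
same_b = public_id("book-section-42")

# A fresh random salt on every call changes the digest each time, so the
# salt would have to be stored with the record to recompute the same ID.
salted_a = public_id("book-section-42", os.urandom(16))
salted_b = public_id("book-section-42", os.urandom(16))

print(same_a == same_b)      # True
print(salted_a == salted_b)  # False, barring an astronomically unlikely match
```

Note that truncating the digest shortens the ID but also raises the chance of collisions.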
+2  A: 

I typically use hashes if I don't want the user to be able to guess the next ID in the series. But for your book sections, I'd stick with numerical IDs.

Xenph Yan
Stick with numerical IDs even if there are many books written by different authors? The first author is going to get a bunch of numbers ranging from 1ish to 20ish, then the next will get 21ish to 30ish. Is this bad in any way?
Karan
No, it sounds like typical database indexing.
Xenph Yan
Using the hash makes URL guessing a bit harder, but you need better security than that anyway, unless you just want to hide the order in which rows were created or how many you have. For example, does anyone need to know that you are user ID number 8 vs. 7,000,000?
WW
A hash doesn't generate a "next ID", so it's not a substitute for a numerical ID, unless you meant a hash of something plus a numerical ID. Otherwise, a collision is likely. (Assuming 32-bit object hashes and "random" inputs, expect a collision somewhere around 65K items?) You want a GUID instead.
MichaelGG
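The 65K figure in the comment above comes from the birthday bound: with a b-bit hash, collisions become likely once the item count approaches the square root of 2^b. A rough sketch of the arithmetic, assuming uniformly distributed hash values (the function names are made up for this example):

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash value,
    assuming uniformly distributed hash_bits-bit outputs (birthday bound)."""
    space = 2 ** hash_bits
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2.0 * space))


def items_for_probability(p: float, hash_bits: int) -> int:
    """Approximate item count at which the collision probability reaches p."""
    space = 2 ** hash_bits
    return math.ceil(math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p))))


# 32-bit hash: probability of at least one collision among 65,536 items.
p_65k = collision_probability(65_536, 32)

# 32-bit hash: item count at which a collision is as likely as not.
n_half = items_for_probability(0.5, 32)

print(f"{p_65k:.2f}", n_half)
```

Under this approximation the chance of a collision at 65,536 items is a little under 40%, and it passes 50% at roughly 77 thousand items, so the comment's estimate is in the right range. With a 128-bit value such as a GUID, the same formula gives a negligible probability for any realistic item count.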
A: 

Will your users have to remember/use the value? Or are you looking at it from a security POV?

From a security perspective, it shouldn't matter - since you shouldn't just be relying on people not guessing a different but valid ID of something they shouldn't see in order to keep them out.

Steven Adams
It does matter from a security perspective, actually. Being able to tell the *order* of IDs is a lot more information for cryptanalysts than just being able to occasionally hit a random collision. That's assuming there's crypto to analyze, of course.
SquareCog
A: 

Yeah, I don't think you're looking for a hash - you're more likely looking for a Guid. If you're on the .NET platform, try System.Guid.

However, the most important reason not to use a Guid is performance. Doing database joins and lookups on (long) strings is very suboptimal. Numbers are fast. So, unless you really need it, don't do it.

Travis
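Since the question mentions Python, here is a small sketch of the equivalent there, using the standard `uuid` module. The idea of keeping the 16-byte form for storage and the text form for display is an assumption about how one might soften the string-comparison cost mentioned above, not something this answer prescribes.

```python
import uuid

guid = uuid.uuid4()      # random 128-bit identifier
as_text = str(guid)      # 36 characters, e.g. for URLs
as_bytes = guid.bytes    # 16 bytes, e.g. for a fixed-width binary column

# Either representation round-trips back to the same value.
from_text = uuid.UUID(as_text)
from_bytes = uuid.UUID(bytes=as_bytes)

print(len(as_text), len(as_bytes))
print(from_text == guid, from_bytes == guid)
```

Even in binary form a GUID is still wider than an integer key, so the performance caveat does not go away entirely.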
+1  A: 

Using hashes is preferable in case you need to rebuild your database for some reason, for example, and the ordering changes. The ordinal numbers will move around -- but the hashes will stay the same.

Not relying on the order you put things into a box, but on properties of the things, just seems.. safer.

But watch out for collisions, obviously.

SquareCog
@Dmitriy, the ordering is unimportant. The uniqueness is the important issue here. If you import a list of sequential ints into a new DB, it'll work just fine.
Xenph Yan
A: 

With hashes you:

  1. Are free to merge the database with a similar one (or a backup), if necessary.
  2. Avoid doing anything that could help guessing attacks, even a bit.
  3. Avoid disclosing more private information about the user than necessary. For example, if somebody sees user number 2 log in to your current database, they learn that he is an old-timer.
  4. (Provided that you use a long hash or a GUID) greatly help yourself in case you're bought by YouTube and they decide to integrate your databases.
  5. Help yourself in case a search engine appears that indexes by GUID.

Please let us know if the last 6 months brought you some clarity on this question...

ilya n.
+1  A: 

Unless you're trying to hide the state of your internal object ID counter, hashes are needlessly slow (to generate and to compare), needlessly long, needlessly ugly, and needlessly capable of colliding.

For inventory-like things, just use a sequential (or sharded) counter instead. If you migrate to a different database, you will just have to initialize the new counter to a value at least as large as your largest existing record ID. Pretty much every database gives you a way to do this.

If you are trying to hide the state of your counter, perhaps because you're counting users and don't want competitors to know how many you have, I suggest avoiding the display of your IDs. If you insist on displaying them and don't want the drawbacks of a hash, you might consider using a maximal-period linear feedback shift register to generate IDs.

Forest
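To make the shift-register suggestion above concrete, here is a minimal sketch of a 16-bit Galois LFSR. The width, the tap mask `0xB400` (a commonly cited maximal-period choice for 16 bits), the seed, and the function names are all assumptions for illustration; a real site would likely want a wider register.

```python
def lfsr16_next(state: int) -> int:
    """Advance a 16-bit Galois LFSR by one step.

    The tap mask 0xB400 corresponds to x^16 + x^14 + x^13 + x^11 + 1,
    a maximal-period polynomial: from any nonzero state the register
    visits all 65535 nonzero 16-bit values before repeating.
    """
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state


def id_sequence(seed: int = 0xACE1):
    """Yield each nonzero 16-bit value exactly once, in scrambled order."""
    if not 0 < seed < 0x10000:
        raise ValueError("seed must be a nonzero 16-bit value")
    state = seed
    while True:
        yield state
        state = lfsr16_next(state)
        if state == seed:  # full cycle completed
            return


ids = list(id_sequence())
print(len(ids), len(set(ids)))
```

Like a counter, this never collides within its period and only needs the last issued ID to produce the next one. It only obscures the count from casual observers, though: anyone who knows or guesses the register layout can work out the sequence, so it is not a security measure.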