views:

372

answers:

5

Greetings,

I have some mysql tables that are currently using an md5 hash as a primary key. I normally generate the hash with the value of a column. For instante, let's imagine I have a table called "Artists" with the fields id, name, num_members, year. I tend to make a md5($name) and use it has an ID.

I would like to know what are the downsides of doing this. Is it just better to use integers with AUTO_INCREMENT ? I tend to run away from this because it's just not worth the trouble of finding out what the last id inserted was, and what will be the next etc.

Can you give me some lights on this?

Thank you.

A: 

The benefits are if you present the IDs to customers (say in a query string for a web form, though that is another no-no)... it prevents users guessing another one.

Personally I use auto-increment without problems (have moved DBs to new servers and everything without problems)

IPX Ares
+2  A: 

The first approach has one obvious disadvantage: if there are two artists of the same name there will be a primary key collision. Using an INT column with an auto-increment will ensure uniqueness.

Furthermore, though very unlikely, there is a chance that MD5 hashes of different strings could collide (I seem to recall the probability as being 1 in 36 to the power of 32).

karim79
+2  A: 

The MD5 isn't a true key in this case because it functionally depends on the name. That means that if you have two artists with the same name, you have duplicate "keys" for different records. You could make it a real key by hashing all the attributes together (and hoping that the probability gods don't send you a collision), or you could just save yourself the trouble and use an autoincrementing ID.

Thom Smith
+2  A: 

If you need a surrogate primary key, using an AUTO_INCREMENT field is better than an md5 hash, because it is fewer bytes of data, and database backends optimize for integer primary keys.

mysql_insert_id can be used if you need the last inserted id.

If you are generating the primary key as a hash of other columns, why not just use those other columns as a unique key, then join on those?

Another question is, what are the upsides of using an md5 hash? I can't think of any.

Christian Oudard
I guess you are correct. Thanks.
John 5
+1  A: 

It seems like the way you're trying to use the MD5 isn't really buying you any benefit. If "$name" is unique, then why not just use "name" as the primary key? Calculating an MD5 hash and using it as a key for something that's already unique is redundant.

On the other hand, if "name" is not unique, then the MD5 hash won't be unique either and so it's pointless that way too.

Generally you use an MD5 hash when you don't want to store the actual value of the column. For instance, if you're storing passwords, you generally only store the MD5 hash of the password, not the password itself, so that you can't see people's passwords just by looking at the table contents.

If you don't have any unique fields, then you're stuck doing something like an auto-increment because it's at least guaranteed unique. If you use the built-in SQL auto-increment, then you'll just have to fetch the last one way or another. Alternately, if you can get away with keeping a unique counter locally in your application, that avoids having to use auto-increment, but isn't necessarily viable for most applications.

Brent Nash