views:

147

answers:

4

I'm starting to develop an application using MySQL and although I've developed apps before using databases, I've normally gone to the incrementing id method. The other day I was reading a posting somewhere and saw someone being picked apart for this, for not properly "normalising the database". I'm not a big database person, but I wanted to make sure I'm doing this right.

Any ideas, help / guidance?

+1  A: 

Incrementing id has nothing to do with normalisation. Normalisation has to do with removing transitive and reducing dependencies. Whether you choose to use an auto incrementing id or some other form of primary key is a different matter altogether.

Vincent Ramdhanie
Thanks for the answer. I haven't really studied databases in years, and even then I was doing what I was shown to do rather than understand. I guess I took the reply out of context. Thanks for the clarification.
Hammerstein
+4  A: 

There are two main methods for picking a primary key:

  • Selecting the primary key from existing natural candidate keys.
  • Creating a surrogate key, and setting it as the primary key. Unlike a natural key, a surrogate key is not derived from application data. This is the type of key you were having with the auto incrementing method.

There are general advantages and disadvantages for both methods, and you may want to chose a natural key in some situations and a surrogate key in others.

I hope this serves as a brief introduction, so that you may be able to go further into your research on how to choose appropriate primary keys.

Daniel Vassallo
Thanks for the reply and the links. My efforts have mainly been focused around desktop and web development and have never put much thought to databases. The comment the other day put this big question mark of "am I doing this in a really bad way" which no self respecting developer wants to be accused of. I'll do some reading up, thanks for the starting points.
Hammerstein
Where natural keys are available, choosing a surrogate key over a natural key is often done for some type of performance reason, and while it may break normal form, it can be a legitimate choice for the designer. The general rule is that unless you have a specific, objective, reason to break normal form, don't do it. Some people blindly pick surrogate auto-increment keys for every table without having a legitimate reason.
Marcus Adams
@Marcus: even worse, I've worked in more than 1 shop that *mandated* surrogate auto-increment keys, even for tables that have perfectly logical and usable natural keys.
MusiGenesis
A: 

Auto-incremented IDs and normalization are not related (i.e. you can use auto-incremented IDs in a de-normalized database, or have a normalized database that doesn't use auto-incremented IDs).

MusiGenesis
To be more clear, the person that was being "picked apart" was most likely breaking normal form by choosing a surrogate key over a natural key for his particular case. It really made no difference that the surrogate key also happened to be an auto-increment key.
Marcus Adams
Thank-you for the help. Reading time.
Hammerstein
@Marcus: I certainly hope nobody was being "picked apart" just for using a surrogate key. That's not exactly a settled debate.
MusiGenesis
+1  A: 

COMPOSITE PK APPROACH

drop table if exists users;
create table users(
 user_id int unsigned not null auto_increment primary key, -- clustered auto_inc PK
 username varbinary(32) not null,
 unique key users_username_idx(username)
)engine=innodb;

insert into users (username) values ('f00'),('bar'),('bish'),('bash'),('bosh'),('F00');

drop table if exists user_friends;
create table user_friends(
 user_id int unsigned not null,
 friend_user_id int unsigned not null,
 primary key (user_id, friend_user_id) -- clustered composite PK
)engine=innodb;

insert into user_friends values
(1,2),(1,3),  (2,1),(2,5),  (3,5),  (4,1),(4,2),(4,3),(4,5),(4,6),  (5,4),(5,1);

most of the time i query user_friends wanting to list all the friends for a given user so my queries are taking advantage of the clustered primary key (user_id, friend_user_id)

select * from user_friends where user_id = 4; -- uses PK index

if i want to delete a friend i have a composite PK so i need to specify the user_id and friend_user_id that i want to delete - both parts of the key need to be satisfied. This has a habit of making your application logic a little more complicated.

delete from user_friends where user_id = 4 and user_friend_id = 5; -- uses PK index

AUTO INC PK APPROACH (user table remains the same)

drop table if exists user_friends;
create table user_friends(
 friend_id int unsigned not null auto_increment primary key, -- clustered auto_inc PK
 user_id int unsigned not null,
 friend_user_id int unsigned not null,
 unique key user_friends_idx (user_id, friend_user_id) -- non clustered secondary index
)engine=innodb;

most of the time i query user_friends wanting to list all the friends for a given user in this case i can't take advantage of the primary key and i need to create a secondary index on (user_id, friend_user_id). Now i have 2 indexes on this table the PK index enforcing entity integrity and a secondary index which helps optimize my queries.

Adding the secondary index has a cost associated to it. inserts/deletes now have to update 2 indexes vs. just one composite key index and it takes up more disk space ofc.

select * from user_friends where user_id = 4; -- uses secondary index not PK

however, if i want to delete a friend i can use the friend_id PK this simplifies my application logic as i can delete 1 row easily just using the PK.

delete from user_friends where friend_id = 10; -- uses PK index

This probably isnt a great example of the two types of approaches but it should give you some ideas into why clustered composite indexes are good to use at times :)

f00
Thank-you for the examples, code always helps things settle in a little easier. I appreciate you taking the time.
Hammerstein
@Hammerstein no problem, hope it helps :)
f00