views:

770

answers:

9

Hello,

I'm currently in the process of designing the database tables for a customer & website management application. My question is in regards to the use of primary keys as functional parts of a table (and not assigning "ID" numbers to every table just because).

For example, here are four related tables from the database so far, one of which uses the traditional primary key number, the others which use unique names as the primary key:

--
-- website
--
CREATE TABLE IF NOT EXISTS `website` (
  `name` varchar(126) NOT NULL,
  `client_id` int(11) NOT NULL,
  `date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `notes` text NOT NULL,
  `website_status` varchar(26) NOT NULL,
  PRIMARY KEY  (`name`),
  KEY `client_id` (`client_id`),
  KEY `website_status` (`website_status`),
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

--
-- website_status
--
CREATE TABLE IF NOT EXISTS `website_status` (
  `name` varchar(26) NOT NULL,
  PRIMARY KEY  (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `website_status` (`name`) VALUES
('demo'),
('disabled'),
('live'),
('purchased'),
('transfered');

--
-- client
--
CREATE TABLE IF NOT EXISTS `client` (
  `id` int(11) NOT NULL auto_increment,
  `date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `client_status` varchar(26) NOT NULL,
  `firstname` varchar(26) NOT NULL,
  `lastname` varchar(46) NOT NULL,
  `address` varchar(78) NOT NULL,
  `city` varchar(56) NOT NULL,
  `state` varchar(2) NOT NULL,
  `zip` int(11) NOT NULL,
  `country` varchar(3) NOT NULL,
  `phone` text NOT NULL,
  `email` varchar(78) NOT NULL,
  `notes` text NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `client_status` (`client_status`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;

--
-- client_status
---
CREATE TABLE IF NOT EXISTS `client_status` (
  `name` varchar(26) NOT NULL,
  PRIMARY KEY  (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `client_status` (`name`) VALUES
('affiliate'),
('customer'),
('demo'),
('disabled'),
('reseller');

As you can see, 3 of the 4 tables use their 'name' as the primary key. I know that these will always be unique. In 2 of the cases (the *_status tables) I am basically using a dynamic replacement for ENUM, since status options could change in the future, and for the 'website' table, I know that the 'name' of the website will always be unique.

I'm wondering if this is sound logic, getting rid of table ID's when I know the name is always going to be a unique identifier, or a recipe for disaster? I'm not a seasoned DBA so any feedback, critique, etc. would be extremely helpful.

Thanks for taking the time to read this!

+6  A: 

When making natural PRIMARY KEY's, make sure their uniqueness is under your control.

If you're absolutely sure you will never ever have uniqueness violation, then it's OK to use these values as PRIMARY KEY's.

Since website_status and client_status seem to be generated and used by you and only by you, it's acceptable to use them as a PRIMARY KEY, though having a long key may impact performance.

website name seems be under control of the outer world, that's why I'd make it a plain field. What if they want to rename their website?

The counterexamples would be SSN and ZIP codes: it's not you who generates them and there is no guarantee that they won't be ever duplicated.

Quassnoi
I have heard of two people being assigned the same SSN. It isn't supposed to happen, but it has happened anyway, unless I'm misinformed. A bigger problem is people using "counterfeit" SSNs in order to obtain employment. Now your data source is corrupt, even if the Social Security Administration is managing SSNs correctly.
Walter Mitty
@Walter: my point exactly. SSN's should NOT be used as primary keys
Quassnoi
I love the point about the uniqueness being under your control! We've even had supposedly unique interger id fields from customers that turned out to not be unique, when they reused them or went to a new system.
HLGEM
+9  A: 

There are 2 reasons I would always add an ID number to a lookup / ENUM table:

  1. If you are referencing a single column table with the name then you may be better served by using a constraint
  2. What happens if you wanted to rename one of the client_status entries? e.g. if you wanted to change the name from 'affiliate' to 'affiliate user' you would need to update the client table which should not be necessary. The ID number serves as the reference and the name is the description.

In the website table, if you are confident that the name will be unique then it is fine to use as a primary key. Personally I would still assign a numeric ID as it reduces the space used in foreign key tables and I find it easier to manage.

EDIT: As stated above, you will run into problems if the website name is renamed. By making this the primary key you will be making it very difficult if not impossible for this to be changed at a later date.

Macros
Number two above is the best reason not to use the name as the primary key.
Nick DeVore
Good answer.Here is one more reason:String compare operation takes much longer time then numeric comparation.
Bogdan Gusiev
A: 

Here few points should be considered before deciding keys in table

  • Numeric key is more suitable when you use references ( foreign keys), since you not using foreign keys, it ok in your case to use non numeric key.

  • Non-numeric key uses more space than numeric keys, can decrease performance.

  • Numeric keys make db look simpler to understand ( you can easily know no of rows just by looking at last row)
Sharique
The value of the numeric key for the last row isn't a good indication of the total number of rows in the database, unless you NEVER delete a record in your database.
Jordan S. Jones
+2  A: 

Besides all the other excellent points that have already been made, I would add one more word of caution against using large fields as clustering keys in SQL Server (if you're not using SQL Server, then this probably doesn't apply to you).

I add this because in SQL Server, the primary key on a table by default also is the clustering key (you can change that, if you want to and know about it, but most of the cases, it's not done).

The clustering key that determines the physical ordering of the SQL Server table is also being added to every single non-clustered index on that table. If you have only a few hundred to a few thousand rows and one or two indices, that's not a big deal. But if you have really large tables with millions of rows, and potentially lots of indices to speed up the queries, this will indeed cause a lot of disk space and server memory to be wasted unnecessarily.

E.g. if your table has 10 million rows, 10 non-clustered indices, and your clustering key is 26 bytes instead of 4 (for an INT), then you're wasting 10 mio. by 10 by 22 bytes for a total of 2.2 billion bytes (or 2.2 GBytes approx.) - that's not peanuts anymore!

Again - this only applies to SQL Server, and only if you have really large tables with lots of non-clustered indices on them.

Marc

marc_s
"you can change that, if you want to and know about it, but most of the cases, it's not done" - Only if the person designing the database isn't a good database designer. An experienced and qualified designer will take into account many things before deciding on what should be used in the clustered index.
Tom H.
Tom H: yes, of course - but a majority of app developers are not at the same time great database designers, in my own personal experience. Lots of app developers treat the database as a "dumb storage dump" where they can just drop their objects into - and don't think enough about database design.
marc_s
+1  A: 

Personally, I think you will run into trouble using this idea. As you end up with more parent child relationships, you end up with a huge amount of work when the names change (As they always will sooner or later). There can be a big performance hit when having to update a child table that has thousands of rows when the name of the website changes. And you have to plan for how do make sure that those changes happen. Otherwise, the website name changes (oops we let the name expire and someone else bought it.) either break because of the foreign key constraint or you need to put in an automated way (cascade update) to propagate the change through the system. If you use cascading updates, then you can suddenly bring your system to a dead halt while a large chage is processed. This is not considered to be a good thing. It really is more effective and efficient to use ids for relationships and then put unique indexes on the name field to ensure they stay unique. Database design needs to consider maintenance of the data integrity and how that will affect performance.

Another thing to consider is that websitenames tend to be longer than a few characters. This means the performance difference between using an id field for joins and the name for joins could be quite significant. You have to think of these things at the design phase as it is too late to change to an ID when you have a production system with millions of records that is timing out and the fix is to completely restructure the databse and rewrite all of the SQL code. Not something you can fix in fifteen minutes to get the site working again.

HLGEM
+1  A: 

This just seems like a really bad idea. What if you need to change the value of the enum? The idea is to make it a relational database and not a set of flat files. At this point, why have the client_status table? Moreover, if you are using the data in an application, by using a type like a GUID or INT, you can validate the type and avoid bad data (in so far as validating the type). Thus, it is another of many lines to deter hacking.

Josh
+3  A: 

Kimberly Tripp has an Excellent series of blog articles (GUIDs as PRIMARY KEYs and/or the clustering key and The Clustered Index Debate Continues) on the issue of creating clustered indexes, and choosing the primary key (related issues, but not always exactly the same). Her recommendation is that a clustered index/primary key should be:

  1. Unique (otherwise useless as a key)
  2. Narrow (the key is used in all non-clustered indexes, and in foreign-key relationships)
  3. Static (you don't want to have to change all related records)
  4. Always Increasing (so new records always get added to the end of the table, and don't have to be inserted in the middle)

Using "Name" as your key, while it seems to satisfy #1, doesn't satisfy ANY of the other three.

Even for your "lookup" table, what if your boss decides to change all affiliates to partners instead? You'll have to modify all rows in the database that use this value.

From a performance perspective, I'm probably most concerned that a key be narrow. If your website name is actually a long URL, then that could really bloat the size of any non-clustered indexes, and all tables that use it as a foreign key.

BradC
A: 
cheduardo
A: 

"If you're absolutely sure you will never ever have uniqueness violation, then it's OK to use these values as PRIMARY KEY's."

If you're absolutely sure you will never ever have uniqueness violation, then don't bother to define the key.