+8  A: 

I would personally say keep it simple and stay with an autoincremented primary key. If you need something more "Readable" in terms of display in the program, then possibly one of your other ideas, but I think that is just adding unneeded complexity to the primary key field.

TheTXI
+1 Let the keys be JUST keys. Then you're never worried about changing types, adding columns, or anything else. Use index on other columns if you need to be able to look up their values quickly.
Jason Cohen
I agree. Complicating things is just that, complicating things.
TheTXI
+1  A: 

Follow the Don't Use policy.

Some problems you can run into:

You need to generate keys from more than one host.

Someone will want to reserve contiguous numbers to use together.

How meaningful will people want it to be? Wars are fought over this, and you're in the first skirmish of one already. "It's already meaningful, and if we just add two more digits we can ..." i.e. you're establishing a design style that will (should) be extensible.

If you are concatenating the two, you're doing typecasts which can mess up your query Optimizer.

You'll need to reclassify roads, and redefine their boundaries (i.e. move the roads), which implies changing the primary key and maybe losing links.

There are workarounds for all this, but this is the kind of issue where workarounds proliferate and get out of control. And it doesn't take more than a couple to get beyond "Simple".

le dorfier
A: 

Another thing to keep in mind is that if you're importing alot of data into this system, you may find out that things like Road_Number are not as unique as you thought, and there may be operational roadblocks to fixing the problem (repainting road signs, etc.) .

MadCoder
+7  A: 

I'm also very strongly in the "don't use primary keys as meaningful data" camp. Every time I have contravened that policy it has ended in tears. Sooner or later the meaningful data needs to change and if that means you have to change a primary key it can get painful. The primary key will probably be used in foreign key constraints and you can spend ages trying to sort it all out just to make a simple data change.

I always use GUIDs/UUIDs for my primary keys in every table I ever create but that's just personal preference serials or such are also good.

sipwiz
+4  A: 

Don't put meaning into your PK fields unless...

  • It is 100% completely impossible that the value will never change and that

  • No two people would ever reasonably
    argue about which value should be
    used for a particular row.

Go with option one and format the value in the app to look like option two or three when it is displayed.

JohnFx
Can you design a database where a value cannot get changed by anyone?
Jeff O
That's kind of a strange question. I think you misinterpreted my answer. I was saying that you'd want to pick a PK that no one would WANT to change, not necessarily that it was physically impossible to change it.
JohnFx
+51  A: 

This is really a discussion about surrogate (also called technical or synthetic) vs natural primary keys, a subject that has been extensively covered. I covered this in Database Development Mistakes Made by AppDevelopers.

Natural keys are keys based on externally meaningful data that is (ostensibly) unique. Common examples are product codes, two-letter state codes (US), social security numbers and so on. Surrogate or technical primary keys are those that have absolutely no meaning outside the system. They are invented purely for identifying the entity and are typically auto-incrementing fields (SQL Server, MySQL, others) or sequences (most notably Oracle).

In my opinion you should always use surrogate keys. This issue has come up in these questions:

Auto number fields are the way to go. If your keys have meaning outside your database (like asset numbers) those will quite possibly change and changing keys is problematic. Just use indexes for those things into the relevant tables.

cletus
I can't agree enough with this. I've been burned so many times by project managers who swore, swore, SWORE the user-generated surrogate keys would be unique, but then later realized there were some obscure cases where the numbers were duplicated. Very painful to fix later.
Brent Ozar
Yes. There are cases when a natural key is a good choice. But the OP's scenario is not one of these cases. We should assume that some road identifiers will change, split, merge, be renamed, whatever.
Bill Karwin
I agree completely. I'd also avoid ever letting the users see your surrogate primary key. If you do, eventually they WILL assign some meaning to it, and decide they want to change something, and you're back here again.
kenj0418
Let me translate @cletus' post for you: "Use Example 3".:)
Randolpho
Nathan, I don't think you will get an better answer than this. I for myself see this question answered and read no further.
Hinek
+1  A: 

As mentioned before, keep your internal primary keys as just keys, whatever the most optimal datatype is on your platform.

However you do need to let the numbering system argument be fought out, as this is actually a business requirement, and perhaps let's call it an identification system for the asset.

If there is only going to be one identifier, then add it as a column to the main table. If there are likely to be many identification systems (and assets usually have many), you'll need two more tables

    Identifier-type table             Identifier-cross-ref table
      type-id             ------------> type-id              (unique
      type-name                         identifier-string     key)
                                        internal-id


That way different people who need to access the asset can identify in their own way. For example the server team will identify a server differently from the network team and different again from project management, accounts, etc.

Plus, you get to go to all the meetings where everyone argues with each other.

wentbackward
A: 

While natural keys may have great meaning to the business users, if you do not have the agreement that those keys are sacred and should not be altered, you will more than likely be pulling your hair out while maintaining a database where the "product codes have to be changed to accommodate the new product line the company acquired." You need to protect the RI of your data, and integers as primary keys with auto-increment are the best way to go. Performance is also better when indexing and traversing integers than char columns.

While not appropriate as primary keys, natural keys are very appropriate for user consumption and you can enforce uniques via an index. They bring a context to the data that will make it easier for all parties to understand. Also, in the advent that you need to reload data, the natural keys can help verify that your lookups are still valid.

David Robbins
A: 

I would go with the surrogate key, but you may want to have a computed column that "formats" the surrogate key into a more "readable" value if that improves your reporting. The computed colum could produce example 2 from the surrogate key for instance for display purposes.

I think the surrogate key route is the way to go and the only exceptions that I make for it are join tables, where the primary key could be composed of the foreign key references. Even in these cases I'm finding that having a surrogate primary key is more useful than not.

tvanfosson
I agree, but wouldn't it be possible to simply concatenate some columns in a query to produce a "readable" key in a report?
Damien
Yes, but that's exactly what a computed column is. If you are always accessing it programatically there is probably little difference, but any hand queries will get the value consistently and not have to worry about getting the definition wrong.
tvanfosson
+2  A: 

I think the important thing to remember here is that each table in your database/design might have multiple keys. These are the Candidate Keys. See wikipedia entry for Candidate Keys

By definition, all Candidate Keys are created equal. They are each unique identifiers for the table in question.

Your job then is to select the best candidate from the pool of Candidate Keys to serve as the Primary Key. The Primary Key will be used by other tables to establish the relational constraints, but you are free to continue using Candidate Keys to query the table.

Because Primary Keys are referenced by other structures, and therefore used in join operations, the criteria for Primary Key selection boils down to the following for me (in order of importance):

  • Immutable/Stable - Primary Key values should not change. If they do, you run the risk of introducing update anomolies
  • Not Null - most DBMS platforms require that the Primary Key attribute(s) are not null
  • Simple - simple datatypes and values for physical storage and performance. Integer values work well here, and this is the datatype of choice for most surrogate/auto-gen keys

Once you've identified the Candidate Keys, the criteria above can be used to select the Primary Key. If there is not a "Natural" Candidate Key meets the criteria, then a Surrogate Key that does meet the criteria can be created and used as mentioned in other answers.

jwolly2
A: 

I suspect that you really should use option #3, as many here have already said. Surrogate PKs (either Integers or GUIDs) are good practice, even if there are adequate business keys. Surrogates will reduce maintenance headaches (as you yourself have already noted).

That being said, something you may want to consider is whether or not your database is:

  1. focused on data maintenance and transactional processing (i.e. Create/Update/Delete operations)
  2. geared towards analysis and reporting (i.e. Queries)

In other words, are the users concerned with maintaining active data or querying largely static data to find answers?

If you are heavily focused on building an analysis and reporting DB (e.g. a data warehouse/mart) that is exposed to technical business users (e.g. report designers) who have a good grasp of the business vocabulary, then you might want to consider using natural keys based on meaningful business values. They help reduce query complexity by eliminating the need for complex joins and help the user focus on their task, not fighting the database structure.

Otherwise you're probably focused on a full CRUD DB that has to cover all the bases to some degree - this is the vast majority of situations. In which case, go with your option #3. You can always optimize for queryability in the future but you'll be hard pressed to retrofit for maintainability.

c.batt
A: 

I hope you will agree with me that every design element should have single purpose.

Question is what do you think is purpose of PK? If it is to identify unique record in a table, then surrogate keys wins without much trouble. This is simple and straight.

As far as new columns in option 3 are concerned, you should check if these can be calculated (best would be to do calculation in model layer so that they can be changed easily than if calculation done in RDBMS) without too much of performance penalty from other elements. For example, you can store segment number and road number in corresponding tables and then use them to generate "00000001.1". This will allow to change asset numbering on-the-fly.

isntn
A: 

First off, option 2 is the absolute worst option. As an Index, it's a string, and that makes it slow. And it's generated based on business rules - which can change and cause a rather large headache.

Personally, I always use a separate primary key column; and I always use a GUID. Some developers prefer a simple INT over a GUID for reasons of hard-drive space. However, if the situation arises where you need to merge two databases, GUIDs will almost never collide (whereas INTs are guaranteed to collide).

Primary Keys should NEVER be seen by the user. Making it readable to the user should not be a concern. Primary Keys SHOULD be used to link with Foreign Keys. This is their purpose. The value should be machine readable and, once created, never changed.

harley.333