We had a meeting this morning about how we should store the IDs for some assets in a database that we are building. The discussion generated a bit of heat, so I decided to consult the experts of SO.

The table structure that I believe we should have (short version) is the following:

Example 1)

  • AssetId - int(32) - Primary Key
  • Type - string

so some example data is like this:

==AssetId======Type===
  12345        "Manhole"
  155415       "Pit"

etc.

Another member of the team suggested something like this:

Example 2)

  • AssetId - string - Primary Key
  • Type - string

so some example data is like this:

==AssetId======Type===
  "MH12345"    "Manhole"
  "P155415"    "Pit"

where we make a short version of the type, prepend it to the ID, and store the result in the database. I have seen a few asset databases that do this and have never really liked this approach.

I have never really liked the idea of using strings as IDs, for sorting reasons. I also feel that it stores useless information just for the sake of it when you already have the type of asset stored anyway.
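To illustrate the sorting concern, here is a minimal sketch. The ID values are made up for the example; they only mimic the two styles above.

```python
# Hypothetical IDs in the style of Example 2 (prefix + number stored as text).
string_ids = ["P9", "P10", "P155415", "P2"]
# The same IDs in the style of Example 1 (plain integers).
int_ids = [9, 10, 155415, 2]

# Strings compare character by character, so "P10" sorts before "P2".
sorted_strings = sorted(string_ids)
# Integers compare by numeric value.
sorted_ints = sorted(int_ids)

print(sorted_strings)  # ['P10', 'P155415', 'P2', 'P9']
print(sorted_ints)     # [2, 9, 10, 155415]
```

Zero-padding the numeric part would make the string order match the numeric order, but that is one more rule the string scheme has to carry.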

What approach would you take? And why? Are there any benefits to using approach 1 over 2?

EDIT: Yes I will be using AUTO_INCREMENT for approach 1.

+4  A: 

I'd go for the former. Creating unique IDs should be left to the SQL server, and you can't have those created automagically in a thread-safe manner if they're strings. To my understanding you'd have to handle that yourself somehow?

Speed is another factor. Dealing with int values is always going to be faster than strings. I'd say that there are other perf benefits around indexing that a much more SQL savvy person than me could elaborate on ;)

In my experience, having string IDs has been a fail.

OJ
+2  A: 

I personally believe the first approach is far, far better. It lets the database software do simple integer comparisons to find and sort by the key, which will improve table operation performance (SELECTs, complex JOINs, by-key INDEX lookups, etc.)

Of course, I'm assuming that either way, you're using some kind of auto-incrementing method to produce the IDs - either a sequence, an AUTO_INCREMENT, or something similar. Do me a favor, and don't build those in your program's code, OK?
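As a rough sketch of letting the database hand out the IDs, here is what it can look like with Python's built-in sqlite3 module. SQLite and the table and column names are assumptions for the example; the question mentions AUTO_INCREMENT, which is the MySQL spelling of the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE asset ("
    " asset_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # the database assigns this
    " type TEXT NOT NULL)"
)

def add_asset(asset_type):
    """Insert a row and return the ID the database assigned to it."""
    cur = conn.execute("INSERT INTO asset (type) VALUES (?)", (asset_type,))
    conn.commit()
    return cur.lastrowid

manhole_id = add_asset("Manhole")
pit_id = add_asset("Pit")
print(manhole_id, pit_id)
```

The application never computes an ID itself; it only reads back the one the database generated.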

scraimer
Yes I will be using AUTO_INCREMENT for approach 1. (added to post)
Nathan W
+6  A: 

This is a decision between surrogate and natural keys, the first being surrogate (or "technical") and the second being natural.

I've come to the conclusion that you should pretty much always use surrogate keys. If you use natural keys, those may change and updating primary/foreign keys is not generally a good idea.

cletus
This is an interesting point, but it does not answer the question, since both example 1 and 2 are surrogate keys. ;-)
hstoerr
A: 

If your assets already have unique natural identifiers (such as employees with their employee IDs), use them. There's no point creating another unique identifier.

On the other hand, if there's no natural unique ID, use the shortest one you can that'll ensure enough unique keys for your expected table size (such as your integer). It'll require less disk space and probably be faster. And, in addition, if you find yourself needing to use a string-based key later, it's a simple substitution job:

  • add string primary key to asset table.
  • add string foreign key to referring tables.
  • update string relationships with simple UPDATE command using integer relationships.
  • add foreign key constraints for string columns.
  • remove foreign key constraints for integer columns.
  • remove integer columns altogether.

Some of these steps may be problematic on specific DBMSs, perhaps requiring a table unload/reload to delete the integer primary key columns, but that strategy is basically what's required.
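The first three steps can be sketched with Python's sqlite3 module. Everything named here is an assumption for illustration: the `inspection` referring table, the `asset_code` column, and the prefix rule used to build the string key. The constraint swap and column removal are left out because they are the DBMS-specific part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE asset (asset_id INTEGER PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE inspection (
    inspection_id INTEGER PRIMARY KEY,
    asset_id INTEGER REFERENCES asset(asset_id)
);
INSERT INTO asset VALUES (12345, 'Manhole'), (155415, 'Pit');
INSERT INTO inspection VALUES (1, 12345), (2, 155415), (3, 12345);

-- Step 1: add the string key to the asset table and fill it in.
ALTER TABLE asset ADD COLUMN asset_code TEXT;
UPDATE asset
   SET asset_code = (CASE type WHEN 'Manhole' THEN 'MH' ELSE 'P' END) || asset_id;
CREATE UNIQUE INDEX asset_code_uq ON asset(asset_code);

-- Step 2: add the string foreign key column to the referring table.
ALTER TABLE inspection ADD COLUMN asset_code TEXT;

-- Step 3: copy the relationship across using the existing integer link.
UPDATE inspection
   SET asset_code = (SELECT a.asset_code FROM asset a
                      WHERE a.asset_id = inspection.asset_id);
""")

rows = conn.execute(
    "SELECT inspection_id, asset_code FROM inspection ORDER BY inspection_id"
).fetchall()
print(rows)  # [(1, 'MH12345'), (2, 'P155415'), (3, 'MH12345')]
```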

paxdiablo
+2  A: 

Well, I want to make some points and suggestions,

  • Consider having a separate table for Type, say with the columns Id and Desc, and then make TypeId a foreign key in this table. That is one step further toward normalization. It may not be desirable, though; do it only if you think it serves some purpose

  • Making it a string does make sense if you later think of shifting towards UUIDs. You won't need to change the data type then
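The first suggestion might look roughly like this in SQLite via Python. The table and column names are assumptions, and the column is called `description` rather than Desc because DESC is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE asset_type (
    id INTEGER PRIMARY KEY,
    description TEXT NOT NULL UNIQUE
);
CREATE TABLE asset (
    asset_id INTEGER PRIMARY KEY AUTOINCREMENT,
    type_id INTEGER NOT NULL REFERENCES asset_type(id)
);
INSERT INTO asset_type (id, description) VALUES (1, 'Manhole'), (2, 'Pit');
INSERT INTO asset (asset_id, type_id) VALUES (12345, 1), (155415, 2);
""")

# The type name is stored once and joined in when needed.
rows = conn.execute(
    "SELECT a.asset_id, t.description FROM asset a "
    "JOIN asset_type t ON t.id = a.type_id ORDER BY a.asset_id"
).fetchall()
print(rows)  # [(12345, 'Manhole'), (155415, 'Pit')]

# An asset pointing at a type that does not exist is rejected.
try:
    conn.execute("INSERT INTO asset (type_id) VALUES (99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```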

[Edited]

I agree with cletus here. Surrogate keys have proved beneficial in some real-life projects. They allow for change, and change is the only constant.

Adeel Ansari
+18  A: 

Usually the rule of thumb is to never use meaningful information in primary keys (like a Social Security number or a barcode). Just use a plain auto-incremented integer. However constant the data seems, it may change at some point (new legislation comes and all SSNs are recalculated).

Riho
Thanks to everybody who helped me to earn my badge :) Drinks are on me
Riho
Yes! In the past 3 companies I've worked for much pain has been caused because some idiot chose a "natural" key. UPCs get recycled; not everyone has an SSN; people screw up creating SKUs. You store that, you can UNIQUE it, but the PK is YOUR secret number for relationships. You don't expose it.
Nicholas Piasecki
I think this answer does not even answer the question. He was not proposing natural keys, but a surrogate key with a prefix that tells which table the key belongs to. ;-)
hstoerr
+1  A: 

I prefer Example 1 for the reasons you mentioned. The only argument I can think of for using Example 2 is if you are trying to accommodate string IDs from an existing database (quite common). However, even in that scenario, I prefer the following approach:

==AssetId(PK)==Type========DeprecatedId====
  12345        "Manhole"   "MH64247"
  155415       "Pit"       "P6487246"
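A minimal sketch of that layout in SQLite via Python, assuming the legacy IDs are unique and using made-up snake_case column names. The integer stays the primary key, and the old string ID is kept as an ordinary UNIQUE column that can still be searched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE asset (
    asset_id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL,
    deprecated_id TEXT UNIQUE  -- legacy string ID, not used for relationships
);
INSERT INTO asset (asset_id, type, deprecated_id) VALUES
    (12345, 'Manhole', 'MH64247'),
    (155415, 'Pit', 'P6487246');
""")

def find_by_deprecated_id(old_id):
    """Translate a legacy string ID into the integer primary key, or None."""
    row = conn.execute(
        "SELECT asset_id FROM asset WHERE deprecated_id = ?", (old_id,)
    ).fetchone()
    return row[0] if row else None

print(find_by_deprecated_id("MH64247"))  # 12345
```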
Damien
A: 

The one and only advantage of Example 2 is that you can easily tell from the primary key alone which row of which table the key refers to. The idea is nice, but whether it is useful depends on your logging and error-message strategies. It probably does have a performance disadvantage, so I would not use it unless you can name specific reasons to.

(You can have this advantage also by using a global sequence to generate numerical keys, or by using different numeric ranges, last digits or whatever. Then you don't have performance disadvantages, but maybe you won't find the table so easily.)
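One way the global-sequence idea could be sketched, using SQLite via Python. This is only an illustration under assumptions: the `global_id` table, the `manhole` and `pit` tables, and the choice to record the owning table name next to each ID are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared sequence: every key in every table comes from here.
CREATE TABLE global_id (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT NOT NULL
);
CREATE TABLE manhole (id INTEGER PRIMARY KEY REFERENCES global_id(id));
CREATE TABLE pit (id INTEGER PRIMARY KEY REFERENCES global_id(id));
""")

KNOWN_TABLES = ("manhole", "pit")

def new_row(table_name):
    """Draw the next global ID and create a row with it in table_name."""
    if table_name not in KNOWN_TABLES:
        raise ValueError(f"unknown table: {table_name}")
    cur = conn.execute(
        "INSERT INTO global_id (table_name) VALUES (?)", (table_name,)
    )
    new_id = cur.lastrowid
    # table_name was checked against KNOWN_TABLES, so it is safe to interpolate.
    conn.execute(f"INSERT INTO {table_name} (id) VALUES (?)", (new_id,))
    conn.commit()
    return new_id

def table_for(row_id):
    """Look up which table a numeric key belongs to, or None if unknown."""
    row = conn.execute(
        "SELECT table_name FROM global_id WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0] if row else None

a = new_row("manhole")
b = new_row("pit")
print(a != b, table_for(a), table_for(b))
```

The keys stay plain integers, and finding the table costs one extra lookup instead of being readable from the key itself.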

hstoerr
+1  A: 

I would choose a numeric primary key for performance reasons. Integer comparisons are much cheaper than string comparisons, and an integer key will occupy less space in the DB.

Ariel