tags:

views:

112

answers:

7

I'm working with a client who wants to add timestamps to a bunch of tables so that they may sort the records in those tables chronologically. All of the tables also have an auto incrementing integer field as their primary key (id).

The (simple) idea - save the overhead/storage and rely on the primary key to sort fields chronologically. Sure this works, but I'm uncertain whether or not this approach is acceptable in sound database design.

Pros: less storage required per record, simpler VO classes, etc. etc.

Con: it implies a characteristic of that field, an otherwise simple identifer, whose definition does not in any way define or guarantee that it should/will function as such.

Assume for the sake of my question that the DB table definitions are set in stone. Still - is this acceptable in terms of best practices?

Thanks

+2  A: 

You can acheive the same goal in the short term by sorting on the ID column. This would be better that adding additional data to acheive the same result. I don't think that it would be confusing for anyone to look at the data table and know that it's chronological when they see that it's an identity column.

There are a few drawbacks or limitations that I see however.

  • The chronological sort can be messed up if someone re-seeds the column
  • Chronology for a date period cannot be ascertained without the additional data
  • This setup prevents you from sorting chronologically if the system ever accepts new, non-chronological data

Based on the realistic evaluation of these "limitations" you should be able to advise a proper approach.

Brad
Trust me some wiseguy will come along and say he wants to change the Identity column to something meaningful or restart the values from a different starting point for some silly reason. Use a DateTime
Roadie57
@Roadie57, I thought the diabolical nature of "the User" went without saying :) I do agree with you, however, the OP did say to assume the database structure was "set in stone". Not to mention he's working for a *client*, not an employer who might be willing to pay for such "frivolous" upgrades.
Brad
@Brad I was letting real life experience get in the way of my thinking. I actually have had someone come along and say that all auto increment values had to be reseeded to start at 1008000 so that some silly silly report could be run without filtering by anything but id columns
Roadie57
+1  A: 

Auto-incrementing ID will give you an idea of order as Brad points out, but do it right - if you want to know WHEN something was added, have a datetime column. Then you can not only chronologically sort but also apply filters.

n8wrl
+4  A: 

You asked for "best practices", rather than "not terrible practices" so: no, you should not rely on an autoincremented primary key to establish chronology. One day you're going to introduce a change to the db design and that will break. I've seen it happen.

A datetime column whose default value is GETDATE() has very little overhead (about as much as an integer) and (better still) tells you not just sequence but actual date and time, which often turns out to be priceless. Even maintaining an index on the column is relatively cheap.

These days, I always put a CreateDate column data objects connected to real world events (such as account creation).

Edited to add:

If exact chronology is crucial to your application, you can't rely on either auto-increment or timestamps (since there can always be identical timestamps, no matter how high the resolution). You'll probably have to make something application-specific instead.

egrunin
But if you use SQL Server 200x `DATETIME` data type, you might end up with several rows that have the same identical datetime value (since it's "resolution" is 3.33ms) and you cannot tell chronological order from that anymore, either.....
marc_s
@marc_s: true; actually you can *always* have simultaneous timestamps (unless your timer ticks faster than your database). If sequence is that crucial you have to build it in at a deeper level.
egrunin
@marc_s Use datetime2 on your column.
Gabriel Guimarães
@Gabriel Guimarães: even with Datetime2, you're not 100% sure that you don't get two entries with the same value...... datetime and datetime2 just aren't safe....
marc_s
@marc_s datetime precision is 100 nanoseconds. You would only be able to get two events on the same range, if your clock speed is greater than 10 mghz.
Gabriel Guimarães
@Gabriel Guimarães: on SQL Server, you could have multiple requests come in from several client apps at the same time - no need for the clock speed to be this high.....
marc_s
+2  A: 

Further to egrunin's answer, a change to the persistence or processing logic of these rows may cause rows to be inserted into the database in a non-sequential or nondeterministic manner. You may implement a parallelized file processor that throws a row into the DB as soon as the thread finishes transforming it, which may be before another thread has finished processing a row that occurred earlier in the file. Using an ORM for record persistence may result in a similar behavior; the ORM may just maintain a "bag" (unordered collection) of object graphs awaiting persistence, and grab them at random to persist them to the DB when it's told to "flush" its object buffer.

In either case, trusting the autoincrement column to tell you the order in which records came into the SYSTEM is bad juju. It may or may not be able to tell you the order in which records his the DATABASE; that depends on the DB implementation.

KeithS
A: 

Don't do it. You should never rely on the actual value of your ID column. Treat it like a black box, only useful for doing key lookups.

You say "less storage required per record," but how important is that? How big are the rows we're talking about? If you've got 200-byte rows, another 4 bytes probably isn't going to matter much.

Don't optimize without measuring. Get it working right first, and THEN optimize.

Andy Lester
A: 

@MadBreaker

There's to separate things, if you need to know the order you create a column order with autoincrement, however if you want to know the date and time it was inserted you use datetime2.

Chronological order can be garanteed if you don't allow updates or deletes, but if you want time control over select you should use datetime2.

Gabriel Guimarães
A: 

You didnt mention if you are running on a single db or clustered. If you are clustered, be wary of increment implementations, as you are not always guaranteed things will come out in the order you would naturally think. For example, Oracle sequences can cache groups of next values (depending on your setup) and give you a 1,3,2,4,5 sort of list...

chrismh