views: 385
answers: 9

I have a table which has an auto-incremented PK and a creation_date field, which is a unix timestamp.
I am wondering why I shouldn't lose the auto-incremented field and use the creation_date field as the PK, since it is unique (I am using 1/1000 of a second accuracy).

For: I am killing an indexed row.
Against: there is a slight (very very slight) chance of a duplicate, but it is easy to handle this very rare event.

The DB is mysql.
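
For concreteness, a minimal sketch of the schema (table and column names here are illustrative, not my real ones):

```sql
CREATE TABLE records (
    id            INT UNSIGNED NOT NULL AUTO_INCREMENT,
    creation_date BIGINT UNSIGNED NOT NULL,  -- unix timestamp, 1/1000 s accuracy
    PRIMARY KEY (id),
    KEY idx_creation_date (creation_date)
) ENGINE=InnoDB;
```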

+3  A: 

What would you gain by not having the auto-incremented PK?

Mark Biek
As stated in the question: I lose a field and I lose an index object.
Itay Moav
I'm sorry, but I don't see how having a PK based on creation-time helps you avoid that.
Mark Biek
@Mark Biek: presumably he needs a creation date key anyway?
ysth
+9  A: 

Because of the size (width) of the index. Timestamps are wide; unless your table holds a huge number of rows, you don't need BIGINT as the data type of the PK. The narrower the primary key column, the larger the chunk of the index you can keep in memory at once, and the faster your queries. So don't do it.
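
A minimal sketch of the difference, assuming InnoDB (where every secondary index entry also carries a copy of the primary key, so a wide PK inflates every index on the table):

```sql
CREATE TABLE narrow_pk (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- 4 bytes per key entry
    payload VARCHAR(255),
    PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE wide_pk (
    created_ms BIGINT UNSIGNED NOT NULL,           -- 8 bytes per key entry
    payload    VARCHAR(255),
    PRIMARY KEY (created_ms)
) ENGINE=InnoDB;
```

Twice the key width means roughly half as many entries fit per index page, so the same amount of RAM caches half as much of the index.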

Alex
So, if my PK is a long int, lose it; if it is something smaller, keep it?
Itay Moav
I believe this answer should be combined with Stephen's
Itay Moav
If your PK is a long int, evaluate whether you need it to be a long int (are you expecting more than 2 million records?). The unix timestamp of right now is 1235887200, which is >32 bits - you need 64 bits to store it.
Alex
I'm really not sure about this answer... it is correct for most indexes, but clustered indexes are ordered on disk. A B-tree exists to determine which (disk) page to go to, and once at the correct page it will do a scan. So you're talking `[lookup cost] = log[row count]`. Not a huge concern
Stephen
E.g. I work for a company that has very large databases, and the (very competent) DBAs would prefer to "just use GUIDs (128-bit) - they're universally unique (for most intents), don't require reading back from the database after insert, disks are cheap, and it doesn't affect the indexes"
Stephen
@Alex: Can you explain how UNIX timestamp is >32 bits? The UNIX timestamp 1235887200 is hex 49AA2460. Eight hex digits cannot be >32 bits (assuming it's stored as unsigned).
Bill Karwin
Also you're off by a few orders of magnitude on the range of a 32-bit int: −2,147,483,648 to +2,147,483,647. Not 2 million as you said.
Bill Karwin
Oops 2 BILLION, sorry. Me = stupid.
Alex
This answer is wrong. Timestamps should not be used, due to the granularity of timing: how are you going to ensure uniqueness? Talking about the size of the index is irrelevant if you don't have consistent data
Mitch Wheat
@Stephen - DBAs who suggest using GUIDs as primary keys are completely incompetent. If your index can fit into RAM, it will by definition be faster than seeking the pages on disk. To fit a bigger section of the index in memory, you need to make the index smaller. To make it smaller, make individual entries smaller - i.e., don't use GUIDs.
Alex
+12  A: 

The general answer is that your data may change (where a meaningless id never will)... What happens when you realise that you're storing time in the local zone and DST kicks in? What if you want to store against UTC and/or against a specific time zone? For ordering considerations, see wcoenen's answer.

If you start creating thousands of rows a second, you'll be messing with the data to "make it work", forcing it to do something it was never intended for. Perhaps you'd add a disambiguation column (sketched below), which would make your index bigger and slower...
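
A minimal sketch of what that disambiguation column might look like (names are hypothetical):

```sql
-- Composite PK: the timestamp alone no longer has to be unique,
-- but every primary key entry is now even wider.
CREATE TABLE events (
    created_ms BIGINT UNSIGNED   NOT NULL,  -- unix timestamp in milliseconds
    seq        SMALLINT UNSIGNED NOT NULL,  -- disambiguates same-millisecond rows
    payload    VARCHAR(255),
    PRIMARY KEY (created_ms, seq)
) ENGINE=InnoDB;
```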

And then, when your project becomes mega popular and people start trying to run reports/queries: "it's using a date as a PK???!!!"

Also consider using a database that allows clustered indexes on non-primary columns.

Stephen
I believe this answer should be combined with Alex's
Itay Moav
I think it should also be combined with wcoenen's answer. There appear to be many valid reasons not to use a timestamp as a PK.
RussellH
+2  A: 

Because of:

Against: there is a slight (very very slight) chance of a duplicate, but it is easy to handle this very rare event.

You have no guarantee that your key will always be unique, so that field is not suitable for a primary key.

What if you have to insert 10 or 100 records in a batch? Would you insert pauses between the inserts to be sure you get unique primary keys?
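
A minimal sketch of the failure mode, assuming MySQL 5.6+ (for fractional-second NOW(3)) and a hypothetical table keyed on the timestamp:

```sql
CREATE TABLE t (created_ms BIGINT UNSIGNED NOT NULL PRIMARY KEY);

-- NOW() is frozen for the duration of a statement, so a multi-row insert
-- is guaranteed to collide; even per-call SYSDATE(3) routinely lands
-- consecutive rows in the same millisecond.
INSERT INTO t (created_ms) VALUES
    (UNIX_TIMESTAMP(NOW(3)) * 1000),
    (UNIX_TIMESTAMP(NOW(3)) * 1000);
-- ERROR 1062 (23000): Duplicate entry ... for key 'PRIMARY'
```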

zendar
Well, you do, in the sense that the insert will fail, in which case you just insert it again.
cletus
In a transaction? So on error, you first roll back everything that was already inserted, then repeat from the beginning? Or maybe each record in a separate transaction, so that inserts run as slowly as possible?
zendar
+8  A: 

Bad idea, because of "time zones".

If the country hosting your servers observes daylight saving time, then once a year the clock gets set back an hour.

Then for an hour, it will generate duplicate keys.
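
A minimal sketch of that "fall back" collision, assuming keys derived from local wall-clock time (as in the plant system described below) and MySQL's time zone tables loaded so CONVERT_TZ works:

```sql
-- Two distinct UTC instants, one hour apart, map to the same local time
-- once the clocks fall back - a local-time key would collide.
SELECT CONVERT_TZ('2009-11-01 05:30:00', 'UTC', 'US/Eastern');  -- 2009-11-01 01:30:00 (EDT)
SELECT CONVERT_TZ('2009-11-01 06:30:00', 'UTC', 'US/Eastern');  -- 2009-11-01 01:30:00 (EST)
```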

I worked for a company that had a database with a timestamp key like that, recording thousands of measurements per hour from equipment in a semiconductor manufacturing plant. It was developed in Korea (which has no daylight saving time shifts).

When they installed it here in the US ... we had to shut down the entire factory for an hour every year - in order not to lose the measurements taken during that hour. :-)

Ron Savage
I liked that Story :-) +1
Itay Moav
Isn't that why we have UTC? Even then I still agree with the "bad idea" part.
Tomek Szpakowicz
+1 Can't believe one would really shut down a factory rather than fixing the software though :-)
Wim Coenen
The dev team was in Korea, we weren't allowed to change it - when we told them about setting the time back an hour, they thought that was a stupid thing to do. :-)
Ron Savage
+2  A: 

Time is not granular enough; you may end up with insertion failures if two records are inserted at the same time.
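
A minimal sketch, assuming MySQL 5.6+ for fractional-second SYSDATE(3):

```sql
-- SYSDATE(3) is evaluated per call, yet two calls in the same statement
-- almost always fall within the same millisecond:
SELECT SYSDATE(3) = SYSDATE(3);  -- usually 1, i.e. identical timestamps
```

Two inserts racing at that speed would collide on a timestamp PK.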

Bryan Kyle
+1. I've had to fix a bug in a system someone else wrote because of this. They were using Java's current system ticks as a PK, but Java's clock granularity isn't 1 tick. It was very common for more than one insert to happen before the internal Java tick count incremented.
rally25rs
A: 

For: I am killing an indexed row.

Against: ... in favour of another indexed row that, due to its drastically greater length, will result in significant added overhead if used far more frequently than before.

ceejayoz
+5  A: 

It is common for PCs or servers to synchronize their clocks with a time server. Because of this, you cannot rely on the system clock maintaining a steady pace forward. It may jump backwards or forwards slightly at any time.

Therefore, if you have to be able to reconstruct the order in which your records were created, you'll need an auto-incremented PK. You cannot rely on timestamps. This may sound very theoretical, but it has already bitten us.
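
A minimal sketch, reusing the illustrative `records` table from the question:

```sql
-- The surrogate key reflects allocation order even if NTP steps the
-- clock backwards; the timestamp column does not.
SELECT * FROM records ORDER BY id;             -- insertion order
SELECT * FROM records ORDER BY creation_date;  -- rows created around a clock
                                               -- adjustment can come out reordered
```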

Wim Coenen
A: 

The basic answer is that it doesn't scale. It may work now, but as computers get faster and you get more users, sooner or later the timestamps will start clashing and limit the throughput of your system.

Then there are a whole lot of basic technical reasons, as others have pointed out.