views: 385
answers: 9

I have a table which has an auto-incremented PK and a creation_date field, which is a unix timestamp.
I am wondering why I shouldn't lose the auto-incremented field and use the creation_date field as the PK, since it is unique (I am using 1/1000 of a second accuracy).

For: I am killing an indexed row.
Against: there is a slight (very very slight) chance of a duplicate, but it is easy to handle this very rare event.

The DB is mysql.
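
For concreteness, a minimal sketch of the schema (table and column names here are illustrative, not my real ones):

```sql
CREATE TABLE records (
    id            INT UNSIGNED NOT NULL AUTO_INCREMENT,
    creation_date BIGINT UNSIGNED NOT NULL,  -- unix timestamp, 1/1000 s accuracy
    PRIMARY KEY (id),
    KEY idx_creation_date (creation_date)
) ENGINE=InnoDB;
```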

+3  A: 

What would you gain by not having the auto-incremented PK?

Mark Biek
As stated in the question: I lose a field and I lose an index object.
Itay Moav
I'm sorry, but I don't see how having a PK based on creation-time helps you avoid that.
Mark Biek
@Mark Biek: presumably he needs a creation date key anyway?
ysth
+9  A: 

Because of the size (width) of the index. Timestamps are wide; unless your table holds a huge number of rows, you don't need BIGINT as the data type of the PK. The narrower the primary key column, the larger the chunk of the index you can keep in memory at once, and the faster your queries. So don't do it.
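
A minimal sketch of the difference, assuming InnoDB (where every secondary index entry also carries a copy of the primary key, so a wide PK inflates every index on the table):

```sql
CREATE TABLE narrow_pk (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- 4 bytes per key entry
    payload VARCHAR(255),
    PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE wide_pk (
    created_ms BIGINT UNSIGNED NOT NULL,           -- 8 bytes per key entry
    payload    VARCHAR(255),
    PRIMARY KEY (created_ms)
) ENGINE=InnoDB;
```

Twice the key width means roughly half as many entries fit per index page, so the same amount of RAM caches half as much of the index.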

Alex
So, if my PK is a long int, lose it; if it is something smaller, keep it?
Itay Moav
I believe this answer should be combined with Stephen's
Itay Moav
If your PK is a long int, evaluate whether you need it to be a long int (are you expecting more than 2 million records?). The unix timestamp of right now is 1235887200, which is >32 bits - you need 64 bits to store it.
Alex
I'm really not sure about this answer... it is correct for most indexes, but clustered indexes are ordered on disk. A B-tree exists to determine which (disk) page to go to, and once at the correct page it will do a scan. So you're talking `[lookup cost] = log[row count]`. Not a huge concern
Stephen
E.g. I work for a company that has very large databases, and the (very competent) DBAs would prefer to "just use GUIDs (128-bit) - they're universally unique (for most intents), don't require reading back from the database after insert, disks are cheap, and it doesn't affect the indexes"
Stephen
@Alex: Can you explain how UNIX timestamp is >32 bits? The UNIX timestamp 1235887200 is hex 49AA2460. Eight hex digits cannot be >32 bits (assuming it's stored as unsigned).
Bill Karwin
Also you're off by a few orders of magnitude on the range of a 32-bit int: −2,147,483,648 to +2,147,483,647. Not 2 million as you said.
Bill Karwin
Oops 2 BILLION, sorry. Me = stupid.
Alex
This answer is wrong. Timestamps should not be used, due to the granularity of timing: how are you going to ensure uniqueness? Talking about the size of the index is irrelevant if you don't have consistent data
Mitch Wheat
@Stephen - DBAs who suggest using GUIDs as primary keys are completely incompetent. If your index can fit into RAM, it will by definition be faster than seeking the pages on disk. To fit a bigger section of the index in memory, you need to make the index smaller. To make it smaller, make individual entries smaller - i.e., don't use GUIDs.
Alex
+12  A: 

The general answer is that your data may change (where a meaningless id never will)... What happens when you realise that you're storing time in the local zone and DST kicks in? What if you want to store against UTC and/or against a specific time zone? For ordering considerations, see wcoenen's answer.

If you start creating thousands of rows a second, you'll be messing with the data to "make it work", forcing it to do something it was never intended for. Perhaps you'd add a disambiguation column (sketched below), which would make your index bigger and slower...
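
A minimal sketch of what that disambiguation column might look like (names are hypothetical):

```sql
-- Composite PK: the timestamp alone no longer has to be unique,
-- but every primary key entry is now even wider.
CREATE TABLE events (
    created_ms BIGINT UNSIGNED   NOT NULL,  -- unix timestamp in milliseconds
    seq        SMALLINT UNSIGNED NOT NULL,  -- disambiguates same-millisecond rows
    payload    VARCHAR(255),
    PRIMARY KEY (created_ms, seq)
) ENGINE=InnoDB;
```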

And then, when your project becomes mega popular and people start trying to run reports/queries: "it's using a date as a PK???!!!"

Also consider using a database that allows clustered indexes on non-primary columns.

Stephen
I believe this answer should be combined with Alex's
Itay Moav
I think it should also be combined with wcoenen's answer. There appear to be many valid reasons not to use a timestamp as a PK.
RussellH
+2  A: 

Because of:

Against: there is a slight (very very slight) chance of a duplicate, but it is easy to handle this very rare event.

You have no guarantee that your key will always be unique, so that field is not suitable for a primary key.

What if you have to insert 10 or 100 records in a batch? Would you insert pauses between the inserts to be sure you get unique primary keys?
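
A minimal sketch of the failure mode, assuming MySQL 5.6+ (for fractional-second NOW(3)) and a hypothetical table keyed on the timestamp:

```sql
CREATE TABLE t (created_ms BIGINT UNSIGNED NOT NULL PRIMARY KEY);

-- NOW() is frozen for the duration of a statement, so a multi-row insert
-- is guaranteed to collide; even per-call SYSDATE(3) routinely lands
-- consecutive rows in the same millisecond.
INSERT INTO t (created_ms) VALUES
    (UNIX_TIMESTAMP(NOW(3)) * 1000),
    (UNIX_TIMESTAMP(NOW(3)) * 1000);
-- ERROR 1062 (23000): Duplicate entry ... for key 'PRIMARY'
```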

zendar
Well, you do, in the sense that the insert will fail, in which case you just insert it again.
cletus
In a transaction? So on error, you first roll back everything that was already inserted, then repeat from the beginning? Or maybe each record in a separate transaction, so that inserts run as slowly as possible?
zendar
+8  A: 

Bad idea, because of "time zones".

If the country hosting your servers observes daylight saving time, then once a year the clock gets set back an hour.

Then for an hour, it will generate duplicate keys.
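
A minimal sketch of that "fall back" collision, assuming keys derived from local wall-clock time (as in the plant system described below) and MySQL's time zone tables loaded so CONVERT_TZ works:

```sql
-- Two distinct UTC instants, one hour apart, map to the same local time
-- once the clocks fall back - a local-time key would collide.
SELECT CONVERT_TZ('2009-11-01 05:30:00', 'UTC', 'US/Eastern');  -- 2009-11-01 01:30:00 (EDT)
SELECT CONVERT_TZ('2009-11-01 06:30:00', 'UTC', 'US/Eastern');  -- 2009-11-01 01:30:00 (EST)
```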

I worked for a company that had a database with a timestamp key like that, recording thousands of measurements per hour from equipment in a semiconductor manufacturing plant. It was developed in Korea (which has no daylight saving time shifts).

When they installed it here in the US ... we had to shut down the entire factory for an hour every year - in order not to lose the measurements taken during that hour. :-)

Ron Savage
I liked that Story :-) +1
Itay Moav
Isn't that why we have UTC? Even then I still agree with the "bad idea" part.
Tomek Szpakowicz
+1 Can't believe one would really shut down a factory rather than fixing the software though :-)
Wim Coenen
The dev team was in Korea, we weren't allowed to change it - when we told them about setting the time back an hour, they thought that was a stupid thing to do. :-)
Ron Savage
+2  A: 

Time is not granular enough; you may end up with insertion failures if two records are inserted at the same time.
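
A minimal sketch, assuming MySQL 5.6+ for fractional-second SYSDATE(3):

```sql
-- SYSDATE(3) is evaluated per call, yet two calls in the same statement
-- almost always fall within the same millisecond:
SELECT SYSDATE(3) = SYSDATE(3);  -- usually 1, i.e. identical timestamps
```

Two inserts racing at that speed would collide on a timestamp PK.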

Bryan Kyle
+1. I've had to fix a bug in a system someone else wrote because of this. They were using Java's current system ticks as a PK, but Java's clock granularity isn't 1 tick. It was very common for more than one insert to happen before the internal Java tick count incremented.
rally25rs
A: 

For: I am killing an indexed row.

Against: ... in favour of another indexed row that, due to its drastically greater length, will result in significant added overhead if used far more frequently than before.

ceejayoz
+5  A: 

It is common for PCs or servers to synchronize their clocks with a time server. Because of this, you cannot rely on the system clock maintaining a steady pace forward. It may jump backwards or forwards slightly at any time.

Therefore, if you have to be able to reconstruct the order in which your records were created, you'll need an auto-incremented PK. You cannot rely on timestamps. This may sound very theoretical, but it has already bitten us.
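
A minimal sketch, reusing the illustrative `records` table from the question:

```sql
-- The surrogate key reflects allocation order even if NTP steps the
-- clock backwards; the timestamp column does not.
SELECT * FROM records ORDER BY id;             -- insertion order
SELECT * FROM records ORDER BY creation_date;  -- rows created around a clock
                                               -- adjustment can come out reordered
```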

Wim Coenen
A: 

The basic answer is that it doesn't scale. It may work now, but as computers get faster and you get more users, sooner or later the timestamps will start clashing and limit the throughput of your system.

Then there are a whole lot of basic technical reasons, as others have pointed out.