begin transaction;
create table person_id(person_id integer primary key);
insert into person_id values(1);
... snip ...
insert into person_id values(50000);
commit;

This script takes about 0.9 seconds on my machine and produces a 392K db file. Those numbers become 1.4 seconds and 864K if I change the second line to:

create table person_id(person_id integer nonclustered primary key);

Why is this the case?

A: 

[Only as an idea]

Maybe when you explicitly declare the integer column as the clustered key, the database does exactly that and stores the rows in key order. But when you tell it not to cluster on your integer column, it still creates an index behind the scenes, and perhaps it chooses a different datatype for that index, say one twice as large. Each of those index entries then has to reference the matching record in the table, and that is where the extra size comes from.
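
The "index behind the scenes" idea can be checked directly. Below is a minimal sketch, assuming the engine is SQLite (the question never names it, so that is a guess based on the single db file). In SQLite, a column declared exactly as `integer primary key` becomes an alias for the built-in rowid, while any other declared type, such as `integer nonclustered`, gets a separate automatic index. The helper name `hidden_indexes` is made up for this illustration.

```python
import sqlite3


def hidden_indexes(column_decl):
    """Create person_id with the given column declaration and list its indexes."""
    # Assumption: SQLite. The thread does not say which engine was used.
    con = sqlite3.connect(":memory:")
    con.execute(f"create table person_id(person_id {column_decl})")
    rows = con.execute(
        "select name from sqlite_master "
        "where type = 'index' and tbl_name = 'person_id'"
    ).fetchall()
    con.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    for decl in ("integer primary key", "integer nonclustered primary key"):
        print(decl, "->", hidden_indexes(decl))
```

If the second declaration reports an automatic index and the first reports none, the extra file size and insert time would be the cost of maintaining that second B-tree, not of a wider datatype.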

Developer Art
+1  A: 

Clustering the primary key stores it with the rows, which means it takes up less space (there are no separate index blocks). Typically, however, its main benefit is that range scans can read rows that sit in the same block, reducing I/O operations. That becomes rather important when you have a large data set (not 50k ints).

I think 50k ints is a rather artificial benchmark and not one you care about in the real world.

MarkR
If I didn't plan on doing joins or range scans and only cared about insert performance, would there be any better way to create the table than the first example?
Elite Mx
If you only cared about insert performance, you should use no indexes at all (if supported), or write the data into a text file. Appending to text files is pretty quick.
MarkR
A: 

I randomized the order of the insert statements and re-ran the test with values from one to half a million. Interestingly, the clustered and nonclustered db files now take up exactly the same amount of space (down to the byte). However, the inserts into the clustered db are still faster.
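
For anyone who wants to repeat the randomized test, here is a rough sketch, again assuming SQLite and using Python's standard `sqlite3` module. The function name `load` and the use of a temporary file are choices made for this illustration, not part of the original test, and the timings will differ by machine.

```python
import os
import random
import sqlite3
import tempfile
import time


def load(column_decl, values):
    """Insert values in one transaction; return (seconds, file bytes, row count)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        # Assumption: SQLite. The thread does not say which engine was used.
        con = sqlite3.connect(path)
        con.execute(f"create table person_id(person_id {column_decl})")
        start = time.perf_counter()
        with con:  # commits once, after all inserts have run
            con.executemany(
                "insert into person_id values(?)", ((v,) for v in values)
            )
        elapsed = time.perf_counter() - start
        count = con.execute("select count(*) from person_id").fetchone()[0]
        con.close()
        return elapsed, os.path.getsize(path), count
    finally:
        os.remove(path)


if __name__ == "__main__":
    values = list(range(1, 500001))
    random.shuffle(values)  # same shuffled order for both declarations
    for decl in ("integer primary key", "integer nonclustered primary key"):
        seconds, size, count = load(decl, values)
        print(f"{decl}: {seconds:.2f}s, {size} bytes, {count} rows")
```

Feeding both declarations the same shuffled list keeps the comparison fair, since insert order affects how the B-tree pages split.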

To me this is counterintuitive. When I tell the database to cluster these values, I'm telling it that these values had better be in this order when I come back to get them. When I leave that specification out, I'm essentially saying to the db: take these values and arrange them however you like, whatever makes your life easier.

Theoretically, this extra freedom should never slow down the queries. Maybe not speed them up all the time, but never slow them down. Thoughts?

Elite Mx