views:

64

answers:

2

SQl Server 2005:

Option: 1

    CREATE TABLE #test
      (customerid, orderdate, field1 INT, field2 INT, field3 INT)

    CREATE UNIQUE CLUSTERED INDEX Idx1 ON #test(customerid)
    CREATE INDEX Idx2 ON #test(field1 DESC)
    CREATE INDEX Idx3 ON #test(field2 DESC)
    CREATE INDEX Idx4 ON #test(field3 DESC)

    INSERT INTO #test
      (customerid, orderdate, field1 INT, field2 INT, field3 INT)
    SELECT
      customerid, orderdate, field1, field2, field3 FROM 
    ATABLERETURNING4000000ROWS

compared to

Option: 2

    CREATE TABLE #test
      (customerid, orderdate, field1 INT, field2 INT, field3 INT)

    INSERT INTO #test
      (customerid, orderdate, field1 INT, field2 INT, field3 INT)
    SELECT
      customerid, orderdate, field1, field2, field3 FROM 
    ATABLERETURNING4000000ROWS

    CREATE UNIQUE CLUSTERED INDEX Idx1 ON #test(customerid)
    CREATE INDEX Idx2 ON #test(field1 DESC)
    CREATE INDEX Idx3 ON #test(field2 DESC)
    CREATE INDEX Idx4 ON #test(field3 DESC)

When we use the second option it runs close to 50% faster. Why is this?

+1  A: 

Because you are inserting the rows before you add the indexes. The unique index requires that the system perform uniqueness checks on the newly added rows and on insert, the system must update the various index entries. Not having to do the work of uniqueness checks is faster but your index creation will fail in the second option if there are duplicate customerId values.

Thomas
Thanks. Actually we have a group by customerid on the ATABLERETURNING4000000ROWS table which would ensure uniqueness.
stackoverflow
@stackoverflow: even if you remove the unique constraint, the second option will be faster. I have added an answer as to why.
andras
+1  A: 

From the SQL Server Query Processing Team:

In order to build the b-tree for the index we have to first sort the data from source. The flow is to scan the source, sort it (if possible - in memory*), then build the b-tree from the sort.
Why do we need to sort first before building the b-tree? In theory we don’t have to sort, we could use regular DML and directly insert data into the in-build index (no sort), but in this case we would be doing random inserts, random inserts in a b-tree require searching the b-tree for the correct leaf node first and then inserting the data. And while searching a b-tree is fairly fast, doing so before each insert is far from optimal.

Your indexes are B+ trees.

The first query requires lookups in B+ trees for each record and then modifying the B+ trees.

The second query will sort the data reuired for each index in turn according to the particular index and the B+ trees are constructed very efficiently.

andras