Adding a column efficently in SQL Server

views:

448

answers:

+1 Q:

Adding a column efficently in SQL Server

I want to add an integer column to a table with a large number of rows and many indexes (Its a data warehouse Fact Table).

To keep the row width as narrow as possible all the columns in this table are defined as not null. So I want the new column to be not null with a default of zero.

From experience adding this column will take some time, presumably because the database will need to rewrite all the rows with a new column with a filled value. And this presumably will involve updating the clustered index and all the non-clustered indexes.

So should I drop all the indexes before adding the column and then recreate them all. Or is there an easier way to do this?

Also I don't really understand why adding a column that is nullable is so much quicker. Why does this not involve re-writng the records with an extra Is Null bit flipped for each row.

SQL Server is a row oriented database. This is in contrast to a column oriented database. This means that in SQL Server, all of the data for a given row is stored together on the disk. Let's have an example:

Say you have a Customer table with 3 columns, FirstName, MiddleInitial, and LastName. Then, say you have 3 records in this table for Jabba T. Hutt, Dennis T. Menace, and George W. Bush.

In a row oriented database (like SQL Server), the records will be stored on disk as such:

Jabba, T, Hutt; Dennis, T, Menace; George, W, Bush;

In contrast, a column oriented database would store the records on disk like this:

Jabba, Dennis, George; T, T, W; Hutt Menace, Bush;

Where columns are grouped together instead of rows.

Now, when you go to add a column to a table in a row oriented database (SQL Server, for example), the new data for each column has to be inserted alongside the existing rows, shifting the rows requiring a lot of read/write operations. So, if you were to insert a new column for the customer prefix that defaults to 'Mr', this is what you'd get:

Mr, Jabba, T, Hutt; Mr, Dennis, T, Menace; Mr, George, W, Bush;

As you can see, all of the original data has been shifted to the right. On the other hand, when you insert a new column that defaults to NULL, no new data has to be put into the existing rows. Thus, there is less shifting, requiring fewer disk read/write operations.

Of course, this an oversimplification of what's actually going on on disk. There are other things to take into account when dealing with indexes, pages, etc. But, it should help you get the picture.

For clarification I'm not at all suggesting you move to a column oriented database, I just put that info in there to help explain what Row oriented meant.

dustyburwell 2009-04-16 15:17:49

+4 A:

It will require updating the clustered index, yes - this IS the table data, after all.

But I don't see why any of the non-clustered indices would have to updated - your new column won't be member of any of the non-clustered indices.

Also, I don't see how dropping and recreating the indices would benefit you in this scenario. If you were bulk-loading several million existing rows from another table or database - yes, then it might be faster (due to the INSERTs being much faster) - but adding a column doesn't really suffer from any indices or constraints being around, I don't think.

Marc

marc_s 2009-04-16 15:23:26

"But I don't see why any of the non-clustered indices would have to updated" - adding a not null column requires the dropping and recreation of the table so as a by-product all the indexes have to be dropped and recreated as well.

DJ 2009-04-16 18:45:48

"Also I don't really understand why adding a column that is nullable is so much quicker. Why does this not involve re-writng the records with an extra Is Null bit flipped for each row."

Adding a nullable column merely changes the definition of the table. The individual records are not affected.

Rob Garrison 2009-04-16 18:38:08

ansaurus

tags:

views:

answers:

Adding a column efficently in SQL Server

related questions