ansaurus

Question

How do I add a column to large sql server table

Answer 1

+6 A:

ALTER TABLE table1 ADD
  newcolumn int NULL
GO

should not take that long... What takes a long time is to insert columns in the middle of other columns... b/c then the engine needs to create a new table and copy the data to the new table.

Nestor 2009-10-29 17:15:00

You know what, you're right. As long as the column has a value of NULL, then it gets added pretty fast. But if I get a default value, its take a long long time. So the real issue I need to plan for is how to add a default value to the column.

Jonathan Beerhalter 2009-10-29 17:33:07

Add the column and then perform relatively small UPDATE batches to populate the column with a default value. That should prevent any noticeable slowdowns.

Agent_9191 2009-10-29 17:36:55

Thanks Agent_9191, that seems like a pretty decent approach.

Jonathan Beerhalter 2009-10-29 17:39:22

Maybe this is nitpicking, but I believe the that "b/c then the engine needs to create a new table and copy the data to the new table" is not entirely accurate. AFAIK, the database engine does not allow "inserting" columns, only adding of columns to the end of the table definition. The client tools are responsible for creating a temp table, copying etc.

Kim Major 2009-10-29 18:09:30

Kim, yes.. i didnt mean to say that the engine does it (even though that's what I wrote :-) ). It's the client tool.

Nestor 2009-10-29 18:12:35

Answer 2

+3 A:

The only real solution for continuous uptime is redundancy.

I acknowledge @Nestor's answer that adding a new column shouldn't take long in SQL Server, but nevertheless, it could still be an outage that is not acceptable on a production system. An alternative is to make the change in a parallel system, and then once the operation is complete, swap the new for the old.

For example, if you need to add a column, you may create a copy of the table, then add the column to that copy, and then use sp_rename() to move the old table aside and the new table into place.

If you have referential integrity constraints pointing to this table, this can make the swap even more tricky. You probably have to drop the constraints briefly as you swap the tables.

For some kinds of complex upgrades, you could completely duplicate the database on a separate server host. Once that's ready, just swap the DNS entries for the two servers and voilà!

I supported a stock exchange company in the 1990's who ran three duplicate database servers at all times. That way they could implement upgrades on one server, while retaining one production server and one failover server. Their operations had a standard procedure of rotating the three machines through production, failover, and maintenance roles every day. When they needed to upgrade hardware, software, or alter the database schema, it took three days to propagate the change through their servers, but they could do it with no interruption in service. All thanks to redundancy.

Bill Karwin 2009-10-29 17:49:06

How did you catch up on missed transactions during maintenance? Standard replication?

Eric J. 2009-10-29 18:20:09

A stock exchange doesn't need to operate 24/7. They close at the bell.

Bill Karwin 2009-10-29 18:42:20

Doh :-) Thoughts on how to handle that for 24/7 systems?

Eric J. 2009-10-29 19:12:23

Right; this has to be solved by synchronizing the delta of data. Replication and logging are common solutions. That's a pretty broad subject though. My general point is that high-availability and redundancy go hand in hand.

Bill Karwin 2009-10-29 19:33:03

Answer 3

+3 A:

"Add the column and then perform relatively small UPDATE batches to populate the column with a default value. That should prevent any noticeable slowdowns"

And after that you have to set the column to NOT NULL which will fire off in one big transaction. So everything will run really fast until you do that so you have probably gained very little really. I only know this from first hand experience.

You might want to rename the current table from X to Y. You can do this with this command sp_RENAME '[OldTableName]' , '[NewTableName]'.

Recreate the new table as X with the new column set to NOT NULL and then batch insert from Y to X and include a default value either in your insert for the new column or placing a default value on the new column when you recreate table X.

I have done this type of change on a table with hundreds of millions of rows. It still took over an hour, but it didn't blow out our trans log. When I tried to just change the column to NOT NULL with all the data in the table it took over 20 hours before I killed the process.

Have you tested just adding a column filling it with data and setting the column to NOT NULL?

So in the end I don't think there's a magic bullet.

StarShip3000 2009-10-29 17:52:11

Answer 4

A:

select into a new table and rename. Example, Adding column i to table A:

select *, 1 as i
into A_tmp
from A_tbl

//Add any indexes here

exec sp_rename 'A_tbl', 'A_old'
exec sp_rename 'A_old', 'A_tbl'

Should be fast and won't touch your transaction log like inserting in batches might. (I just did this today w/ a 70 million row table in < 2 min).

You can wrap it in a transaction if you need it to be an online operation (something might change in the table between the select into and the renames).

ElHombre 2009-10-29 18:09:56

Answer 5

A:

Another technique is to add the column to a new related table (Assume a one-to-one relationship which you can enforce by giving the FK a unique index). You can then populate this in batches and then you can add the join to this table wherever you want the data to appear. Note I would only consider this for a column that I would not want to use in every query on the original table or if the record width of my original table was getting too large or if I was adding several columns.

HLGEM 2009-10-29 18:46:44

ansaurus

tags:

views:

answers:

How do I add a column to large sql server table

related questions