I have a few huge tables on a production SQL Server 2005 database that need a schema update. This is mostly an addition of columns with default values, plus some column type changes that require a simple transformation. The whole thing can be done with a simple "SELECT INTO" where the target is a table with the new schema.
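
For reference, the kind of statement involved looks roughly like this (table and column names are made up for illustration):

    SELECT
        t.Id,
        t.Name,
        CAST(t.Amount AS decimal(18, 4)) AS Amount,     -- simple type change
        CAST(0 AS bit)                   AS IsArchived  -- new column with its default value
    INTO dbo.BigTable_New                               -- SELECT INTO creates the target table
    FROM dbo.BigTable AS t;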

Our tests so far show that even this simple operation, done entirely inside the server (not fetching or pushing any data out of the server), could take hours if not days on a table with many millions of rows.

Is there a better update strategy for such tables?

edit 1: We are still experimenting, with no definitive conclusion yet. What happens if one of my transformations into the new table involves merging every five rows into one? There is some code that has to run on every such transformation. The best performance we could get so far would still take at least a few days to convert a 30M-row table.
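
To make the merge-every-five-rows case concrete, here is a rough set-based sketch; the column names and the aggregation are placeholders, and it assumes an ordering column such as Id:

    -- Collapse every five consecutive source rows into one target row.
    SELECT
        MIN(n.Id)     AS FirstId,
        SUM(n.Amount) AS Amount        -- stand-in for the real per-group logic
    INTO dbo.BigTable_Merged
    FROM (
        SELECT Id,
               Amount,
               (ROW_NUMBER() OVER (ORDER BY Id) - 1) / 5 AS Grp
        FROM dbo.BigTable
    ) AS n
    GROUP BY n.Grp;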

Will using SQLCLR in this case (doing the transformation with code running inside the server) give me a major speed boost?

+3  A: 

Are you applying indexes immediately, or in a secondary step? Should go much faster without indexing during the build.
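
For example, the load-then-index flow might look something like this (table and index names are hypothetical):

    -- Drop the nonclustered indexes before the bulk load...
    DROP INDEX IX_BigTable_New_Name ON dbo.BigTable_New;

    -- ...copy the data with no indexes in place...
    INSERT INTO dbo.BigTable_New (Id, Name, Amount)
    SELECT Id, Name, CAST(Amount AS decimal(18, 4))
    FROM dbo.BigTable;

    -- ...then build the indexes once, after the data is loaded.
    CREATE NONCLUSTERED INDEX IX_BigTable_New_Name ON dbo.BigTable_New (Name);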

Brian Knoblauch
Brian Knoblauch's suggestion to remove indexes first and then rebuild them afterwards should help immensely. Just remember to always remove the clustered index last and add it back first.
Tom H.
+2  A: 

Have you tried using ALTER TABLE rather than moving data to a new table? Whyever would you use SELECT INTO? Just alter your current structure.
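
A minimal sketch of the in-place approach, with illustrative names:

    -- Add a new column with a default value, in place.
    ALTER TABLE dbo.BigTable
        ADD IsArchived bit NOT NULL
        CONSTRAINT DF_BigTable_IsArchived DEFAULT (0);

    -- Change a column's data type in place.
    ALTER TABLE dbo.BigTable
        ALTER COLUMN Amount decimal(18, 4) NOT NULL;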

HLGEM
I tried this once and got the impression that SQL Server was creating a temporary table internally and pushing the data around behind the scenes. Overall it seemed to go faster when I did the same thing myself.
Ron Harlev
Also, I do need to transform some data on the way; concatenating two columns into one is a representative example.
Ron Harlev
ALTER TABLE does not do that if properly scripted (unless you use the GUI, which does). I just added a column with a default value to an 11 million record test table in 10 minutes. A BCP import would also work faster than your method. Just remember to recreate all the indexes, existing constraints, etc.
HLGEM
+2  A: 

We have a similar problem, and I've found that the fastest way to do it is to export the data to delimited files in chunks (sized according to the row width; in our case each file had 500,000 rows), doing any transforms during the export, then drop and recreate the table with the new schema, and finally do a bcp import from the files.

A 30 million row table took a couple of hours using that method, whereas an ALTER TABLE took over 30 hours.
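
Roughly, the flow looks like this (server, database, file, and column names are placeholders):

    REM Export in chunks, applying any transforms in the query (one file per key range).
    bcp "SELECT Id, Name, CAST(Amount AS decimal(18,4)) FROM MyDb.dbo.BigTable WHERE Id BETWEEN 1 AND 500000" queryout chunk001.dat -c -t"|" -S MyServer -T

    REM After dropping and recreating dbo.BigTable with the new schema, import each file.
    bcp MyDb.dbo.BigTable in chunk001.dat -c -t"|" -S MyServer -T -b 50000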

rjrapson
A: 

Add the column allowing NULL, then do the update to the default value manually, then alter the table again to add the default constraint. This way you can control the updates and do them in smaller chunks.
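
A rough sketch of that sequence (column name, constraint name, and batch size are illustrative):

    -- 1. Add the column as nullable first (fast, metadata-only change).
    ALTER TABLE dbo.BigTable ADD IsArchived bit NULL;
    GO

    -- 2. Backfill in small batches so each transaction (and the log) stays manageable.
    DECLARE @rows int;
    SET @rows = 1;
    WHILE @rows > 0
    BEGIN
        UPDATE TOP (10000) dbo.BigTable
        SET IsArchived = 0
        WHERE IsArchived IS NULL;

        SET @rows = @@ROWCOUNT;
    END
    GO

    -- 3. Tighten the column and add the default for future rows.
    ALTER TABLE dbo.BigTable ALTER COLUMN IsArchived bit NOT NULL;
    ALTER TABLE dbo.BigTable ADD CONSTRAINT DF_BigTable_IsArchived DEFAULT (0) FOR IsArchived;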

Jonas Lincoln
A: 

I have a similar-sounding problem that occurs reasonably frequently.

Our database caches the results of a remote stored procedure which, occasionally, expands with new fields.

This table has millions of rows (and is now up to about 80 fields) with a couple of indexes. Having played around with #temp tables and such (even using bcp to temporary files), I use the select-into-a-new-table option, roughly as sketched after the list:

  • create a new table with the new structure
  • do a select into that table
  • drop the original one
  • rename the new table to the old one's name
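
A minimal sketch of that sequence, with hypothetical table and column names (because the new table is created up front, the copy itself is an INSERT ... SELECT):

    -- 1. Create the replacement table with the new structure.
    CREATE TABLE dbo.CachedResults_New
    (
        Id       int           NOT NULL,
        Payload  nvarchar(100) NULL,
        NewField int           NULL
    );

    -- 2. Copy the data across.
    INSERT INTO dbo.CachedResults_New (Id, Payload, NewField)
    SELECT Id, Payload, NULL
    FROM dbo.CachedResults;

    -- 3. Drop the original and give the new table its name.
    DROP TABLE dbo.CachedResults;
    EXEC sp_rename 'dbo.CachedResults_New', 'CachedResults';
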
Unsliced