Say I write the query:

INSERT INTO DestinationTable
(ColumnA, ColumnB, ColumnC, etc.)
SELECT ColumnA, ColumnB, ColumnC, etc.
FROM SourceTable

And my source table has 22 million rows.

SQL Server fills up my hard drive and errors out.

Why can't SQL Server handle my query?

Should I use a cursor and insert a row at a time?

PS - it is SQL Express 2005, but I could try on the full version.

UPDATE: I also want to mention that my source table only takes up around 1GB of storage when I look at it in Management Studio. And yet my 25GB of free disk space somehow gets filled up? I am also using 2 different databases (Source.mdf -> Destination.mdf); I don't know if this makes any difference.

+8  A: 

Batch update...

INSERT INTO DestinationTable
    (ColumnA, ColumnB, ColumnC, etc.)
SELECT TOP 100000 ColumnA, ColumnB, ColumnC, etc.
FROM SourceTable
WHERE NOT EXISTS (SELECT *
    FROM DestinationTable
    WHERE DestinationTable.KeyCols = SourceTable.KeyCols)

WHILE @@ROWCOUNT <> 0
    INSERT INTO DestinationTable
        (ColumnA, ColumnB, ColumnC, etc.)
    SELECT TOP 100000 ColumnA, ColumnB, ColumnC, etc.
    FROM SourceTable
    WHERE NOT EXISTS (SELECT *
        FROM DestinationTable
        WHERE DestinationTable.KeyCols = SourceTable.KeyCols)

There are variations to deal with checkpointing, log file management, whether you need it in one transaction, etc.

gbn
I think my suggestion of a cursor might be slightly less complicated than this. I will try both and see which performs better.
Jonathan.Peppers
I think a cursor would take a year and a half to insert all your data ;)
womp
@Jonathan.Peppers: a CURSOR still needs resources and locks, and perhaps 22m rows in tempdb, depending on how you declare it
gbn
@gbn, you could eliminate the duplication of the INSERT by replacing the outer INSERT with a _SELECT 1_, because that sets @@ROWCOUNT and gets the loop going the first time
KM
@KM: doh! of course.
gbn
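
Putting gbn's loop and KM's suggestion together, a minimal sketch of one such batched variation might look like the following (assumptions: the CHECKPOINT only helps under the SIMPLE recovery model, KeyCols stands in for the real key columns, and a local variable holds the batch row count so that nothing else resets @@ROWCOUNT):

    -- Prime the loop so the body runs at least once (SQL Server 2005 syntax)
    DECLARE @rows int;
    SET @rows = 1;

    WHILE @rows <> 0
    BEGIN
        -- Copy the next batch of rows that are not yet in the destination
        INSERT INTO DestinationTable
            (ColumnA, ColumnB, ColumnC)
        SELECT TOP 100000 ColumnA, ColumnB, ColumnC
        FROM SourceTable
        WHERE NOT EXISTS (SELECT *
            FROM DestinationTable
            WHERE DestinationTable.KeyCols = SourceTable.KeyCols);

        -- Capture the batch size before any later statement resets @@ROWCOUNT
        SET @rows = @@ROWCOUNT;

        -- Under SIMPLE recovery, a checkpoint lets log space be reused between batches
        CHECKPOINT;
    END
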
+1  A: 

This blog post has info about importing data into SQL Server.

As for the reason your table is filling up, I would look at the schema of the table and make sure the column sizes are as small as they can possibly be.

I would really analyze if all the data is necessary.

David Basarab
The reason I'm moving the table is because I'm making my table structure as small as possible. The data is imported by a 3rd party, or we would not have this issue.
Jonathan.Peppers
+1  A: 

You can bulk copy the data to a CSV file and import it in.

Read up on the BCP utility here.
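
For illustration, a rough sketch of the round trip (the instance name, file path, and pipe delimiter are assumptions, not part of the original answer):

    -- Export from the source database (run bcp from a command prompt):
    --   bcp Source.dbo.SourceTable out C:\temp\SourceTable.txt -c -t"|" -T -S .\SQLEXPRESS

    -- Import into the destination database:
    BULK INSERT Destination.dbo.DestinationTable
    FROM 'C:\temp\SourceTable.txt'
    WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK);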

Raj More
22 million rows?
gbn
I might as well use USPS.
Jonathan.Peppers
Well, I'd use a pipe-delimited .txt instead of a .csv and BULK INSERT or SSIS, but BCP works fine. To Jonathan: I import a 22 million record file into my database using BULK INSERT and it takes 16 minutes.
HLGEM
But you're suggesting exporting to a CSV and then importing back into SQL Server? I'd rather back them up on 1.44 floppies and use the pack and ship promise.
Jonathan.Peppers
A: 

You are inserting data in a way that supports a transaction. There is no way to disable this through the method you're using; however, you could do this outside the scope of a transaction through other methods. Read below:

http://support.microsoft.com/kb/59462

The key approach is this:

Set DBOPTION 'SELECT INTO' to true

http://www.mssqlcity.com/FAQ/Devel/select%5Finto.htm

Nissan Fan
Really? This is a standard INSERT; it cannot be "minimally logged". And since SQL Server 2000 you should use ALTER DATABASE.
gbn
...and the KB is ancient, and even mentions when the option applies
gbn
+2  A: 

You could try setting the database recovery model to "Simple" instead of "Full" (the default). This is done on the Options page of the database properties in Management Studio. That should keep your transaction log size down. After you're done with the insert you can always set the recovery model back to Full.
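
The T-SQL equivalent of that Management Studio option would look roughly like this (the database name is a placeholder):

    -- Keep the transaction log from growing during the load
    ALTER DATABASE Destination SET RECOVERY SIMPLE;

    -- ... run the big INSERT here ...

    -- Switch back once the load has finished
    ALTER DATABASE Destination SET RECOVERY FULL;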

TLiebe
I'll try it and see, I just don't like this solution if it ends up being an automated task.
Jonathan.Peppers
If the DB is in full recovery and log backups are being taken, then a switch to Simple breaks the log chain. This must be very carefully considered in a production environment where point-in-time recovery is required. After switching back to full, a full database backup must be taken to restart the log chain and allow further log backups.
GilaMonster
+1  A: 

I would highly recommend setting the database recovery model to BULK_LOGGED while carrying out such heavy bulk data operations.

By default, a database is set to the SIMPLE or FULL recovery model.

The full recovery model, which fully logs all transactions, is intended for normal use.

The bulk-logged recovery model is intended to be used temporarily during a large bulk operation, assuming that it is among the bulk operations that are affected by the bulk-logged recovery model (for more information, see Operations That Can Be Minimally Logged at msdn.microsoft.com/en-us/library/ms191244.aspx).

The BULK_LOGGED recovery model minimally logs these transactions.

You can do it by using the snippet below:

    --Determine the recovery model currently used for the database

    SELECT name AS [Database Name],
    recovery_model_desc AS [Recovery Model]
    FROM sys.databases 
    WHERE name = '<database_name>';

    --Remember this recovery model so that you can switch back to the same later

    --set the database recovery model to BULK_LOGGED

    ALTER DATABASE <database_name>  SET RECOVERY BULK_LOGGED;

    --Run your heavy data insert tasks
    INSERT INTO DestinationTable
    (ColumnA, ColumnB, ColumnC, etc.)
    SELECT ColumnA, ColumnB, ColumnC, etc.
    FROM SourceTable

    /*Again set the database recovery model to FULL or SIMPLE
    (whichever we got from the first query)*/

    ALTER DATABASE <database_name>  SET RECOVERY FULL;   
    --OR 
    ALTER DATABASE <database_name>  SET RECOVERY SIMPLE;

*Note - Please be patient while the bulk operation is being processed* [:P]

I have done this many times before. Do let me know whether this helped you.

You can refer below MSDN article for details of switching between recovery models - Considerations for Switching from the Full or Bulk-Logged Recovery Model at msdn.microsoft.com/en-us/library/ms190203.aspx

Aamod
I'll try it out; setting it to SIMPLE did not have too much of an effect. It still errored out eventually.
Jonathan.Peppers
Of course... FULL and SIMPLE recovery models are way behind BULK_LOGGED in terms of performance for bulk data operations.
Aamod
BULK_LOGGED was closer but did not quite get there on my system, and even if it did, wouldn't I have to shrink the database/files to get it down to an acceptable size? I think batching the insert, as the top answer suggests, is the way to go.
Jonathan.Peppers
-1. You don't understand bulk-logged. This is not a minimally logged operation.
gbn
Simple recovery also allows minimal logging for certain operations. It's only full where all operations are fully logged.
GilaMonster