views: 748
answers: 5

I have a couple of large tables (188m and 144m rows) I need to populate from views, but each view contains a few hundred million rows (pulling together pseudo-dimensionally modelled data into a flat form). The key on each table is a composite of columns totalling over 50 bytes. If the data were in tables, I could consider using sp_rename to swap in a newly built table, but that isn't really an option here.

If I do a single INSERT operation, the process uses a huge amount of transaction log space, typically filling it up and prompting a bunch of hassle with the DBAs. (And yes, this is probably a job the DBAs should handle/design/architect.)

I can use SSIS and stream the data into the destination table with batch commits (but this does require the data to be transmitted over the network, since we are not allowed to run SSIS packages on the server).

Are there any options other than dividing the process into multiple INSERT operations, using some kind of key to distribute the rows into different batches, and doing a loop?

+1  A: 

You could partition your data and insert it in a cursor loop. That would be nearly the same as SSIS batch inserting, but it runs on your server.

declare @year int, @month int;
declare batch_cursor cursor for
    select distinct YEAR(DateCol), MONTH(DateCol) from whatever;
open batch_cursor;
fetch next from batch_cursor into @year, @month;
while @@FETCH_STATUS = 0
begin
    insert into yourtable (...)
    select * from whatever
    where YEAR(DateCol) = @year and MONTH(DateCol) = @month;
    fetch next from batch_cursor into @year, @month;
end
close batch_cursor;
deallocate batch_cursor;
Arthur
This was ultimately the strategy I used.
Cade Roux
+1  A: 

You could use the BCP command to load the data, using the batch size parameter.

http://msdn.microsoft.com/en-us/library/ms162802.aspx

It's a two-step process:

  • BCP OUT the data from the views into text files
  • BCP IN the data from the text files into the tables, using the batch size parameter (rough sketch below)
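
Something like this, as a rough sketch (the server name, database, path, and the 10,000-row batch size are just placeholders; -c uses character format, -T a trusted connection, and -b sets the rows committed per batch):

rem step 1: BCP OUT the view into a character-format data file
bcp MyDB.dbo.HugeView out E:\load\hugeview.dat -c -S MyServer -T

rem step 2: BCP IN the data file into the table, committing every 10,000 rows
bcp MyDB.dbo.PrimaryTable in E:\load\hugeview.dat -c -b 10000 -S MyServer -T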
Raj More
13 seconds! *shakes tiny fist*
Chris McCall
How did you get the seconds part? It only shows me minutes
Raj More
@Raj: hover over the relative timestamps to get actual timestamps with seconds
Michael Haren
It changes to minutes, then hours, then days as time goes on. Pretty slick; never thought about it before.
Chris McCall
+1  A: 

This looks like a job for good ol' BCP.

Chris McCall
+1  A: 

There is no pixie dust, you know that.

Without knowing specifics about the actual schema being transferred, a generic solution would be exactly as you describe it: divide the processing into multiple inserts and keep track of the key(s). This is sort of pseudo-code T-SQL:

-- seed currentKeys with a value lower than the smallest key in <source> before the first run
create table currentKeys ([table] sysname not null primary key, [key] sql_variant not null);
go

declare @keysInserted table ([key] sql_variant);
declare @key sql_variant;
begin transaction
while (1=1)
begin
    select @key = [key] from currentKeys where [table] = '<target>';
    insert into <target> (...)
    output inserted.[key] into @keysInserted ([key])
    select top (<batchsize>) ... from <source>
    where [key] > @key
    order by [key];

    if (0 = @@rowcount)
       break;

    update currentKeys
    set [key] = (select max([key]) from @keysInserted)
    where [table] = '<target>';
    commit;
    delete from @keysInserted;
    set @key = null;
    begin transaction;
end
commit

It would get more complicated if you want to allow for parallel batches and partition the keys.

Remus Rusanu
Actually, ground-up silver bullets make excellent pixie dust
Philip Kelley
ty mr Kelley, I'll keep that in mind on my next project ;)
Remus Rusanu
+2  A: 

Does the view have ANY kind of unique identifier / candidate key? If so, you could select those rows into a working table using:

SELECT key_columns INTO dbo.temp FROM dbo.HugeView;

(If it makes sense, maybe put this table into a different database, perhaps with SIMPLE recovery model, to prevent the log activity from interfering with your primary database. This should generate much less log anyway, and you can free up the space in the other database before you resume, in case the problem is that you have inadequate disk space all around.)
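
Something like this, for example (the StagingDB name is just a placeholder):

-- separate, minimally logged database for the working table
CREATE DATABASE StagingDB;
ALTER DATABASE StagingDB SET RECOVERY SIMPLE;

-- land the key list there instead of in the primary database
SELECT key_columns INTO StagingDB.dbo.temp FROM dbo.HugeView;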

Then you can do something like this, inserting 10,000 rows at a time, and backing up the log in between:

SET NOCOUNT ON;

DECLARE
    @batchsize INT,
    @ctr INT,
    @rc INT;

SELECT
    @batchsize = 10000,
    @ctr = 0;

WHILE 1 = 1
BEGIN
    WITH x AS
    (
        SELECT key_column, rn = ROW_NUMBER() OVER (ORDER BY key_column)
        FROM dbo.temp
    )
    INSERT dbo.PrimaryTable(a, b, c, etc.)
        SELECT v.a, v.b, v.c, etc.
        FROM x
        INNER JOIN dbo.HugeView AS v
        ON v.key_column = x.key_column
        WHERE x.rn > @batchsize * @ctr
        AND x.rn <= @batchsize * (@ctr + 1);

    IF @@ROWCOUNT = 0
        BREAK;

    SET @ctr = @ctr + 1;

    BACKUP LOG PrimaryDB TO DISK = 'C:\db.bak' WITH INIT;
END

That's all off the top of my head, so don't cut/paste/run, but I think the general idea is there.

Note that if you are taking regular database and log backups, you will probably want to take a full backup afterward to start your log chain over again.
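
For example (the backup path is just a placeholder):

-- restart the log chain once the load is finished
BACKUP DATABASE PrimaryDB TO DISK = 'C:\PrimaryDB_full.bak' WITH INIT;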

Aaron Bertrand
That's funny, because before I built most of this system on the server, while I was still developing on my box, I had already divided the tables into categories by recovery objective and put them in separate schemas, in the hope that the DBAs would give me multiple databases, but it was not to be...
Cade Roux
And unfortunately the keys are all composites of over 50 bytes of columns; guess it's time to get it done...
Cade Roux