ansaurus

Question

How to change slow parametrized inserts into fast bulk copy (even from memory)

Answer 1

+3 A:

Is required the transaction? Using transaction need much more resources than simple commands.

Also If you are sure, that inserted values are corect, you can use a BulkInsert.

TcKs 2008-09-24 13:38:10

I use one transacion for all operations so IMHO it does not matter. Am I wrong ?

Michal Sznajder 2008-09-24 13:50:36

Yes you are wrong. Transaction should be so short as they can be. Because managing transaction ( and the data in them ) are littlebit expensive.

TcKs 2008-09-30 16:15:31

Answer 2

+2 A:

1 minute sounds pretty reasonable for 0.5 million records. That's a record every 0.00012 seconds.

Does the table have any indexes? Removing these and reapplying them after the bulk insert would improve performance of the inserts, if that is an option.

Ian Nelson 2008-09-24 13:38:24

One primary index on auto int field

Michal Sznajder 2008-09-24 13:51:37

Answer 3

+1 A:

It doesn't seem unreasonable to me to process 8,333 records per second...what kind of throughput are you expecting?

JustinD 2008-09-24 13:40:29

Answer 4

+1 A:

If you need better speed, you might consider implementing bulk insert:

http://msdn.microsoft.com/en-us/library/ms188365.aspx

stephenbayer 2008-09-24 13:41:03

Answer 5

A:

I assume that what is taking the approximately 58 seconds is the physical inserting of 500,000 records - so you are getting around 10,000 inserts a second. Without knowing the specs of your database server machine (I see you are using localhost, so network delays shouldn't be an issue), it is hard to say if this is good, bad, or abysmal.

I would look at your database schema - are there a bunch of indices on the table that have to be updated after each insert? This could be from other tables with foreign keys referencing the table you are working on. There are SQL profiling tools and performance monitoring facilities built into SQL Server, but I've never used them. But they may show up problems like locks, and things like that.

Ken Ray 2008-09-24 13:41:21

Answer 6

A:

Do the fancy stuff on the data, on all records first. Then Bulk-Insert them.

(since you're not doing selects after an insert .. i don't see the problem of applying all operations on the data before the BulkInsert

sirrocco 2008-09-24 13:53:33

Answer 7

A:

If I had to guess, the first thing I would look for are too many or the wrong kind of indexes on the tbTrafficLogTTL table. Without looking at the schema definition for the table, I can't really say, but I have experienced similar performance problems when:

The primary key is a GUID and the primary index is CLUSTERED.
There's some sort of UNIQUE index on a set of fields.
There are too many indexes on the table.

When you start indexing half a million rows of data, the time spent to create and maintain indexes adds up.

I will also note that if you have any option to convert the Year, Month, Day, Hour, Minute, Second fields into a single datetime2 or timestamp field, you should. You're adding a lot of complexity to your data architecture, for no gain. The only reason I would even contemplate using a split-field structure like that is if you're dealing with a pre-existing database schema that cannot be changed for any reason. In which case, it sucks to be you.

Craig Trader 2008-09-24 13:53:54

Answer 8

A:

I had a similar problem in my last contract. You're making 500,000 trips to SQL to insert your data. For a dramatic increase in performance, you want to investigate the BulkInsert method in the SQL namespace. I had "reload" processes that went from 2+ hours to restore a couple of dozen tables down to 31 seconds once I implemented Bulk Import.

David 2008-09-24 13:58:46

Answer 9

A:

This could best be accomplished using something like the bcp command. If that isn't available, the suggestions above about using BULK INSERT are your best bet. You're making 500,000 round trips to the database and writing 500,000 entries to the log files, not to mention any space that needs to be allocated to the log file, the table, and the indexes.

If you're inserting in an order that is different from your clustered index, you also have to deal with the time require to reorganize the physical data on disk. There are a lot of variables here that could possibly be making your query run slower than you would like it to.

~10,000 transactions per second isn't terrible for individual inserts coming roundtripping from code/

Jeremiah Peschka 2008-09-24 14:19:00

Answer 10

+7 A:

Instead of inserting each record individually, Try using the SqlBulkCopy class to bulk insert all the records at once.

Create a DataTable and add all your records to the DataTable, and then use SqlBulkCopy.WriteToServer to bulk insert all the data at once.

Adam Hughes 2008-09-24 14:43:56

Answer 11

+1 A:

If some form of bulk insert isn't an option, the other way would be multiple threads, each with their own connection to the database.

The issue with the current system is that you have 500,000 round trips to the database, and are waiting for the first round trip to complete before starting the next - any sort of latency (ie, a network between the machines) will mean that most of your time is spent waiting.

If you can split the job up, perhaps using some form of producer/consumer setup, you might find that you can get much more utilisation of all the resources.

However, to do this you will have to lose the one great transaction - otherwise the first writer thread will block all the others until its transaction is completed. You can still use transactions, but you'll have to use a lot of small ones rather than 1 large one.

The SSIS will be fast because it's using the bulk-insert method - do all the complicated processing first, generate the final list of data to insert and give it all at the same time to bulk-insert.

Jim T 2008-09-24 14:54:54

Answer 12

A:

BULK INSERT = bcp from a permission

You could batch the INSERTs to reduce roundtrips SQLDataAdaptor.UpdateBatchSize = 10000 gives 50 round trips

You still have 500k inserts though...

Article

MSDN

gbn 2008-10-14 18:56:10

ansaurus

tags:

views:

answers:

How to change slow parametrized inserts into fast bulk copy (even from memory)

related questions