First off, let me say I am running on SQL Server 2005, so I don't have access to MERGE.

I have a table with ~150k rows that I am updating daily from a text file. As rows fall out of the text file, I need to delete them from the database, and if they change or are new, I need to update/insert accordingly.

After some testing I've found that, performance-wise, it is dramatically faster to do a full delete and then bulk insert from the text file rather than read through the file line by line doing an update/insert. However, I recently came across some posts discussing how to mimic the MERGE functionality of SQL Server 2008 using a temp table and the OUTPUT clause of the UPDATE statement.

I was interested in this because I am looking into how I can eliminate the window in my delete/bulk insert method when the table has no rows. I still think this method will be the fastest, so I am looking for the best way to solve the empty-table problem.

Thanks

+4  A: 

I think your fastest method would be to:

  1. Drop all foreign keys and indexes from your table.
  2. Truncate your table.
  3. Bulk insert your data.
  4. Recreate your foreign keys and indexes.
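
A minimal sketch of those four steps, with placeholder object and file names (ChildTable, FK_ChildTable_MyTable, IX_MyTable_SomeColumn, and C:\datafile.txt are illustrative, not from the question):

-- 1. Drop foreign keys that reference the table, and its indexes (placeholder names)
ALTER TABLE dbo.ChildTable DROP CONSTRAINT FK_ChildTable_MyTable;
DROP INDEX IX_MyTable_SomeColumn ON dbo.MyTable;

-- 2. Truncate (requires that no foreign keys reference the table)
TRUNCATE TABLE dbo.MyTable;

-- 3. Bulk insert the daily text file
BULK INSERT dbo.MyTable
FROM 'C:\datafile.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK);

-- 4. Recreate the indexes and foreign keys
CREATE INDEX IX_MyTable_SomeColumn ON dbo.MyTable (SomeColumn);
ALTER TABLE dbo.ChildTable
    ADD CONSTRAINT FK_ChildTable_MyTable
    FOREIGN KEY (MyTableID) REFERENCES dbo.MyTable (MyTableID);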
Joe Stefanelli
Thanks for the tips. I didn't know about TRUNCATE and will likely use it, but I am trying to eliminate the short period between the deletion and the bulk insert when the table is empty. Any ideas?
rpf3
@rpf3: Give the TRUNCATE a try. I think it will eliminate much of the delay you're talking about.
Joe Stefanelli
The TRUNCATE was definitely faster than the delete, but it still takes ~9 seconds for the bulk insert. I've been asked to see if there is a way to eliminate even this small amount of downtime, because other processes might hit the database while the load is running.
rpf3
@rpf3: If you've followed the steps I've given, then I'm not aware of anything else to speed this up. Honestly, 9 seconds to bulk insert 150K rows once a day doesn't sound unreasonable to me.
Joe Stefanelli
Oh, I completely agree with you; in fact, it's a massive improvement over how it was being done previously, row by row. However, due to the nature of the data in the table, if a query were run during those 9 seconds and returned no rows, it could potentially be very bad. This is a fringe case, but with automated processes running all day that have the potential to hit the DB, it may happen. Is there a way to do the bulk insert into a temp table and then swap the two, or something?
rpf3
@rpf3: You could bulk insert into a temp table and try something like [sp_rename](http://msdn.microsoft.com/en-us/library/ms188351.aspx), though I think that would require a table lock that would be just as harmful. You could also try creating two versions of the table and a view that alternates between them while the other one is being bulk-inserted. Ultimately, though, my gut reaction is that it may just be safer and easier to code some retry/exception-handling logic into those automated processes.
Joe Stefanelli
A: 

For raw speed, with ~150K rows in the table, I think I'd just drop the table, recreate it from scratch (without indexes) and then bulk load afresh. Once the bulk load is done, create the indexes.

This assumes, of course, that having a period of time when the table is empty/doesn't exist is acceptable, which it sounds like could be the case.

AdaTheDev
A: 

Is the problem that Joe's solution is not fast enough, or that you cannot have any activity against the target table while your process runs? If you just need to prevent users from running queries against the target table, you should wrap your process in a transaction. That way, when your TRUNCATE TABLE executes, it will take a table lock that will be held for the duration of the transaction, like so:

begin tran;

truncate table stage_table;

bulk insert stage_table
from N'C:\datafile.txt';

commit tran;
Sake God
I was thinking about doing this, but if you don't have permission to access either the data file or the format file, an error gets thrown that cannot be caught by SQL TRY/CATCH and will stop the code mid-transaction, leaving it open.
rpf3
+1  A: 

An alternative solution that would satisfy your requirement of not having "down time" for the table you are updating.

It sounds like you were originally reading the file and doing an INSERT/UPDATE/DELETE one row at a time. A more performant approach, which does not involve clearing down the table, is as follows:

1) Bulk load the file into a new, separate table (no indexes).
2) Then create the PK on it.
3) Run 3 statements to update the original table from this new (temporary) table:
   - DELETE rows in the main table that don't exist in the new table
   - UPDATE rows in the main table where there is a matching row in the new table
   - INSERT rows into the main table from the new table where they don't already exist

This will perform better than row-by-row operations and should hopefully satisfy your overall requirements.
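
A rough sketch of those three statements, assuming a main table dbo.MyTable and a staging table dbo.MyTable_Staging that share a key column ID and a data column Col1 (all names here are illustrative):

-- 1) Bulk load into the separate staging table (no indexes on it yet)
BULK INSERT dbo.MyTable_Staging
FROM 'C:\datafile.txt';

-- 2) Create the PK on the staging table
ALTER TABLE dbo.MyTable_Staging
    ADD CONSTRAINT PK_MyTable_Staging PRIMARY KEY (ID);

-- 3) DELETE rows that are no longer in the file
DELETE m
FROM dbo.MyTable AS m
WHERE NOT EXISTS (SELECT 1 FROM dbo.MyTable_Staging AS s WHERE s.ID = m.ID);

-- UPDATE rows that have a matching row in the staging table
UPDATE m
SET m.Col1 = s.Col1
FROM dbo.MyTable AS m
JOIN dbo.MyTable_Staging AS s ON s.ID = m.ID;

-- INSERT rows that don't already exist in the main table
INSERT INTO dbo.MyTable (ID, Col1)
SELECT s.ID, s.Col1
FROM dbo.MyTable_Staging AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.MyTable AS m WHERE m.ID = s.ID);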

AdaTheDev
Thanks, I'm gonna run some tests to see if I want to use this or just keep the bulk insert inside a locked transaction for the short run time.
rpf3
+1  A: 

There is a way to update the table with zero downtime: keep two days' data in the table, and delete the old rows after loading the new ones!

  1. Add a DataDate column representing the date for which your ~150K rows are valid.
  2. Create a one-row, one-column table with "today's" DataDate.
  3. Create a view of the two tables that selects only rows matching the row in the DataDate table. Index it if you like. Readers will now refer to this view, not the table.
  4. Bulk insert the rows. (You'll obviously need to add the DataDate to each row.)
  5. Update the DataDate table. The view updates instantly!
  6. Delete yesterday's rows at your leisure.

SELECT performance won't suffer; joining one row to 150,000 rows along the primary key should present no problem to any server less than 15 years old.
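
A hedged sketch of that setup, assuming the main table is dbo.MyTable with columns ID, Col1 and the added DataDate (all object names here are illustrative; SQL Server 2005 has no DATE type, so datetime is used):

-- One-row pointer table holding "today's" DataDate
CREATE TABLE dbo.CurrentDataDate (DataDate datetime NOT NULL);
GO

-- Readers query this view instead of the table directly
CREATE VIEW dbo.MyTableCurrent
AS
SELECT t.ID, t.Col1, t.DataDate
FROM dbo.MyTable AS t
JOIN dbo.CurrentDataDate AS d ON d.DataDate = t.DataDate;
GO

-- Daily load: bulk insert the new rows (each tagged with the new DataDate),
-- then flip the pointer; readers of the view switch over instantly
DECLARE @NewDataDate datetime;
SET @NewDataDate = '20101019';   -- placeholder: the date the freshly loaded file represents

UPDATE dbo.CurrentDataDate SET DataDate = @NewDataDate;

-- Delete yesterday's rows whenever convenient
DELETE FROM dbo.MyTable WHERE DataDate < @NewDataDate;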

I have used this technique often, and have also struggled with processes that relied on sp_rename. Production processes that modify the schema are a headache. Don't.

James K. Lowden