SETUP

I have to insert a couple million rows into either SQL Server 2000/2005, MySQL, or Access. Unfortunately I don't have an easy way to use bulk insert, BCP, or any of the other ways a normal human would go about this. The inserts will happen against one particular database, but the code needs to be DB-agnostic -- so I can't do bulk copy, SELECT INTO, or BCP. I can, however, run specific queries before and after the inserts, depending on which database I'm importing to.

e.g.

If IsSqlServer() Then
    DisableTransactionLogging();
ElseIf IsMySQL() Then
    DisableMySQLIndices();
End If

... do inserts ...

If IsSqlServer() Then
    EnableTransactionLogging();
ElseIf IsMySQL() Then
    EnableMySQLIndices();
End If
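
For the MySQL branch, a hedged sketch of what those before/after statements might be (the table name is illustrative; DISABLE KEYS only skips maintenance of non-unique indexes, and only on MyISAM tables):

-- before the inserts: stop updating non-unique indexes on MyTable
ALTER TABLE MyTable DISABLE KEYS;

-- ... do inserts ...

-- after the inserts: rebuild the indexes in one pass
ALTER TABLE MyTable ENABLE KEYS;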

QUESTION

Are there any interesting things I can do to SQL Server that might speed up these inserts?

For example, is there a command I could issue to tell SQL Server, "Hey, don't bother recording these transactions in the transaction log".

Or maybe I could say, "Hey, I have a million rows coming in, so don't update your index until I'm totally finished".

ALTER INDEX [IX_TableIndex] ON Table DISABLE
     ... inserts
ALTER INDEX [IX_TableIndex] ON Table REBUILD

(Note: the above index disable only works on 2005, not on 2000. Bonus points if you know a way to do this on 2000.)

What about MySQL, and Access?

+1  A: 

For SQL Server:

  1. You can set the recovery model to "Simple", so your transaction log will be kept small. Do not forget to set it back afterwards (see the sketch at the end of this answer).
  2. Disabling the indexes is actually a good idea. This will work on SQL Server 2005, but not on SQL Server 2000.

    alter index [INDEX_NAME] on [TABLE_NAME] disable

And to enable

alter index [INDEX_NAME] on [TABLE_NAME] rebuild

And then just insert the rows one by one. You have to be patient, but at least it is somewhat faster.
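
A minimal sketch of the recovery model switch, assuming a database named MyDatabase that normally runs under the FULL model:

ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;

-- ... do inserts ...

ALTER DATABASE MyDatabase SET RECOVERY FULL;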

GvS
Unless you are doing log shipping, in which case you probably don't want to set the recovery model to simple, unless you want to lose some data.
Kibbee
+1  A: 

If it is a one-time thing (or it happens often enough to justify automating this), also consider dropping/disabling all indexes, and then adding/re-enabling them again when the insert is done.

Yaakov Ellis
Wasn't it mentioned in his question that he could not use bulk insert?
GvS
It wasn't mentioned in his question at the time when I wrote this answer. He added that after he saw this.
Yaakov Ellis
A: 

You might consider using SQL Server's bulk-logged recovery model during your bulk insert.

http://msdn.microsoft.com/en-us/library/ms190422(SQL.90).aspx

http://msdn.microsoft.com/en-us/library/ms190203(SQL.90).aspx

You might also disable the indexes on the target table during your inserts.
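
A minimal sketch, assuming a database named MyDatabase that normally uses the FULL model; note that BULK_LOGGED mainly reduces logging for bulk operations (BULK INSERT, SELECT INTO, index rebuilds), while ordinary INSERT statements remain fully logged:

ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;

-- ... perform the inserts (and/or index rebuilds) ...

ALTER DATABASE MyDatabase SET RECOVERY FULL;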

Jeremy
+2  A: 

Is this a regular process or a one-time event?

I have, in the past, just scripted out the current indexes, dropped them, inserted the rows, and then re-added the indexes.

SQL Server Management Studio can script out the indexes from the right-click menus...
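
A hedged sketch of the drop/re-create round trip, with hypothetical index and column names; unlike ALTER INDEX ... DISABLE, this also works on SQL Server 2000:

-- SQL Server 2000 syntax (also accepted by 2005)
DROP INDEX MyTable.IX_MyTable_Name;

-- ... do inserts ...

CREATE INDEX IX_MyTable_Name ON MyTable (Name);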

Jeff Martin
A: 

The trouble with setting the recovery model to simple is that it affects any other users entering data at the same time and thus will make their changes unrecoverable.

Same thing with disabling the indexes: this disables them for everyone and may make the database run slower than a slug.

Suggest you run the import in batches.
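
For example, a minimal sketch of committing in chunks instead of one giant transaction (the batch size of 10,000 is arbitrary; MySQL uses START TRANSACTION instead of BEGIN TRANSACTION):

BEGIN TRANSACTION;
-- ... first 10,000 INSERT statements ...
COMMIT TRANSACTION;

BEGIN TRANSACTION;
-- ... next 10,000 INSERT statements ...
COMMIT TRANSACTION;

-- ... repeat until all rows are loaded ...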

HLGEM
I actually control this, so no one else will be using the db at this time.
Michael Pryor
A: 

If this is not something that needs to be read terribly quickly, you can do an "Insert Delayed" into the table on MySQL. This allows your code to continue running without having to wait for the insert to actually happen. This does have some limitations, but if your primary concern is to get the program to finish quickly, this may help. Be warned that there is a nice long list of situations where this may not act as expected. Check the docs.
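
A minimal sketch of the syntax, with hypothetical table and column names (INSERT DELAYED only works with certain storage engines, such as MyISAM and MEMORY):

INSERT DELAYED INTO MyTable (ID, Name) VALUES (1, 'Fred');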

I do not know if this functionality works for Access or MS SQL, though.

Jack M.
A: 

Hi

Have you considered using the Factory pattern? I'm guessing you're writing the code for this, so with the factory pattern you could code up a factory that returns a concrete "IDataInserter"-type class to do the work for you.

This would still allow you to be database-agnostic while getting the fastest method for each type of database.

Ray Booysen
A: 

SQL Server 2000/2005, MySQL, and Access can all load directly from a tab/CR-delimited text file; they just have different commands to do it. If you've got the case statement to determine which DB you're importing into, just figure out each one's preferred way of importing a text file.
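
For instance, hedged sketches of the per-engine load commands, assuming a tab-delimited file and a table named MyTable (paths and names are illustrative):

-- SQL Server
BULK INSERT MyTable FROM 'C:\data\rows.txt'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');

-- MySQL
LOAD DATA INFILE '/data/rows.txt' INTO TABLE MyTable
    FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';

-- Access has no SQL command for this; the import is usually done via
-- DoCmd.TransferText in VBA or the import wizard.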

Frank Flynn
Using SSIS or DTS in SQL would standardize the text file load process and take that variable among the destination systems out of the mix.
John Mo
+2  A: 

The single biggest thing that will kill performance here is the fact that (it sounds like) you're executing a million separate INSERTs against the DB. Each INSERT is treated as a separate operation. If you can do this as a single statement instead, you will almost certainly see a huge performance improvement.

Both MySQL and SQL Server support 'selects' of constant expressions without a table name, so this should work as one statement:

INSERT INTO MyTable(ID, name)
SELECT 1, 'Fred'
UNION ALL SELECT 2, 'Wilma'
UNION ALL SELECT 3, 'Barney'
UNION ALL SELECT 4, 'Betty'

It's not clear to me if Access supports that, not having Access available. HOWEVER, Access does support constants in a SELECT, as far as I can tell, and you can coerce the above into ANSI SQL-92 (which should be supported by all 3 engines; it's about as close to 'DB agnostic' as you'll get) by just adding

FROM OneRowTable

to the end of every individual SELECT, where 'OneRowTable' is a table with just one row of dummy data.
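
A hedged sketch of that SQL-92 form, assuming a one-row dummy table named OneRowTable exists in each target database:

INSERT INTO MyTable(ID, name)
SELECT 1, 'Fred' FROM OneRowTable
UNION ALL SELECT 2, 'Wilma' FROM OneRowTable
UNION ALL SELECT 3, 'Barney' FROM OneRowTable
UNION ALL SELECT 4, 'Betty' FROM OneRowTable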

This should let you insert a million rows of data with far fewer than a million INSERT statements -- and things like index reshuffling will be done once, rather than a million times. You may have much less need for other optimisations after that.

Cowan
A: 

Can you use DTS (2000) or SSIS (2005) to build a package to do this? DTS and SSIS can both pull from the same source and pipe out to the different potential destinations. Go for SSIS if you can. There's a lot of good, fast technology in there along with functionality to embed the IsSQLServer, IsMySQL, etc. logic.

John Mo
A: 

It's worth considering breaking your inserts into smaller batches; a single transaction with lots of queries will be slow.

vincebowdren