views:

734

answers:

3

In a comment I read

Just as a side note, it's sometimes faster to drop the indices of your table and recreate them after the bulk insert operation.

Is this true? Under which circumstances?

+3  A: 

Yes, it is true. When there are indexes on the table during an insert, the server will need to be constantly re-ordering/paging the table to keep the indexes up to date. If you drop the indexes, it can just add the rows without worrying about that, and then build the indexes all at once when you re-create them.


Edit: I should note that I'm working on a project right now where the opposite effect was observed. We wanted to reduce the run-time of a large import (nightly dump from a mainframe system). We tried removing the indexes, importing the data, and re-creating them. It actually significantly increased the time for the import to complete. But, this is not typical. It just goes to show that you should always test first for your particular system.

Joel Coehoorn
Would inserting the new data into a temporary table and then doing something like INSERT INTO TABLE x (SELECT * FROM y) be a viable alternative? Depending on the DB, there may be some index optimisation involved - or maybe I'm waay off the mark
Harry Lime
No, because then you're doing two inserts. Of course, your situation may vary, but in general this won't help.
Joel Coehoorn
Cool - it was just a thought.
Harry Lime
+2  A: 

As with Joel I will echo the statement that yes it can be true. I've found that the key to identifying the scenario that he mentioned is all in the distribution of data, and the size of the index(es) that you have on the specific table.

In an application that I used to support that did a regular bulk import of 1.8 million rows, with 4 indexes on the table, 1 with 11 columns, and a total of 90 columns in the table. The import with indexes took over 20 hours to complete. Dropping the indexes, inserting, and re-creating the indexes only took 1 hour and 25 minutes.

So it can be a big help, but a lot of it comes down to your data, the indexes, and the distribution of data values.

Mitchel Sellers
I think in our case, a big part of it was that we always started with a blank table, and the bulk data was coming in cluster order. Therefore not much extra work going on.
Joel Coehoorn
Ah yes, that would be a good part of it. We always started with a blank table as well, but the input data was all over the place!
Mitchel Sellers
+1  A: 

One thing you should consider when dropping and recreating indexes is that it should only be done on automated processes that run during the low volumne periods of database use. While the index is dropped it can't be used for other queries that other users might be riunning at the same time. If you do this during production hours ,your users will probably start complaining of timeouts.

HLGEM