Hello,
We have noticed that our queries run slower on databases that had large chunks of data added by bulk insert than on databases that had a similar amount of data added record by record. We use SQL Server 2005 Express, and we tried reindexing all indexes without any improvement. Do you know of any structural problem in the database that can be caused by inserting data in big chunks instead of one record at a time?

Thanks

A: 

Probably SQL Server had to grow the database files in many small autogrowth increments during the load, which can leave the files fragmented on disk. When running big transactions, it's better to pre-allocate plenty of space in both the data and log files.
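A minimal sketch of pre-allocating the files, assuming a database called MyDatabase with logical file names MyDatabase_Data and MyDatabase_Log (all hypothetical; the first query shows the real names). The sizes are placeholders: SIZE must be larger than the file's current size, and the statements change one property per MODIFY FILE. Keep in mind that SQL Server 2005 Express caps a database at 4 GB.

```sql
-- Hypothetical names throughout: MyDatabase, MyDatabase_Data, MyDatabase_Log.
USE MyDatabase;

-- Current logical names and sizes (size is stored in 8 KB pages;
-- growth is in pages, or a percentage when is_percent_growth = 1).
SELECT name, type_desc, size * 8 / 1024 AS size_mb, growth, is_percent_growth
FROM sys.database_files;

-- Grow the files once, before the bulk load, instead of in many small autogrowth steps.
ALTER DATABASE MyDatabase MODIFY FILE (NAME = MyDatabase_Data, SIZE = 2048MB);
ALTER DATABASE MyDatabase MODIFY FILE (NAME = MyDatabase_Data, FILEGROWTH = 256MB);

ALTER DATABASE MyDatabase MODIFY FILE (NAME = MyDatabase_Log, SIZE = 512MB);
ALTER DATABASE MyDatabase MODIFY FILE (NAME = MyDatabase_Log, FILEGROWTH = 128MB);
```

Larger fixed growth increments mean fewer, bigger extensions of the file if it still has to grow during a load.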

Dario Solera
We never understood exactly what happened, but we're working on preallocating and defragmenting the database files.
Paulo Manuel Santos
A:

One tip I've seen is to turn off Auto-create stats and Auto-update stats before doing the bulk insert:

ALTER DATABASE databasename SET AUTO_CREATE_STATISTICS OFF WITH NO_WAIT

ALTER DATABASE databasename SET AUTO_UPDATE_STATISTICS OFF WITH NO_WAIT

Afterwards, create the statistics manually using one of two methods:

--generate statistics quickly using a sample of data from the table
exec sp_createstats

or

--generate statistics using a full scan of the table
exec sp_createstats @fullscan = 'fullscan'

You should probably also turn Auto-create and Auto-update stats back on when you're done.
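A sketch of that clean-up step, reusing the databasename placeholder from the statements above; dbo.MyBulkTable is a hypothetical table name. Because sp_createstats only creates statistics on columns that don't already have them, statistics that existed before the load (for example the ones on indexes) are refreshed separately here.

```sql
-- Turn automatic statistics back on once the bulk insert has finished.
ALTER DATABASE databasename SET AUTO_CREATE_STATISTICS ON WITH NO_WAIT;
ALTER DATABASE databasename SET AUTO_UPDATE_STATISTICS ON WITH NO_WAIT;

USE databasename;

-- Refresh existing statistics on all tables, using sampling.
EXEC sp_updatestats;

-- Or refresh one heavily loaded table with a full scan (hypothetical table name).
UPDATE STATISTICS dbo.MyBulkTable WITH FULLSCAN;
```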

Another option is to check and defrag the indexes after a bulk insert. Check out Pinal Dave's blog post.
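A sketch of the check using sys.dm_db_index_physical_stats (available from SQL Server 2005), followed by a reorganize or rebuild. The 10% and 30% thresholds are common rules of thumb rather than fixed limits, and the table and index names are hypothetical. At compatibility level 80, SQL Server 2005 does not accept DB_ID() as an argument here, so pass the database id as a literal in that case.

```sql
-- Indexes in the current database with more than 10% logical fragmentation.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name                     AS index_name,
       ips.index_type_desc,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
WHERE ips.index_id > 0                          -- skip heaps
  AND ips.avg_fragmentation_in_percent > 10
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Then, per index (dbo.MyBulkTable and IX_MyBulkTable_Col are hypothetical):
ALTER INDEX IX_MyBulkTable_Col ON dbo.MyBulkTable REORGANIZE;  -- roughly 10-30% fragmented
ALTER INDEX IX_MyBulkTable_Col ON dbo.MyBulkTable REBUILD;     -- above roughly 30%
```

'LIMITED' is the cheapest scan mode; 'SAMPLED' or 'DETAILED' also fill in avg_page_space_used_in_percent, which shows how full the pages are.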

C-Pound Guru
A: 

That's an interesting question.

I would have guessed that Express and non-Express editions have the same storage layout, so when you're Googling for other people with similar problems, don't restrict yourself to problems in the Express version. On the other hand, bulk insert is a commonplace operation and its performance matters, so I wouldn't consider it likely that this is a previously undetected bug.

One obvious question: which is the clustered index? Is the clustered index also the primary key? Is the primary key unassigned when you insert, and therefore initialized by the database? If so then maybe there's a difference (between the two insert methods) in the pattern or sequence of successive values assigned by the database, which affects the way in which the data is clustered, which then affects performance.
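One way to answer those questions from the catalog views, sketched for a hypothetical table dbo.MyBulkTable: it lists the clustered index's key columns in order, whether that index is the primary key, and whether each key column is an IDENTITY column filled in by the database. No rows means the table has no clustered index (it is a heap).

```sql
SELECT i.name        AS index_name,
       i.type_desc,
       i.is_primary_key,
       c.name        AS key_column,
       ic.key_ordinal,
       c.is_identity
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id
 AND ic.index_id  = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.MyBulkTable')   -- hypothetical table name
  AND i.type_desc = 'CLUSTERED'
ORDER BY ic.key_ordinal;
```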

Something else: as well as indexes, people say that SQL Server uses statistics (which it created as a result of running previous queries) to optimize its execution plan. I don't know any details of that, but as well as "reindexing all indexes", check the execution plans of your queries in the two test cases to ensure that the plans are identical (and/or check the associated statistics).
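A sketch of that comparison: run the same query in each database with these settings, then compare the two plans and the logical reads reported by STATISTICS IO. The query, table and statistics names are hypothetical placeholders for your own.

```sql
-- Actual execution plan (as XML) plus per-table I/O for one slow query.
SET STATISTICS IO ON;
SET STATISTICS XML ON;
SELECT COUNT(*) FROM dbo.MyBulkTable WHERE SomeColumn = 42;   -- hypothetical query
SET STATISTICS XML OFF;
SET STATISTICS IO OFF;

-- When was each statistics object on the table last updated?
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.MyBulkTable');

-- Histogram and density details for one statistics object (hypothetical name).
DBCC SHOW_STATISTICS ('dbo.MyBulkTable', IX_MyBulkTable_Col);
```

If the plans match but one database reports many more logical reads, that points at the physical layout of the data rather than at the optimizer's choices.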

ChrisW