I'm running a fairly substantial SSIS package against SQL 2008 - and I'm getting the same results both in my dev environment (Win7-x64 + SQL-x64-Developer) and the production environment (Server 2008 x64 + SQL Std x64).

The symptom is that the initial data load screams along at between 50K and 500K records per second, but after a few minutes the speed drops off dramatically and eventually crawls embarrassingly slowly. The database is in the Simple recovery model, the target tables are empty, and all of the prerequisites for minimally logged bulk inserts are being met. The data flow is a simple load from a RAW input file to a schema-matched table (i.e. no complex transforms of data, no sorting, no lookups, no SCDs, etc.)
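
For reference, a minimal T-SQL sketch of the kind of checks used to confirm that setup (the database and table names here are placeholders, not the real objects):

    -- Hypothetical names; substitute the real database and target table.
    -- Confirm the recovery model is SIMPLE (or BULK_LOGGED).
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name = N'StagingDB';

    -- Confirm the target table is empty, and list its indexes,
    -- since extra non-clustered indexes can prevent minimal logging.
    SELECT COUNT_BIG(*) AS existing_rows FROM dbo.TargetTable;

    SELECT name, type_desc, is_disabled
    FROM sys.indexes
    WHERE object_id = OBJECT_ID(N'dbo.TargetTable');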

The problem has the following characteristics, and has resisted the following checks:

  1. Problem persists no matter what the target table is.
  2. RAM usage is lowish (45%) - there's plenty of spare RAM available for SSIS buffers or SQL Server to use.
  3. Perfmon shows buffers are not spooling, disk response times are normal, disk availability is high.
  4. CPU usage is low (hovers around 25% shared between sqlserver.exe and DtsDebugHost.exe)
  5. Disk activity primarily on TempDB.mdf, but I/O is very low (< 600 Kb/s)
  6. OLE DB destination and SQL Server Destination both exhibit this problem.

To sum it up, I expect either disk, CPU or RAM to be exhausted before the package slows down, but instead it's as if the SSIS package is taking an afternoon nap. SQL Server remains responsive to other queries, and I can't find any performance counters or logged events that betray the cause of the problem.
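
For completeness, a hedged T-SQL sketch of the kind of DMV check one can run while the package is crawling, to see what the load session is actually waiting on:

    -- Show active user requests and their current waits during the load.
    SELECT r.session_id,
           r.command,
           r.status,
           r.wait_type,
           r.wait_time,
           r.last_wait_type,
           r.cpu_time,
           r.reads,
           r.writes
    FROM sys.dm_exec_requests AS r
    WHERE r.session_id > 50;   -- skip system sessions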

I'll gratefully reward any reasonable answers / suggestions.

+2  A: 

Are you issuing any COMMITs? I've seen this kind of thing slow down when the working set gets too large (a relative measure, to be sure). A periodic COMMIT should keep that from happening.
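
To illustrate the idea outside SSIS (where the rough equivalent is the OLE DB Destination's "Maximum insert commit size" fast-load option), here is a hedged T-SQL sketch of moving rows with periodic commits instead of one huge transaction; the table names and batch size are placeholders:

    -- Hypothetical staging-to-target copy committed in batches of 100k rows.
    DECLARE @batch INT = 100000;

    WHILE 1 = 1
    BEGIN
        BEGIN TRANSACTION;

        DELETE TOP (@batch) src
        OUTPUT DELETED.*
        INTO dbo.TargetTable
        FROM dbo.StagingTable AS src;

        IF @@ROWCOUNT = 0
        BEGIN
            ROLLBACK TRANSACTION;   -- nothing moved this pass; we're done
            BREAK;
        END

        COMMIT TRANSACTION;         -- periodic commit keeps the working set small
    END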

DaveE
Will post feedback here to let you know whether that sorts out the problem.
Mark
+2  A: 

First thoughts:

  • Are the database files growing during the load (growth is slow without instant file initialization for the MDFs)?
  • Is the upload batched into multiple transactions, or is it one big transaction? (A quick way to check both is sketched below.)
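
A hedged T-SQL sketch of those two checks (the database name is a placeholder):

    -- File sizes and growth settings for the target database: small or
    -- percent-based growth increments are a common stall on large loads,
    -- especially without instant file initialization.
    SELECT name, physical_name, type_desc,
           size * 8 / 1024 AS size_mb,
           growth, is_percent_growth
    FROM StagingDB.sys.database_files;

    -- Rough check for one giant open transaction during the load:
    DBCC OPENTRAN ('StagingDB');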
gbn
The MDF files are preallocated, but the destinations are committing in one large batch. I'll try breaking that down.
Mark
+3  A: 

The best way to diagnose performance issues with SSIS Data Flows is with decomposition.

  1. Measure your current package performance. You need a baseline.
  2. Back up your package, then edit it. Remove the Destination and replace it with a Row Count (or another end-of-flow-friendly transform). Run the package again and measure. Now you know the performance penalty incurred by your Destination.
  3. Edit the package again, removing the next transform "up" from the bottom of the data flow. Run and measure. Now you know the performance penalty of that transform.
  4. ...n. Rinse and repeat.

You probably won't have to climb all the way up your flow to get an idea as to what your limiting factor is. When you do find it, then you can ask a more targeted performance question, like "the X transform/destination in my data flow is slow, here's how it's configured, this is my data volume and hardware, what options do I have?" At the very least, you'll know exactly where your problem is, which stops a lot of wild goose chases.
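
For the baseline step, one hedged way to estimate throughput while the package runs is to sample the target row count over a fixed interval; the object name and interval below are placeholders:

    -- Sample the target row count twice, 60 seconds apart, to estimate rows/sec.
    DECLARE @before BIGINT, @after BIGINT;

    SELECT @before = COUNT_BIG(*) FROM dbo.TargetTable WITH (NOLOCK);
    WAITFOR DELAY '00:01:00';
    SELECT @after  = COUNT_BIG(*) FROM dbo.TargetTable WITH (NOLOCK);

    SELECT (@after - @before) / 60 AS approx_rows_per_second;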

Todd McDermid
Thanks for the feedback Todd - seems like a very pragmatic and logical way to diagnose the components causing the problem - but in this case I know *for sure* it's the destination... remove either the OLE DB or SQL Server Destination, and suddenly the world's a great place. Add them back, and several minutes in, the process slows to a crawl.
Mark
BTW - Todd, thanks for creating that really excellent Kimball SCD component!
Mark
Todd - I see you participated in an MSDN blog thread where a user named Greg described a performance problem on a very high-end x64 machine. His problem description sounded exactly like mine, and we're both running x64 runtimes. To your knowledge, are there any problems with the x64 OLE DB source / destination components?
Mark
Not AFAIK. I run x64 too - albeit with what I'd consider small loads. A good resource for good habits for larger loads is the SQLCAT site...
Todd McDermid
+1  A: 

We finally found a solution... the problem lay in the fact that my client was running on VMware ESX, and although the VM reported plenty of free CPU and RAM, the VMware gurus had to pre-allocate (i.e. guarantee) CPU for the SSIS guest VM before it really started to fly. Without this reservation, SSIS would run but VMware would scale back its resources - an odd quirk, because other processes and software kept the VM happily awake. I'm not sure why SSIS was different, but as I said, reserving RAM and CPU for the VM fixed the problem.

I have some other feedback by way of a checklist of things to do for great performance in SSIS:

  1. Ensure the SQL login has bulk load permission (e.g. membership in the bulkadmin role), else the data load will be very slow. Also check that the target database uses the Simple or Bulk Logged recovery model (a T-SQL sketch follows this list).
  2. Avoid Sort and Merge components on large data flows - once they start spooling to disk, performance craters.
  3. Source the input data pre-sorted (in the order of the target table's primary key), disable non-clustered indexes on the target table, and set MaximumInsertCommitSize to 0 on the destination component. This keeps the load minimally logged and avoids spilling to TempDB.
  4. If you cannot meet requirements for 3, then simply set MaximumInsertCommitSize to the same size as the data flow's DefaultMaxBufferRows property.
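
As referenced in items 1 and 3, a hedged T-SQL sketch of the supporting setup; the login, database, table, and index names are placeholders:

    -- 1. Add the loading login to the bulkadmin role (SQL 2008 syntax)
    --    and make sure the database isn't in FULL recovery.
    EXEC sp_addsrvrolemember @loginame = N'ssis_loader', @rolename = N'bulkadmin';
    ALTER DATABASE StagingDB SET RECOVERY SIMPLE;

    -- 3. Disable non-clustered indexes on the target before the load...
    ALTER INDEX IX_TargetTable_SomeColumn ON dbo.TargetTable DISABLE;

    -- ...and rebuild them once the load completes.
    ALTER INDEX IX_TargetTable_SomeColumn ON dbo.TargetTable REBUILD;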
Mark