views: 153, answers: 3
I'm new to working with this much data (20 million rows) and I don't know what I should expect as far as query duration:

update table set field = '1234'  

No index on field. This statement took 25 minutes. The database is set to the simple recovery model. Does 25 minutes seem too long? The table has 9 columns with small data types, none wider than varchar(50).

A: 
  • Are there any indexed views referencing that field?
  • Is that field published in a (transactional) replication scheme?
  • Are other sessions accessing that table at the same time the update is running?
  • Are the log and data files stored on separate disks (physical, not two different partitions of the same hardware)?
  • Are there any check constraints referencing that field?
  • Are there any triggers on that table?

All of these, and likely many other factors, will affect data modification performance.

Otherwise, try batching the update using TOP and a WHERE clause that finds only unmodified rows, as in the sketch below.
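For illustration, here is a minimal sketch of that batching approach. The table name, column name, and batch size are placeholders based on the question, not tested values:

    DECLARE @rows int;
    SET @rows = 1;
    WHILE @rows > 0
    BEGIN
        -- Update a bounded slice per transaction so each commit,
        -- and therefore each log flush, stays small.
        UPDATE TOP (50000) dbo.MyTable
        SET field = '1234'
        WHERE field IS NULL OR field <> '1234';  -- only rows not yet updated
        SET @rows = @@ROWCOUNT;  -- reaches zero once every row is done
    END;

Each batch commits on its own, so under the simple recovery model the log space can be reused as checkpoints occur, instead of the log growing to hold the whole 20M-row transaction.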

Daniel Renshaw
No, no, no, no, no, no. :)
subt13
I tried batching, but wasn't sure about the batch size. I tried 10000 and it was taking even longer.
subt13
+11  A: 

If you updated 20M rows in one single transaction, then your time was entirely driven by your IO subsystem: what kind of drives you have, how the disk files are laid out, etc. If you have 40 spindles in RAID 10 with 4 balanced data files and a separate, similar array for the log, then the result is worryingly slow. If you tested this with a single MDF that shares the spindle with the LDF on a single consumer-quality 5000rpm HDD, then your time is amazingly fast.

Remus Rusanu
+1: Is IO always the issue for cases like these, or is there a rough tipping point?
OMG Ponies
For 20M rows in a single update statement, the time is all going to be driven by how fast the log can flush, with some spikes when database checkpoints kick in due to buffer pool pressure for free pages.
Remus Rusanu
The tell-tale sign is going to be the `Avg. Disk Queue Length` (>2 per spindle) and the `Avg. sec/transfer` (>10ms) performance counters. http://msdn.microsoft.com/en-us/library/ms175903.aspx
Remus Rusanu
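Alongside those counters, one way to check from inside SQL Server (a sketch; it assumes SQL Server 2005 or later, where this DMV exists) is the cumulative wait statistics. WRITELOG dominating would confirm a log-flush-bound update:

    SELECT wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type IN ('WRITELOG', 'PAGEIOLATCH_EX', 'PAGEIOLATCH_SH')
    ORDER BY wait_time_ms DESC;  -- high WRITELOG => log flush is the bottleneck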
Good points. This is a server OS running a server application on desktop hardware. No RAID, a single 250 GB 7200 rpm drive. That's what I needed to know.
subt13
@Remus: regardless of the recovery model setting (simple vs. full/bulk-logged)?
subt13
@subt13: Even in simple mode, the transaction has to log every row change so that it can do a rollback. 20M rows -> 20M change records in the log, plus space reserved for 20M undo operations; it needs log space, and every bit has to be written to disk. The recovery model would play a role only if you had multiple transactions, but it would not make things faster no matter which model you pick. Only bulk-logged *may* be faster for certain *insert* operations, never for updates.
Remus Rusanu
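To see that logging cost directly, one option (a standard SQL Server command, offered here as a sketch) is to compare log usage before and after the update:

    -- Reports log file size and percentage in use for every database;
    -- run it before and after the update to see how much log was generated.
    DBCC SQLPERF(LOGSPACE);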
Based on this feedback, it seems like the 25 minutes is acceptable, and if I want to speed it up I'm going to have to upgrade the hardware.
subt13
You may give disk partition alignment a shot: http://msdn.microsoft.com/en-us/library/dd758814%28SQL.100%29.aspx
Remus Rusanu
+2  A: 

You are updating 20 million records in about 1,500 s, which works out to roughly 13,000 updates per second (20,000,000 / 1,500 ≈ 13,300). Sounds about right.

Peter Tillemans
Just curious as to why that sounds about right?
subt13
Because the order of magnitude is in the range I would expect: less than 1,000 per second would be strange, and so would more than 40,000.
Peter Tillemans