views:

258

answers:

4

I've added a new column, packageNo, to my table:

 create table Packages(
  id varchar(20) primary key,  -- ok, I know :)
  orderNo uniqueIdentifier not null,
  orderlineNo int not null, 
  packageNo int not null default(0)
)

Now I want to generate the packageNo with the following rules:

    reset it for each order
    ascendantfor order, orderline

My problem is that the script I wrote uses 15 minutes for 26500 rows on my testServer. Here it is:

set NoCount ON
declare @Counter int
declare @handledCounter int
declare @currentorder uniqueIdentifier
declare @fetchedOrder uniqueIdentifier
declare @fetchedId varchar(20) -- will using PK speed up things?

declare PackageNo_Cursor cursor  for 
  select orderNo, id from packages order by orderNo, orderlineNo for update of packageNo

open PackageNo_Cursor
fetch next from PackageNo_Cursor into @fetchedOrder, @fetchedId 

set @currentOrder = @fetchedOrder
set @counter = 0
set @handledCounter = 0
while @@fetch_status = 0
 begin 
     if (@currentOrder <> @fetchedOrder)
       begin  -- reset counter for each order
        set @currentOrder = @fetchedOrder
        set @counter = 0
       end
     set @counter = @counter + 1
     set @handledCounter = @handledCounter +1
     if (@handledCounter % 50 = 0)
      begin
     print 'handled = ' + cast(@handledCounter as varchar) 
      end
   update packages set packageNo = @counter where current of PackageNo_Cursor    
     fetch next from PackageNo_Cursor into @fetchedOrder, @fetchedId 
  end

close PackageNo_Cursor
deallocate PackageNo_Cursor

This should result in:

id   - orderno - lineNo - packageNo (what I need to set)  
ean1 - guid1   - 1      - 1  
ean2 - guid1   - 2      - 2   
ean3 - guid2   - 1      - 1  
ean15- guid2   - 3      - 2  
ean15- guid2   - 4      - 3

Can I make this run any faster?

+6  A: 

You don't want to use a cursor, you want to do this as a set, like so:

update p set
    packageNo = r.packageNo
from
    packages p
    inner join 
      (select orderNo, orderLineNo, 
       ROW_NUMBER() OVER(PARTITION BY orderNo ORDER BY orderLineNo) as packageNo
       from packages) r on
        p.orderNo = r.orderNo
        and p.orderLineNo = r.orderLineNo

This will leverage SQL Server's ROW_NUMBER function to get you the correct count by each line. Using UPDATE...FROM, we can create an inner join from which to update. This is vastly more efficient than the cursor.

See, cursors add an iterative ability to a procedural and set based language (SQL). The two don't play well together. That update statement is being called in order for each row (and opening/committing a transaction, to boot). If you do it all in one statement, SQL can parallelize this to make it much faster.

The standard rule is this: Use cursors as sparingly as possible. There are some fringe cases where they're necessary, but if you aren't doing massive amounts of SQL administration a day, it's doubtful you'll ever come across those cases.

Eric
this will work but it will be also slow because for each row in package you will need to execute select count(*) separately... this will require O^2 operations...
Bogdan_Ch
@Bogdan: You're right. Changed to use an inner join instead.
Eric
Hi Eric,Thanks for your solution witch solve the update in 15s :-)
Dom Ribaut
@eric: right, and I changed -1 to +1 :)
Bogdan_Ch
and this is why cursors are bad. from 15 minutes to 15 seconds....
KM
15s still seems high a bit high for your record count. What indexes do you have on the table?
Eric
A: 

In case you really need to use cursors, define your cursor with some attributes, like this :

DECLARE _cursor CURSOR LOCAL FAST_FORWARD FOR
....
supudo
this will confilict with the "for Update" (or any update of current)
Dom Ribaut
+1  A: 

Something like, not tested

update
    p
set
    packageNo = p2.Ranking
from
    packages p
    JOIN
    (SELECT
        orderNo, orderlineNo,
        ROW_NUMBER() OVER (PARTITION BY orderNo ORDER BY orderlineNo) AS Ranking
    FROM
        packages) p2 ON p.orderNo = p2.orderNo AND p.orderlineNo= p2.orderlineNo
gbn
Thanks gbn,this works well (15s).it is the same solution that bogdan
Dom Ribaut
A: 
WITH cteA AS (
 SELECT 
  packageNo, 
  row_number() over (partition by orderNo order by orderNo, orderlineNo) rn
 FROM packages
)

UPDATE cteA
SET packageNo = rn - 1;

You should also create a clustered index on orderNo, orderlineNo.

(i hope you are using sql2005 or newer)

Thuglife
Hi>(i hope you are using sql2005 or newer)I actually do for this app :) but we still have apps running on sqlserver200 (and even some sybase 5.something!)--Dom
Dom Ribaut