I'm working on a procedure that will update a large number of items on a remote server, using records from a local database. Here's the pseudocode:

CREATE PROCEDURE UpdateRemoteServer
    pre-processing
    get cursor with IDs of records to be updated
    while on cursor
        process the item

No matter how much we optimize it, the routine is going to take a while, so we don't want the whole thing to be processed as a single transaction. The items are flagged after being processed, so it should be possible to pick up where we left off if the process is interrupted.

Wrapping the contents of the loop ("process the item") in a begin/commit tran does not do the trick... it seems that the whole statement

EXEC UpdateRemoteServer

is treated as a single transaction. How can I make each item process as a complete, separate transaction?

Note that I would love to run these as "non-transacted updates", but that option is only available (so far as I know) in SQL Server 2008.

+1  A: 

Are you running this only from within SQL Server, or from an app? If the latter, get the list to be processed, then loop in the app, processing only a subset at a time as required.

Then the transaction would be handled by your app, and would only lock the items being updated (or the pages those items are in).

astander
Unfortunately, this is being done solely from within SQL Server, and that requirement is beyond my control.
harpo
Is it then still treated as a single transaction if you were to execute it not from an SP but as plain T-SQL? Avoid the SP, use its contents directly, with the transactions bound to the T-SQL session?
astander
We think alike... the problem is that the process needs to be automated, so it has to be executable from a single command.
harpo
Next question: automated how? Using what?
astander
It will have to be called either from .NET or launched by the SQL admin. For the latter case, a script as you described would work. I suppose that .NET could execute the "plain T-SQL" by loading the script file... but wouldn't it then be treated as one transaction?
harpo
If it is executed using .NET, then yes, by all means you can split up the batches. Get a list of items to process, then loop over sets of those items with the transaction inside the .NET loop. Then you can remove the transaction from the SP if required. My thought is that the transactions are being "nested" if you execute them all at once.
astander
Yes, I agree. I'm going to have to go with something like what you describe.
harpo
A: 

NEVER process one item at a time in a loop when you are doing transactional work. You can loop through records processing groups of them, but never do one record at a time. Do set-based inserts instead and your performance will change from hours to minutes or even seconds. If you are using a cursor to insert, update, or delete, and it isn't handling at least 1,000 rows in each statement (not one at a time), you are doing the wrong thing. Cursors are an extremely poor practice for this kind of work.
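A minimal sketch of the set-based approach (all table and column names here are hypothetical stand-ins for the question's schema): even the child tables can usually each be handled with one set-based statement, joined to the set of pending items, rather than row by row.

-- Hypothetical schema; one statement per child table covers every pending item at once.
update c
  set c.SomeValue = s.SomeValue
from dbo.ItemChildren as c
join dbo.Items as i on i.ItemId = c.ItemId
join dbo.SourceData as s on s.ChildId = c.ChildId
where i.Processed = 0;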

HLGEM
Of course I would use set-based inserts if I could. I am processing a single item at a time because each item has a number of associated child tables that need to be updated, so each iteration has to process dozens of records, and it cannot practically be rewritten to deal with the whole update at once. I have so far resisted a policy of downvoting this very common kind of non-answer (i.e., "your whole premise is wrong; you shouldn't be doing this"), but I wonder why their authors do not place them in comments instead.
harpo
+1  A: 

EXEC procedure does not create a transaction. A very simple test will show this:

create procedure usp_foo
as
begin
  select @@trancount;
end
go

exec usp_foo;

The @@trancount inside usp_foo is 0, so the EXEC statement does not start an implicit transaction. If a transaction is already open when you enter UpdateRemoteServer, it means somebody started it; I can't say who.

That being said, using remote servers and DTC to update items is going to perform quite badly. Is the other server also at least SQL Server 2005? Maybe you can queue the update requests, use messaging between the local and remote server, and have the remote server perform the updates based on the information in the message. That would perform significantly better, because each server only has to deal with local transactions, and you get much better availability due to the loose coupling of queued messaging.
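For instance, with Service Broker (a sketch only; every object name below is made up, and the remote side would need a matching service plus an activated procedure that applies each update):

-- Local side: minimal Service Broker objects for queuing update requests.
create message type UpdateItemMsg validation = well_formed_xml;
create contract UpdateItemContract (UpdateItemMsg sent by initiator);
create queue UpdateRequestQueue;
create service UpdateRequestService on queue UpdateRequestQueue;
go

-- Enqueue one item's update as a message; this commit is purely local.
declare @dialog uniqueidentifier;
begin dialog conversation @dialog
  from service UpdateRequestService
  to service 'RemoteUpdateService'
  on contract UpdateItemContract
  with encryption = off;
send on conversation @dialog
  message type UpdateItemMsg (N'<item id="1" />');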

Updated

Cursors actually don't start transactions. Typical cursor-based batch processing uses a cursor and groups the updates into transactions of a certain size. This is fairly common for overnight jobs, as it allows for better performance (log flush throughput due to larger transaction size), and the jobs can be interrupted and resumed without losing everything. A simplified version of a batch processing loop is typically like this:

create procedure usp_UpdateRemoteServer
as
begin
  declare @id int, @batch int;
  set nocount on;
  set @batch = 0;

  declare crsFoo cursor 
    forward_only static read_only 
    for 
    select object_id 
    from sys.objects;

  open crsFoo;

  begin transaction
  fetch next from crsFoo into @id ;
  while @@fetch_status = 0
  begin

    -- process here

    -- transaction_id is a bigint in sys.dm_tran_current_transaction
    declare @transactionId bigint;
    SELECT @transactionId = transaction_id 
      FROM sys.dm_tran_current_transaction;
    print @transactionId;

    set @batch = @batch + 1
    if @batch > 10
    begin
      commit;
      print @@trancount;
      set @batch = 0;
      begin transaction;
    end
    fetch next from crsFoo into @id ;
  end
  commit;
  close crsFoo;

  deallocate crsFoo;
end
go

exec usp_UpdateRemoteServer;

I omitted the error handling part (begin try/begin catch) and the fancy @@fetch_status checks (static cursors don't actually need them anyway). This demo code shows that during the run there are several different transactions started (different transaction IDs). Batches also often deploy a transaction savepoint at each item processed, so they can safely skip an item that causes an exception, using a pattern similar to the one in my link; but that does not apply to distributed transactions, since savepoints and DTC don't mix.
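The savepoint pattern itself looks roughly like this inside the loop (a sketch; dbo.ProcessItem is a hypothetical per-item procedure, and again this works only for local transactions, not DTC):

  save transaction ItemSavepoint;
  begin try
    exec dbo.ProcessItem @id;  -- hypothetical per-item work
  end try
  begin catch
    -- roll back just this item; the rest of the batch survives
    rollback transaction ItemSavepoint;
  end catch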

Remus Rusanu
You're quite right about the performance... I wish I had known that when we chose this technique. Nevertheless, we are not at a place right now where we can rewrite the whole thing, so solving this problem will make the current routine acceptable for the moment.
harpo
Just ran your test proc, and I have to agree; cursors apparently do not maintain a transaction throughout. I even removed the options specified, to get a dynamic updatable cursor, and the transactions still changed. I will rescind my answer (if not my position on cursor usage).
Stuart Ainsworth
Remus, correct me if I'm wrong here, but once you start modifying data between servers (via a linked server), doesn't SQL Server automatically start an implicit distributed transaction, so that if the change to the remote system fails, the failure can be properly sent back to the calling SQL Server?
mrdenny
@MrDenny: Remote updates require distributed transactions and will elevate any existing transaction, including the implicit transaction started by any DML, to a distributed transaction. I'm not sure I follow you; why does it make a difference? The point I was trying to make in my post is that there are only two ways to start a long transaction (one that lasts past the duration of the starting statement): 1. an explicit BEGIN TRANSACTION, or 2. having SET IMPLICIT_TRANSACTIONS ON (the latter pretty much changes *everything* about my post).
Remus Rusanu
+1  A: 

EDIT: as pointed out by Remus below, cursors do NOT open a transaction by default; thus, this is not the answer to the question posed by the OP. I still think there are better options than a cursor, but that doesn't answer the question.

Stu

ORIGINAL ANSWER:

The specific symptom you describe is due to the fact that a cursor opens a transaction by default; therefore, no matter how you work it, you're going to have a long-running transaction as long as you are using a cursor (unless you avoid locks altogether, which is another bad idea).

As others are pointing out, cursors SUCK. You don't need them 99.9999% of the time.

You really have two options if you want to do this at the database level with SQL Server:

  1. Use SSIS to perform your operation; it's very fast, but it may not be available to you in your particular flavor of SQL Server.

  2. Because you're dealing with remote servers and you're worried about connectivity, you may have to use a looping mechanism, so use WHILE instead and commit batches at a time; see the sketch below. Although WHILE has many of the same issues as a cursor (looping still sucks in SQL), you avoid creating the outer transaction.
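A minimal sketch of that WHILE pattern (the Items table and Processed flag are hypothetical stand-ins for the question's schema):

declare @rows int;
set @rows = 1;

while @rows > 0
begin
  begin transaction;

  -- flag the next batch of unprocessed items;
  -- the real per-batch work would go here as well
  update top (100) i
    set i.Processed = 1
  from dbo.Items as i
  where i.Processed = 0;

  set @rows = @@rowcount;  -- 0 when nothing is left to do
  commit;  -- each batch is its own transaction
end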

Stu

Stuart Ainsworth
Thanks for the info. This does actually answer the question. Apart from that, I know that cursors suck. In this case it is not the cursor causing the performance problem *per se*, but the work being done inside the loop. But that's a separate question.
harpo
I don't agree with "cursors suck", but that's a different discussion.
IronGoofy
@IronGoofy: you're right; cursors are a specialized tool that should be used in a specialized circumstance (kind of like a screw extractor; you don't need it every day. If you do, you're doing something wrong). Saying "they suck" is easier. :)
Stuart Ainsworth
Cursors don't open transactions by default. Like every other DML statement, the cursor's SELECT must run under a transaction and implicitly opens one, but that transaction does *not* span the whole lifetime of the cursor. There will be such a short transaction at each fetch, but it will last *only* for the duration of the FETCH.
Remus Rusanu
A: 

Just an idea ..

  • Only process a few items when the procedure is called (e.g. only get the TOP 10 items to process)
  • Process those

Hopefully, the transaction will end when each call completes.

Then write a wrapper that calls the procedure as long as there is more work to do (either use a simple count(..) to see if there are items, or have the procedure return a value indicating that there is more work to do), along the lines of the sketch below.
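A rough sketch of such a wrapper (hypothetical: it assumes UpdateRemoteServer returns the number of items it processed on that call):

declare @processed int;
set @processed = 1;

while @processed > 0
begin
  -- each call handles only a few items and ends its own short transaction
  exec @processed = UpdateRemoteServer;
end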

Don't know if this works, but maybe the idea is helpful.

IronGoofy