I have a stored procedure that inserts batches of millions of rows, produced by a query, into a SQL Server database. It has one parameter selecting the batch; when this parameter is omitted, it gathers a list of batches and recursively calls itself in order to iterate over them. In (pseudo-)code, it looks something like this:

CREATE PROCEDURE spProcedure @code int = 0 AS BEGIN
    IF @code = 0 BEGIN
        -- Open a cursor over the list of batches and fetch the first batch code
        ...
        WHILE @@FETCH_STATUS = 0 BEGIN
            EXEC spProcedure @code
            FETCH NEXT ... INTO @code
        END
        -- Close and deallocate the cursor
        ...
    END
    ELSE BEGIN

        -- Disable indexes
        ...

        INSERT INTO table
        SELECT (...)

        -- Enable indexes
        ...

    END
END

Now it can happen that this procedure is slow, for whatever reason: it can't get a lock, or one of the indexes it uses is misdefined or disabled. In that case, I want to be able to kill the procedure, truncate and recreate the resulting table, and try again. However, when I try to kill the procedure, the process frequently oozes into a KILLED/ROLLBACK state from which there seems to be no return. From Google I have learned to run sp_lock, find the spid, and then kill it with KILL <spid>. But when I try to kill it, it tells me

SPID 75: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 554 seconds.
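For reference, the sequence in question is roughly the following; spid 75 is the one from the message above, and KILL ... WITH STATUSONLY simply re-reports the rollback progress:

    EXEC sp_lock                  -- find the spid holding the locks on the table
    KILL 75                       -- attempt to kill it
    KILL 75 WITH STATUSONLY       -- re-check estimated rollback completion/time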

I did find a forum message hinting that another spid should be killed before the first one can start a rollback. But that didn't work for me either, plus I do not understand why that would be the case... could it be because I am recursively calling my own stored procedure? (But it should have the same spid, right?)

In any case, my process is just sitting there, dead, not responding to kills, and locking the table. This is very frustrating, as I want to go on developing my queries, not wait for hours while my server sits dead, pretending to finish a supposed rollback.

Is there some way I can tell the server not to store any rollback information for my query? Or not to allow any other queries to interfere with the rollback, so that it will not take so long? Or how to rewrite my query in a better way, or how to kill the process successfully without restarting the server?

+1  A: 

As I understand it, no: not without potentially destroying the consistency of the data file by doing something nasty like a hard reset. Even then, SQL Server will go into a recovery state on restart and still perform parts of the rollback, to make sure the aborted transaction is successfully rolled back.

Unless something like a NOLOCK hint can get you past the existing lock (you haven't mentioned whether it's an exclusive lock), you could probably still script out the schema of the table, create it as MyTable2, and then keep writing your queries against that, going back to alter them once the rollback has finished.
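A minimal sketch of that workaround, assuming the locked table is called MyTable (whether the read gets past the lock depends on the lock type, as noted above):

    -- Copies the column layout only (no data, indexes, or constraints)
    SELECT TOP 0 *
    INTO MyTable2
    FROM MyTable WITH (NOLOCK)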

Andrew
+3  A: 

It's rolling back the transaction.

It will keep doing this even if you restart the instance.

If you were 99 million rows into a 100 million row insert or delete, all 99 million need to be rolled back. You cannot change this behavior: any single DML statement is atomic.

If you want to fix it (the last two steps are sketched after the list):

  • Stop SQL Server
  • Delete the DB files
  • Start SQL Server
  • DROP the DB in its broken state
  • Restore
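In T-SQL, the last two steps might look something like this (MyDatabase and the backup path are placeholders, not anything from the question):

    -- With its files gone, the DB comes back up marked suspect; drop the metadata
    DROP DATABASE MyDatabase
    -- Then restore from the last good backup
    RESTORE DATABASE MyDatabase
        FROM DISK = N'X:\Backups\MyDatabase.bak'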

YMMV of course :-)

gbn
+1  A: 

When you kill a SQL Server process, it doesn't die immediately; all the work done by that process's active transaction must first be rolled back. The rollback may take a substantial amount of time, possibly as much as, or even more than, the time the query had spent executing before the kill.

I have also seen a bug where a killed/rolling-back process remains in that state indefinitely. Fortunately, in the case where I saw this, no significant locks were being held by that process!

There are steps you could take to avoid this, but I don't want to recommend anything without better understanding your requirements and situation, since it could adversely affect other processes/queries.

Daniel Renshaw
+2  A: 

Been here. The last time, for us, it was processing somewhere around 3.2 billion records. The initial statement went to 99% complete within ten minutes, then spent 20 hours on IO.

The first time through, it ran for about 8 hours before an automated backup job killed the process and bounced the server. It took nearly 2 days to get it back online so that we could start the process over, this time making sure the backup process was turned off.

Chris Lively
Why does your automated backup job kill spids and bounce the server??
BradC
@BradC: damn good question, for which I don't have a clear answer. The product they use is Snapshot from NetApp. Note to others: don't use this on a high transaction system.
Chris Lively
Older versions of Oracle require downtime for backups; I'd bet you have some crossover DBAs who don't know that isn't necessary on MS SQL Server. We have a 3rd party application that doesn't support "hot backups" either, even on MS SQL! We have been arguing with them quite a lot...
BradC
+5  A: 

A few comments.

First, gbn is correct with regard to not being able to cancel a rollback in progress. This is how SQL Server maintains transactional integrity, and you wouldn't want that behavior changed. If you absolutely don't care and just want to get your DB back to where it was when the last backup was taken, then follow his steps.

One caveat, however. There are times I've seen a spid that isn't really rolling back; it's just stuck (usually at 0% or 100% progress). The most reliable indicator in this case is that the CPU/IO counters for the spid in Activity Monitor are not changing (and the spid isn't being blocked by another spid). In that case, you might have to restart the SQL Server service (no need to reboot the whole machine) to clear the spid.
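One way to check those counters without Activity Monitor is to sample them directly; a sketch against the legacy sysprocesses view, using spid 75 from the question as an example:

    -- Run this a couple of times, a minute or so apart: if cpu and physical_io
    -- never move and blocked stays 0, the spid is probably stuck, not rolling back
    SELECT spid, status, blocked, cpu, physical_io
    FROM master..sysprocesses
    WHERE spid = 75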

With regard to reorganizing your query so that these rollbacks don't cripple you: yes, it's possible. Just use explicit transactions:

    WHILE @@FETCH_STATUS = 0 BEGIN
        BEGIN TRAN
            EXEC spProcedure @code
        COMMIT TRAN
        FETCH NEXT ... INTO @code
    END

The data is committed after each batch. If you experience an issue and have to kill the spid, it should only roll back the current batch it is working on.

If even a single batch is too much, you could probably refactor your "spProcedure" to insert in smaller batches of 10k-100k records, committing after each one.
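A rough sketch of what such a chunked insert could look like (TargetTable, SourceTable, and the id key are made up for illustration; the NOT EXISTS check makes each pass pick up the rows the previous passes did not insert):

    DECLARE @rows int
    SET @rows = 1
    WHILE @rows > 0 BEGIN
        BEGIN TRAN
        INSERT INTO TargetTable (id, payload)
        SELECT TOP 50000 s.id, s.payload
        FROM SourceTable s
        WHERE NOT EXISTS (SELECT * FROM TargetTable t WHERE t.id = s.id)
        SET @rows = @@ROWCOUNT   -- rows inserted this pass; 0 means we're done
        COMMIT TRAN
    END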

BradC
I'd give you more than one upvote if I could.
HLGEM
@HLGEM: I'll do it for you. ;)
Chris Lively
This is why I love Stack Overflow. Thank you. I will indeed refactor my query, saving me a *lot* of headaches with it in the future.
littlegreen