views:

53

answers:

2

A colleague of mine (I promise it was a colleague!) has left an update running on our main SQL Server since last Thursday (yes that's right folks, we're pushing 100 hours now!). The SQL in question (in one transaction, I might add) is:

update daily_prices  set min_date = (select min(a.date)
   from daily_prices a       
   where a.key = daily_prices.key and       
   a.iid = daily_prices.iid)

(Yeah I know, heinous...)

The total cost in the query plan is coming out as 22186.7, the estimated number of rows to update is around 151 million.

We obviously need to resolve this query one way or another, we realise that if we are to kill the query we're going to generate some brutal rollback, but we've got no way of knowing how far it has gotten. The only thing we do know is this entry from sys.dm_exec_requests:

session_id  status      query_text              cpu_time    total_elapsed_time  reads       writes      logical_reads
52          suspended   update daily_prices...  2328469     408947075           13831137    42458588    151809497

So my question is, what would be our best course of action?

  1. wait it out
  2. kill it and roll back, and hope that it rolls back before the next ice age
  3. something else?
+2  A: 

I personally would want to wait it out unless I though it had no chance of finishing this week, the roll back at this stage could take far longer than the query has to date. If it's a production server, I really wouldn't take option 2 and kill it unless I absolutely had to.

In terms of regaining some control / working system if you have suitable backups, bring online another database restore the backup / tlog backups, but you will not want to restore to beyond when the transaction was started (or it will still have to roll it back.) This at least gives you a system you could continue dev work against, but unlikely to be the ideal situation for a prod system.

If it's a production server, have some kind words with the individual as to the suitability of testing queries and query plans prior to it being executed. I am sure many DBA's can suggest the less polite methods of instruction :)

Andrew
@andrew, thanks....I think you're confirming what I thought, I think waiting it out is probably my preferred option....the person in question is a little itchy to get this killed and restarted using some more transactional logic, but if we're tied up in an even longer rollback cycle then that will be even more painful as there will be literally nothing we can do once we're in that state!I'm hoping the developer (who happens to be the CEO, BTW!) in question will exercise a little more caution before he f5's! =:-)
James B
@andrew, thanks for your response, we ended up killing it because it was interrupting our backups and production operations etc (see my answer)
James B
A: 

So we got fed up with waiting for our transaction to complete, (after a full week on one piece of SQL, who wouldn't?), and as it was interfering with our backup process, we thought killing it was a necessary evil.

The database started to rollback the transaction.

5 days passed.

We noted with some posts elsewhere on the internet that sometimes some magic happened when the database was restarted and the transaction would "go away", although these are generally debunked*, and it makes no sense, we thought we had nothing left to lose so we gave it a go. We knew the database would go into recovery mode, but the database was becoming increasingly sick anyway and unable to run anything but its current rollback work anyway, and we've seen SQL Server misbehave with hogging system resources and not diverting them to where it needs to do the work.

(* we also know enough database theory to know that the DB wouldn't just "forget" about a transaction in progress, but we were also seeing stack dumps in the SQL Server error logs which kind of told us that the SQL Server was getting increasingly grumpy at the amount of rollback it was having to undertake)

So we restarted the database.

Sure enough the database went into recovery mode. However, the SQL Server event Log was now giving us an update every 20 seconds or so as to how long it was going to take (in all, it reckoned about 25 hours from the log messages, but it ended up being just an hour and a half (!)).

Whether this method of recovery/rollback is faster, I would strongly doubt (as I expect SQL Server had to do the same level of work to unwind the transaction as before), however it did finish within an hour and a half, either way, I don't want to make a habit of restarting my production database when it is halfway through a rollback). The update messages in the event log were an absolute godsend, as anyone who has written a batch program will tell you; however inaccurate they turned out to be - at least they were a worst case.

As we had the luxury of being the only two people using this production box, choosing to send the database into recovery mode worked for us, and gave us informational messages we didn't have access to with just our previous rollback state (or at least nothing we could interpret given our lacking DBA skills). Would I recommend doing this in future? ....Absolutely not, however, hopefully the concerned parties have learnt their lesson, and we can ask the board for some money for a proper development server! (epic Joel-Test fail!)

James B