I was recently tasked with debugging a strange problem within an e-commerce application. After an application upgrade the site started to hang from time to time and I was sent in to debug. After checking the event log I found that the SQL-server wrote ~200 000 events in a couple of minutes with the message saying that a constraint had failed. After much debugging and some tracing I found the culprit. I've removed some unnecessary code and cleaned it up a bit but essentially this is it
WHILE EXISTS (SELECT * FROM ShoppingCartItem WHERE ShoppingCartItem.PurchID = @PurchID)
BEGIN
SELECT TOP 1
@TmpGFSID = ShoppingCartItem.GFSID,
@TmpQuantity = ShoppingCartItem.Quantity,
@TmpShoppingCartItemID = ShoppingCartItem.ShoppingCartItemID,
FROM
ShoppingCartItem INNER JOIN GoodsForSale on ShoppingCartItem.GFSID = GoodsForSale.GFSID
WHERE ShoppingCartItem.PurchID = @PurchID
EXEC @ErrorCode = spGoodsForSale_ReverseReservations @TmpGFSID, @TmpQuantity
IF @ErrorCode <> 0
BEGIN
Goto Cleanup
END
DELETE FROM ShoppingCartItem WHERE ShoppingCartItem.ShoppingCartItemID = @TmpShoppingCartItemID
-- @@ROWCOUNT is 1 after this
END
Facts:
- There's only one or two records matching the first select-clause
- RowCount from the DELETE statement indicates that it has been removed
- The WHILE-clause will loop forever
The procedure has been rewritten to select the rows that should be deleted into a temporary in-memory table instead so the immediate problem is solved but this really sparked my curiosity.
Why does it loop forever?
Clarification: The delete doesn't fail (@@rowcount is 1 after the delete stmt when debugged) Clarification 2: It shouldn't matter whether or not the SELECT TOP ... clause is ordered by any specific field since the record with the returned id will be deleted so in the next loop it should get another record.
Update: After checking the subversion logs I found the culprit commit that made this stored procedure to go haywire. The only real difference that I can find is that there previously was no join in the SELECT TOP 1 statement i.e. without that join it worked without any transaction statements surrounding the delete. It appears to be the introduction of the join that made SQL server more picky.
Update clarification: brien pointed out that there's no need for the join but we actually do use some fields from the GoodsForSale table but I've removed them to keep the code simply so that we can concentrate on the problem at hand