A query that is used to loop through 17 millions records to remove duplicates has been running now for about 16 hours and I wanted to know if the query is stopped right now if it will finalize the delete statements or if it has been deleting while running this query? Indeed, if I do stop it, does it finalize the deletes or rolls back?
I have found that when I do a
select count(*) from myTable
That the rows that it returns (while doing this query) is about 5 less than what the starting row count was. Obviously the server resources are extremely poor, so does that mean that this process has taken 16 hours to find 5 duplicates (when there are actually thousands), and this could be running for days?
This query took 6 seconds on 2000 rows of test data, and it works great on that set of data, so I figured it would take 15 hours for the complete set.
Any ideas?
Below is the query:
--Declare the looping variable
DECLARE @LoopVar char(10)
DECLARE
--Set private variables that will be used throughout
@long DECIMAL,
@lat DECIMAL,
@phoneNumber char(10),
@businessname varchar(64),
@winner char(10)
SET @LoopVar = (SELECT MIN(RecordID) FROM MyTable)
WHILE @LoopVar is not null
BEGIN
--initialize the private variables (essentially this is a .ctor)
SELECT
@long = null,
@lat = null,
@businessname = null,
@phoneNumber = null,
@winner = null
-- load data from the row declared when setting @LoopVar
SELECT
@long = longitude,
@lat = latitude,
@businessname = BusinessName,
@phoneNumber = Phone
FROM MyTable
WHERE RecordID = @LoopVar
--find the winning row with that data. The winning row means
SELECT top 1 @Winner = RecordID
FROM MyTable
WHERE @long = longitude
AND @lat = latitude
AND @businessname = BusinessName
AND @phoneNumber = Phone
ORDER BY
CASE WHEN webAddress is not null THEN 1 ELSE 2 END,
CASE WHEN caption1 is not null THEN 1 ELSE 2 END,
CASE WHEN caption2 is not null THEN 1 ELSE 2 END,
RecordID
--delete any losers.
DELETE FROM MyTable
WHERE @long = longitude
AND @lat = latitude
AND @businessname = BusinessName
AND @phoneNumber = Phone
AND @winner != RecordID
-- prep the next loop value to go ahead and perform the next duplicate query.
SET @LoopVar = (SELECT MIN(RecordID)
FROM MyTable
WHERE @LoopVar < RecordID)
END