This has been an adventure. I started with the looping duplicate query located in my previous question, but each loop would go over all 17 million records...meaning it would take weeks (just running *select count * from MyTable* takes my server 4:30 minutes => mssql 2005). I gleamed information from this site and at this post:
http://weblogs.sqlteam.com/jeffs/archive/2007/03/28/60146.aspx
And have arrived at the query below. The question is, is this the correct type of query to run on 17 million records for any type of performance? If it isn't, what is?
SQL QUERY:
DELETE tl_acxiomimport.dbo.tblacxiomlistings
WHERE RecordID in
(SELECT RecordID
FROM tl_acxiomimport.dbo.tblacxiomlistings
EXCEPT
SELECT RecordID
FROM (
SELECT RecordID, Rank() over (Partition BY BusinessName, latitude, longitude, Phone ORDER BY webaddress DESC, caption1 DESC, caption2 DESC ) AS Rank
FROM tl_acxiomimport.dbo.tblacxiomlistings
) al WHERE Rank = 1)