I have a set of rows that contain duplicate entries because the data originates from multiples sources. I also have a separate reference table that indicates the priority of those data sources.
Does anyone have good tips for the most effective t-SQL to deduplicate this list?
Basically I have:
SELECT a.*, b.priority
FROM tableA as a
JOIN tableB as b
ON b.sourceId = a.sourceId
I have been placing this into a temp table and then deleting in an odd way that I suppose could be more efficient.
DELETE ta
FROM #tmp ta
JOIN #tmp tb
ON ta.duplicateId = tb.duplicateId
WHERE ta.priority < tb.priority
Thanks for the help!
I think I see where you are going... Table A has the same columns for the two sources, but the data can differ - so they may have different prices. The challenge is that I must take the price (and all other info) from the row that comes from the source with the highest priority. To complicate matters, I do not have data for every item from ALL sources.
So, item 1 may have data from source A and B, while item 2 may only have it from source B and C. Thus, the deletion needs to happen on a per-unique item basis.