views:

70

answers:

3

How do I Delete duplicated rows in one Table and update References in another table to the remaining row? The duplication only occurs in the name. The Id Columns are Identity columns.

Example:

Assume we have two tables Doubles and Data.

Doubles table (
   Id int,
   Name varchar(50)
)

Data Table (
    Id int,
    DoublesId int
)

Now I Have Two entries in the Doubls table:

Id Name
1  Foo
2  Foo

And two entries in the Data Table:

ID DoublesId
1  1
2  2

At the end there should be only one entry in the Doubles Table:

Id Name
1  Foo

And two entries in the Data Table:

Id DoublesId
1  1
2  1

In the doubles Table there can be any number of duplicated rows per name (up to 30) and also regular 'single' rows.

A: 

Note: I have taken the liberty to rename your Id's to DoubleID and DataID respectively. I find that eassier to work with.

DECLARE @Doubles TABLE (DoubleID INT, Name VARCHAR(50))
DECLARE @Data TABLE (DataID INT, DoubleID INT)

INSERT INTO @Doubles VALUES (1, 'Foo')
INSERT INTO @Doubles VALUES (2, 'Foo')
INSERT INTO @Doubles VALUES (3, 'Bar')
INSERT INTO @Doubles VALUES (4, 'Bar')

INSERT INTO @Data VALUES (1, 1)
INSERT INTO @Data VALUES (1, 2)
INSERT INTO @Data VALUES (1, 3)
INSERT INTO @Data VALUES (1, 4)

SELECT * FROM @Doubles
SELECT * FROM @Data

UPDATE @Data
SET DoubleID = MinDoubleID
FROM    @Data dt
        INNER JOIN @Doubles db ON db.DoubleID = dt.DoubleID
        INNER JOIN (
            SELECT    db.Name, MinDoubleID = MIN(db.DoubleID)
            FROM      @Doubles db
            GROUP BY  db.Name
        ) dbmin ON dbmin.Name = db.Name

/* Kudos to quassnoi */
;WITH q AS (
    SELECT Name, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS rn
    FROM @Doubles
    )
DELETE  
FROM    q
WHERE   rn > 1

SELECT * FROM @Doubles
SELECT * FROM @Data
Lieven
I don't understand it myself, but this example code seems to work on SQL Server 2008.
bamccaig
A: 

Take a look at this one, i have tried this, working fine

--create table Doubles  (   Id int,   Name varchar(50))

--create table Data(    Id int,    DoublesId int)

--select * from doubles
--select * from data

Declare @NonDuplicateID int
Declare @NonDuplicateName varchar(max)
DECLARE @sqlQuery nvarchar(max)
DECLARE DeleteDuplicate CURSOR FOR

SELECT Max(id),name AS SingleID FROM Doubles
GROUP BY [NAME]

OPEN DeleteDuplicate 
FETCH NEXT FROM DeleteDuplicate INTO @NonDuplicateID, @NonDuplicateName
--Fetch next record
WHILE @@FETCH_STATUS = 0
BEGIN

--select b.ID , b.DoublesID, a.[name],a.id asdasd
--from doubles a    inner join  data b
--on
--a.ID=b.DoublesID
--where b.DoublesID<>@NonDuplicateID
--and a.[name]=@NonDuplicateName

print '---------------------------------------------';

select 
@sqlQuery = 
      'update b
     set b.DoublesID=' + cast(@NonDuplicateID as varchar(50)) + '
     from 
     doubles a
      inner join
     data b
      on
     a.ID=b.DoublesID
     where b.DoublesID<>' +  cast(@NonDuplicateID as varchar(50)) +
     ' and a.[name]=''' +  cast(@NonDuplicateName as varchar(max)) +'''';

print @sqlQuery
exec sp_executeSQL @sqlQuery
print '---------------------------------------------';

-- now move the cursor
FETCH NEXT FROM DeleteDuplicate INTO @NonDuplicateID ,@NonDuplicateName
END

CLOSE DeleteDuplicate --Close cursor
DEALLOCATE DeleteDuplicate --Deallocate cursor

---- Delete duplicate rows from original table
DELETE
FROM doubles
WHERE ID NOT IN
(
    SELECT MAX(ID)
    FROM doubles
    GROUP BY [NAME]
)

Please try and let me know if this helped you

Thanks

~ Aamod

Aamod
+1  A: 

I've not run this, but hopefully it should be correct, and close enough to the final soln to get you there. Let me know any mistakes if you like and I'll update the answer.

--updates the data table to the min ids for each name
update Data
set id = final_id
from
  Data
join
  Doubles 
on Doubles.id = Data.id
join
(
  select 
    name
    min(id) as final_id
  from Doubles
  group by name
) min_ids
on min_ids.name = Doubles.name

--deletes redundant ids from the Doubles table
delete 
from Doubles
where id not in
(
  select 
    min(id) as final_id
  from Doubles
  group by name
)
Robin
Thanks, worked like a charm!
Thomas Schreiner