views:

30

answers:

4

Hello, I am trying to find duplicate rows in my DB, like this:

SELECT email, COUNT(emailid) AS NumOccurrences FROM users GROUP BY emailid HAVING ( COUNT(emailid) > 1 )

This is ok, it returns me the emailid and the found matches. Now what I wanna do is to compare the ID column to another table I have and set a column there with the count.

The other table has a column duplicates which gets the amount of duplicates from the select. So lets say we have 3 rows with the same emailid, the duplicates column has a "3" in all 3 rows. What I want is a "2" in the first 2 and nothing or 0 in the last of the 3 matching ID rows.

Possible?

Thanks

A: 
SELECT ...
       ROW_NUMBER() OVER (PARTITION BY email ORDER BY emailid DESC) AS RN
FROM ...

...is a great starting point for such a problem. Never underestimate the power of ROW_NUMBER()!

Will A
I dont see how this might help in my case, can u point me some more :-)?
grady
A: 

In the end I need 2 of the 3 rows to have the duplicate flag (duplicate column) set, the third should not have it set.

Thanks

grady
A: 

Using Sql Server 2005+ you could try something like (full example)

DECLARE @Table TABLE(
        ID INT IDENTITY(1,1),
        Email VARCHAR(20)
)

INSERT INTO @Table (Email) SELECT 'a'
INSERT INTO @Table (Email) SELECT 'b'
INSERT INTO @Table (Email) SELECT 'c'
INSERT INTO @Table (Email) SELECT 'a'
INSERT INTO @Table (Email) SELECT 'b'
INSERT INTO @Table (Email) SELECT 'a'

; WITH Duplicates AS (
        SELECT  Email,
                COUNT(ID) TotalDuplicates
        FROM    @Table
        GROUP BY    Email
        HAVING  COUNT(ID) > 1
)
, Counts AS (
        SELECT  t.ID,
                ROW_NUMBER() OVER(PARTITION BY t.Email ORDER BY t.ID) EmailID,
                d.TotalDuplicates
        FROM    @Table t INNER JOIN
                Duplicates d    ON  t.Email = d.Email
)
SELECT  ID,
        CASE
            WHEN EmailID = TotalDuplicates
                THEN 0
            ELSE TotalDuplicates - 1
        END Dups
FROM    Counts
astander
A: 

I managed to have a temporary table now, which looks like this:

mailid | rowcount | AmountOfDups
643921 | 1 | 3
643921 | 2 | 3
643921 | 3 | 3

Now, how could I decide that only the first 2 should be updated (by mailid) in the other table? The other table has mailid as well.

grady