views:

426

answers:

4

I'm working on a social network web application, and i got a situation where i need to resend reminder emails to users who haven't activated their emails. The problem is when i investigated the DB i found that many emails are duplicated (there was no validation on the email uniqueness apparently. So what i need to do know is to retrieve these fields from the db the username, email activation code, email, so that i can resend the activation emails, and for the case of the duplicated emails i need to return only one of them (i.e if i have user john with email [email protected] and user john1 with email [email protected] too, i want to retrieve only on of these johns no matter john1 or two) so i thought of tailing the sql query by (Group By Email). The thing is that i can't select other fields that are not in the group by clause. the solution that i have here is one that i don't like; i created a List and every time when i need to send an email to a user i iterate all over the entire list to make sure that this email is not existing, if it's not there, i send to it, and then add the email to the list. Something like the following: if(!EmailIsInList(email)){ SendActivationEmail(email); AddEmailToList(email) } else { DoNotSend); }

Actually i got the problem solved this way, still i don't like my solution. Any ideas?

A: 

hmm ... this will help for your email-group problem: http://stackoverflow.com/questions/541494/selecting-top-column1-with-matching-column2/541825#541825

just use this with ... so you can take the matching activation-code for the proper email address :)

btw .. please format your question proper and use the code-sample icon!

Andreas Niedermair
Thanks, and for the format, sorry, new to stackoverflow :$
Galilyou
Actually my problem is different. I'm retrieving the date from two tables
Galilyou
that won't matter: the WITH clause can also deal with joins, subselects and such stuff. it's more like a special view (= filter) on the big select. you decide the filter-options via the OVER clause
Andreas Niedermair
A: 

If we assume that the same email address could be both activated against usera and not activated against userb then the following query would return you one userid for each email address that has never been activated

SELECT  MAX(userid),
        email
FROM    users AS u1
WHERE   activated = 'False'
AND NOT EXISTS (
        SELECT 1
        FROM   users AS u2
        WHERE  u2.email = u1.email
        AND    u2.activated = 'True'
        )

GROUP BY email

You really do want to make sure the email field is indexed, and if it was indexed with a unique composite key of (email, userid) then this would be an indexed scan and should be pretty quick.

Steve Weet
please read my comment on the question, i described the tables structure there!
Galilyou
+1  A: 

Income testing data:

DECLARE @User TABLE (UserId int, 
UserName varchar(100), Email varchar(40), IsActivated bit)
INSERT INTO @User
SELECT 1, 'John', '[email protected]', 0 UNION
SELECT 2, 'Ann', '[email protected]', 0 UNION
SELECT 3, 'John2', '[email protected]', 1 UNION
SELECT 4, 'Bill', '[email protected]', 0 UNION
SELECT 5, 'Bill', '[email protected]', 0

DECLARE @Email TABLE (EmailId int, 
UserId int, Date datetime, Message varchar(1000))
INSERT INTO @Email
SELECT 1, 1, GETDATE(), '' UNION
SELECT 2, 2, GETDATE(), '' UNION
SELECT 3, 3, GETDATE(), '' UNION
SELECT 4, 4, GETDATE(), '' UNION
SELECT 5, 5, GETDATE(), ''

SELECT * FROM @User
SELECT * FROM @Email

You see, we have [email protected] already activated once, so we don't need him in result set.
Now, implementation with RANK OVER:

SELECT M.UserID, M.UserName, M.Email, 
    M.IsActivated, M.EmailId, M.Date, M.Message 
FROM (
    SELECT RANK() OVER (PARTITION BY U.Email 
     ORDER BY U.IsActivated Desc, U.UserID ASC) AS N, 
     U.UserID, U.UserName, U.Email, U.IsActivated, 
     E.EmailId, E.Date, E.Message
    FROM @User U INNER JOIN @Email E ON U.UserID = E.UserID
)M WHERE M.N = 1 AND M.IsActivated = 0
Max Gontar
there's basically no difference to the solution provided by Bliek in my link :) but provided a no-need-to-think-solution :)
Andreas Niedermair
well, implementation is quite different - no CTE, no RowNumber. BTW, "crossplatform" solution :D
Max Gontar
i meant basically :) i like this "crossplatform" solution!
Andreas Niedermair
A: 

I think you are making a major logic error. Email address is not and never will be unique. Just because two users have the same email address does NOT mean they are the same person! People often share emails, couples might have the same email, small offices sometimes have only one email (This is often true for doctor's offices.) Emails are also reused if someone gives it up. So John Smith who registers in 2007 with [email protected] may not still be active in your system and thus didn't bother to change his email when he went to [email protected]. In the meantime Judy Smith in a differnt state registers [email protected]. You can;t assume an email address willever be unique.

HLGEM
Yeah, i got your point, and it makes a lot of sense and if i was to redesign the DB i would have put it this way. but i inherited this design from my predecessor and i really need to handle this situation.
Galilyou