views:

618

answers:

4

What I have is basically a problem that is easily solved with multiple tables, but I have only a single table to work with.

Consider the following database table:

UserID UserName EmailAddress         Source
3K3S9  Ben      [email protected]        user
SF13F  Harry    [email protected] 3rd_party
SF13F  Harry    [email protected]    user
76DSA  Lisa     [email protected]     user
OL39F  Nick     [email protected]   3rd_party
8F66S  Stan     [email protected]        user

I need to select all fields, but only show each user once, along with one of their email addresses (the "biggest" one as determined by the MAX() function). This is the result I am after:

UserID UserName EmailAddress         Source
3K3S9  Ben      [email protected]        user
SF13F  Harry    [email protected] 3rd_party
76DSA  Lisa     [email protected]     user
OL39F  Nick     [email protected]   3rd_party
8F66S  Stan     [email protected]        user

As you can see, "Harry" is only shown once, with his "highest" email address and the corresponding "source".

Currently we are grouping on UserID and UserName and using MAX() for both EmailAddress and Source, but the maximums of those two fields don't always come from the same row. They need to be from the same record.
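
To make the mismatch concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name MyTable and the email addresses are invented for illustration (the real addresses are not shown above), and SQLite merely stands in for whatever database is actually in use.

```python
import sqlite3

# Hypothetical sample rows; the addresses are made up for illustration.
rows = [
    ("SF13F", "Harry", "harry@other.example", "3rd_party"),
    ("SF13F", "Harry", "harry@home.example", "user"),
    ("76DSA", "Lisa", "lisa@home.example", "user"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (UserID TEXT, UserName TEXT, EmailAddress TEXT, Source TEXT)"
)
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)

# Each MAX() is evaluated independently within the group.
grouped = conn.execute("""
    SELECT UserID, UserName, MAX(EmailAddress), MAX(Source)
    FROM MyTable
    GROUP BY UserID, UserName
    ORDER BY UserID
""").fetchall()

# Keep only the result rows that do not exist as a real row in the table.
stored = set(rows)
mismatched = [r for r in grouped if r not in stored]
print(mismatched)
```

In this sketch Harry's result row pairs the address from his 3rd_party record with the source "user", a combination that is not stored anywhere in the table.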

I have also tried joining the table with itself, but that only gave me the correct email address, not the corresponding "source" for that address.

Any help would be appreciated as I have spent way too long trying to solve this already :)

+5  A: 
select distinct * from MyTable t1
where EmailAddress = 
(select max(EmailAddress) from MyTable t2
where t1.userId = t2.userId)
tekBlues
It's worth noting that this often can perform faster than the accepted answer, especially if there is an index on {userid, EmailAddress DESC} on t2
Dave Markle
+2  A: 

If you're on SQL Server 2005 or higher,

SELECT  UserID, UserName, EmailAddress, Source
FROM    (SELECT UserID, UserName, EmailAddress, Source,
                ROW_NUMBER() OVER (PARTITION BY UserID
                                   ORDER BY EmailAddress DESC) 
                    AS RowNumber
         FROM   MyTable) AS a
WHERE   a.RowNumber = 1

Of course there are ways to do the same task without the (SQL-Standard) ranking functions such as ROW_NUMBER, which SQL Server implemented only since 2005 -- including nested dependent queries and self left joins with an ON including a '>' and a WHERE ... IS NULL trick -- but the ranking functions make for code that's readable and (in theory) well optimizable by the SQL Server Engine.
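
As a rough illustration of the self left join mentioned above, here is a sketch run through Python's sqlite3 module. The email addresses are invented, the table name MyTable is reused from the query above, and SQLite is only a stand-in engine; the SQL itself is plain enough that it should carry over to SQL Server largely unchanged.

```python
import sqlite3

# Hypothetical sample rows; the addresses are made up for illustration.
rows = [
    ("SF13F", "Harry", "harry@other.example", "3rd_party"),
    ("SF13F", "Harry", "harry@home.example", "user"),
    ("76DSA", "Lisa", "lisa@home.example", "user"),
    ("OL39F", "Nick", "nick@other.example", "3rd_party"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (UserID TEXT, UserName TEXT, EmailAddress TEXT, Source TEXT)"
)
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)

# For each row a, look for a row b of the same user with a greater address.
# If no such b exists (b.UserID IS NULL), a holds that user's maximum.
result = conn.execute("""
    SELECT a.UserID, a.UserName, a.EmailAddress, a.Source
    FROM MyTable AS a
    LEFT JOIN MyTable AS b
           ON b.UserID = a.UserID
          AND b.EmailAddress > a.EmailAddress
    WHERE b.UserID IS NULL
    ORDER BY a.UserID
""").fetchall()

for row in result:
    print(row)
```

Because the whole row a is returned, EmailAddress and Source always come from the same record. If two rows of one user share the same maximum address, both are returned.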

Edit: this article is a nice tutorial on ranking, but it uses RANK in the examples instead of ROW_NUMBER (or the other ranking function, DENSE_RANK) -- the distinction matters when there are "ties" among grouped rows in the same partition according to the ordering criteria. This post does a good job explaining the difference.
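
To show how ties affect the three ranking functions, here is a small sketch using Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions), and the rows are invented so that one user has the same top address stored twice.

```python
import sqlite3

# Hypothetical rows: the first two tie on EmailAddress for the same user.
rows = [
    ("SF13F", "Harry", "harry@other.example", "3rd_party"),
    ("SF13F", "Harry", "harry@other.example", "user"),
    ("SF13F", "Harry", "harry@home.example", "user"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (UserID TEXT, UserName TEXT, EmailAddress TEXT, Source TEXT)"
)
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)

ranked = conn.execute("""
    SELECT ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY EmailAddress DESC),
           RANK()       OVER (PARTITION BY UserID ORDER BY EmailAddress DESC),
           DENSE_RANK() OVER (PARTITION BY UserID ORDER BY EmailAddress DESC)
    FROM MyTable
""").fetchall()

row_numbers = sorted(r[0] for r in ranked)  # always unique within a partition
ranks = sorted(r[1] for r in ranked)        # ties share a rank, then a gap
dense_ranks = sorted(r[2] for r in ranked)  # ties share a rank, no gap

print(row_numbers, ranks, dense_ranks)
```

Filtering on ROW_NUMBER = 1 keeps exactly one row per user even with ties, although which of the tied rows is kept is not defined unless the ORDER BY gets a tie-breaker. Filtering on RANK = 1 or DENSE_RANK = 1 keeps every tied row.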

Alex Martelli
Very interesting, Alex. I will study these features.
tekBlues
This certainly works very well ... yet I don't understand the syntax >.< I will need to do a bit of reading to understand it :)
Nippysaurus
Edited my answer to add URLs to two good, short tutorials on ranking functions -- HTH!
Alex Martelli
Thanks a million mate! You're a legend! :)
Nippysaurus
A: 
select distinct
     a.*
from
   SomeTable a
inner join (
  -- alias the aggregate so the ON clause can refer to b.emailAddress
  select max(emailAddress) as emailAddress, userId
  from
     SomeTable
  group by
     userId
) b on a.emailAddress = b.emailAddress and a.userId = b.userId
Jeff Meatball Yang
I'd be happier if the ON condition included a.userID = b.userID as well as the email address.
Jonathan Leffler
True, it makes it more specific and avoids potential problems. I've edited my answer to reflect this.
Jeff Meatball Yang
A: 

I think I have a solution that's different from the ones already proposed:

select *
from foo
where id = (
  select id
  from foo F
  where F.bar = foo.bar
  order by F.baz desc
  limit 1
)

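For each value of bar, this gives you the one foo record that has the greatest baz among the foo records with the same bar (assuming id is unique). Note that LIMIT is MySQL/PostgreSQL/SQLite syntax; SQL Server uses TOP instead.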

allyourcode