ansaurus

Question

How to limit the return of a join to just one table?

Answer 1

+2 A:

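SELECT tblUsers.Name, MAX(tblLogin.timestamp) AS LastLogin
FROM
    tblUsers LEFT JOIN tblLogin ON tblUsers.ID = tblLogin.UserID
-- SQL Server requires every non-aggregated selected column in the GROUP BY
GROUP BY tblUsers.ID, tblUsers.Name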

Alex 2010-02-25 22:32:03

Shame it doesn't give the desired output...

gbn 2010-02-25 22:36:53

Thank you for your input, Alex, but what about the other fields (IP, Browser, OS)? When you add them to the GROUP BY you get duplicate users: one row for each distinct combination of those values in tblLogin, to be precise. Is there a way to return only one row for each user (all the users in tblUsers)?

OldJim 2010-02-25 22:40:16
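The exchange above can be reproduced with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server (an assumption; the thread targets SQL Server 2005), and the sample rows are hypothetical. It shows that the GROUP BY query returns one row per user, and that adding `ip` to the GROUP BY splits a user into one row per distinct IP.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the thread
# (tblUsers.ID/Name, tblLogin.UserID/timestamp/ip/browser/os).
# ISO-8601 text timestamps sort chronologically, so MAX() works on them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblUsers (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tblLogin (ID INTEGER PRIMARY KEY, UserID INTEGER,
                       timestamp TEXT, ip TEXT, browser TEXT, os TEXT);
INSERT INTO tblUsers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO tblLogin VALUES
  (1, 1, '2010-02-20 10:00:00', '10.0.0.1', 'Firefox', 'Linux'),
  (2, 1, '2010-02-24 09:00:00', '10.0.0.2', 'Chrome',  'Windows'),
  (3, 2, '2010-02-23 08:00:00', '10.0.0.3', 'IE',      'Windows'),
  (4, 2, '2010-02-21 07:00:00', '10.0.0.4', 'Safari',  'macOS');
""")

# Answer 1: one row per user, but only the aggregated timestamp is available.
latest = conn.execute("""
SELECT tblUsers.Name, MAX(tblLogin.timestamp)
FROM tblUsers LEFT JOIN tblLogin ON tblUsers.ID = tblLogin.UserID
GROUP BY tblUsers.ID, tblUsers.Name
ORDER BY tblUsers.ID
""").fetchall()

# Adding ip to the GROUP BY creates one group per (user, distinct ip) pair.
per_ip = conn.execute("""
SELECT tblUsers.Name, tblLogin.ip, MAX(tblLogin.timestamp)
FROM tblUsers LEFT JOIN tblLogin ON tblUsers.ID = tblLogin.UserID
GROUP BY tblUsers.ID, tblUsers.Name, tblLogin.ip
ORDER BY tblUsers.ID, tblLogin.ip
""").fetchall()

for row in latest:
    print(row)
print(len(latest), "rows vs", len(per_ip), "rows after grouping by ip")
```

Note that carol, who has no logins, still appears thanks to the LEFT JOIN, with a NULL timestamp.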

Answer 2

+1 A:

;WITH cLogins AS
(
  -- latest login per user: join each user's MAX(timestamp) back to tblLogin
  SELECT
     L.UserID, L.ip, M.LastSeen, L.browser, L.os
  FROM
      (SELECT UserID, MAX(timestamp) AS LastSeen FROM tblLogin GROUP BY UserID) M
      INNER JOIN
      tblLogin L ON M.UserID = L.UserID AND M.LastSeen = L.timestamp
)
SELECT
  U.Name, L.ip, L.LastSeen, L.browser, L.os
FROM
  tblUsers U
  -- LEFT JOIN keeps users who have never logged in
  LEFT JOIN
  cLogins L ON U.ID = L.UserID

gbn 2010-02-25 22:36:23

Nice solution, I really appreciate your input. Thanks.

OldJim 2010-02-26 16:56:39
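A minimal sketch of the CTE approach from Answer 2, run against SQLite through Python's `sqlite3` module (an assumption standing in for SQL Server; the leading `;` before `WITH` is a T-SQL habit and is dropped here). The data is hypothetical. The last part shows one property worth knowing about the join-back technique: if two logins share a user's latest timestamp, that user comes back twice.

```python
import sqlite3

# Hypothetical sample data using the column names from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblUsers (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tblLogin (ID INTEGER PRIMARY KEY, UserID INTEGER,
                       timestamp TEXT, ip TEXT, browser TEXT, os TEXT);
INSERT INTO tblUsers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO tblLogin VALUES
  (1, 1, '2010-02-20 10:00:00', '10.0.0.1', 'Firefox', 'Linux'),
  (2, 1, '2010-02-24 09:00:00', '10.0.0.2', 'Chrome',  'Windows'),
  (3, 2, '2010-02-23 08:00:00', '10.0.0.3', 'IE',      'Windows'),
  (4, 2, '2010-02-21 07:00:00', '10.0.0.4', 'Safari',  'macOS');
""")

query = """
WITH cLogins AS (
  SELECT L.UserID, L.ip, M.LastSeen, L.browser, L.os
  FROM (SELECT UserID, MAX(timestamp) AS LastSeen
        FROM tblLogin GROUP BY UserID) M
  INNER JOIN tblLogin L
    ON M.UserID = L.UserID AND M.LastSeen = L.timestamp
)
SELECT U.Name, L.ip, L.LastSeen, L.browser, L.os
FROM tblUsers U
LEFT JOIN cLogins L ON U.ID = L.UserID
ORDER BY U.ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

# A second login with exactly bob's latest timestamp matches the join twice.
conn.execute("INSERT INTO tblLogin VALUES "
             "(5, 2, '2010-02-23 08:00:00', '10.0.0.5', 'Opera', 'Linux')")
tied = conn.execute(query).fetchall()
bob_rows = sum(1 for r in tied if r[0] == "bob")
print("bob rows after a timestamp tie:", bob_rows)
```

If ties are possible in your data, a tie-breaker is needed, for example picking the highest login ID among the tied rows.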

Answer 3

+1 A:

In my opinion, the most readable way uses row_number(). You can use it to number each user's rows, starting at 1 for the most recent login, like this:

select *
from (
    select u.name, l.ip, l.timestamp, l.browser, l.os,
      row_number() over (partition by u.id order by timestamp desc) rn 
    from tblUsers u
    inner join tblLogin l on u.id = l.userid
) sub
where rn = 1

A filter on rn = 1 gives the latest row per user. A subquery is required because SQL Server (2005 included) evaluates window functions after the WHERE clause, so you cannot reference row_number() in a WHERE clause directly.

The most efficient way to do this depends on the number of logins per user. You can find a good explanation of some of the more advanced methods in this blog post.

Andomar 2010-02-25 23:10:37

The blog post is really helpful; thank you for sharing, Jim.

OldJim 2010-02-26 16:54:44
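Answer 3's row_number() query can be tried in a minimal sketch using Python's `sqlite3` module (SQLite 3.25 or newer is assumed, since that is when window functions arrived; SQLite stands in for SQL Server here). The data is hypothetical. The helper `latest_per_user` is an illustrative name, not from the thread. It runs the query with the original INNER JOIN and with a LEFT JOIN, since the question asks for all users in tblUsers, and it also checks that a window function placed directly in WHERE is rejected.

```python
import sqlite3

# Hypothetical sample data using the column names from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblUsers (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tblLogin (ID INTEGER PRIMARY KEY, UserID INTEGER,
                       timestamp TEXT, ip TEXT, browser TEXT, os TEXT);
INSERT INTO tblUsers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO tblLogin VALUES
  (1, 1, '2010-02-20 10:00:00', '10.0.0.1', 'Firefox', 'Linux'),
  (2, 1, '2010-02-24 09:00:00', '10.0.0.2', 'Chrome',  'Windows'),
  (3, 2, '2010-02-23 08:00:00', '10.0.0.3', 'IE',      'Windows'),
  (4, 2, '2010-02-21 07:00:00', '10.0.0.4', 'Safari',  'macOS');
""")

def latest_per_user(join: str) -> list:
    # Hypothetical helper; join is 'INNER JOIN' or 'LEFT JOIN' and is
    # pasted into the SQL text. rn = 1 marks each user's newest login.
    return conn.execute(f"""
    SELECT * FROM (
      SELECT u.name, l.ip, l.timestamp, l.browser, l.os,
             ROW_NUMBER() OVER (PARTITION BY u.id
                                ORDER BY l.timestamp DESC) AS rn
      FROM tblUsers u
      {join} tblLogin l ON u.id = l.userid
    ) sub
    WHERE rn = 1
    ORDER BY name
    """).fetchall()

inner_rows = latest_per_user("INNER JOIN")  # users without logins drop out
outer_rows = latest_per_user("LEFT JOIN")   # carol kept, with NULL columns

# Referencing the window function in WHERE is an error, hence the subquery.
try:
    conn.execute("""
    SELECT u.name FROM tblUsers u JOIN tblLogin l ON u.id = l.userid
    WHERE ROW_NUMBER() OVER (PARTITION BY u.id ORDER BY l.timestamp DESC) = 1
    """)
    window_in_where_ok = True
except sqlite3.Error:
    window_in_where_ok = False

print(inner_rows)
print(outer_rows)
print("window function allowed in WHERE:", window_in_where_ok)
```

Unlike the MAX() join-back, row_number() always yields exactly one row per user, even when two logins share the latest timestamp; which of the tied rows wins is then arbitrary unless the ORDER BY includes a tie-breaker.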

Answer 4

+1 A:

In my experience, the following query is usually several times faster

select 
    u.name, 
    l1.ip, 
    l1.timestamp, 
    l1.browser, 
    l1.os
from 
    tblUsers u
inner join 
    tblLogin l1 
on 
    u.id = l1.userid
    and l1.Id = ISNULL(
        (select 
            top 1 l2.id 
        from 
            tblLogin l2 
        where 
            u.id = l2.userid 
        order by 
            timestamp desc), 0)

than this query:

select *
from (
    select u.name, l.ip, l.timestamp, l.browser, l.os,
      row_number() over (partition by u.id order by timestamp desc) rn 
    from tblUsers u
    inner join tblLogin l on u.id = l.userid
) sub
where rn = 1

At one point I was particularly interested in this topic, because I had huge tables (several million rows) that I needed to process in a similar way. So I set up a test running it both ways: the faster query ran in about 20 seconds, while the slower one took about 3 minutes 15 seconds (this was on SQL Server 2005). Your setup may of course differ, and the results also depend on indexes, but if performance is critical for you I would test both ways and choose the one that performs better.

Usual disclaimer: I didn't actually run the query above; it is there to illustrate the idea, so a few syntax errors are possible.

zespri 2010-03-02 11:00:45
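As a starting point for the "test it both ways" advice, here is a minimal sketch using Python's `sqlite3` module as a stand-in for SQL Server (an assumption). The T-SQL constructs are swapped for SQLite equivalents: `TOP 1` becomes `LIMIT 1` and `ISNULL` becomes `IFNULL`. Like Answer 4, it assumes tblLogin has its own `ID` key. The data and the index name `ix_login_user_ts` are hypothetical. The sketch only checks that both queries return the same rows; timings on a handful of rows mean nothing, so any performance comparison needs representative data and indexes.

```python
import sqlite3

# Hypothetical sample data; tblLogin.ID is the surrogate key Answer 4 relies on.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblUsers (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tblLogin (ID INTEGER PRIMARY KEY, UserID INTEGER,
                       timestamp TEXT, ip TEXT, browser TEXT, os TEXT);
-- Hypothetical index supporting the per-user "newest first" lookup.
CREATE INDEX ix_login_user_ts ON tblLogin (UserID, timestamp);
INSERT INTO tblUsers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO tblLogin VALUES
  (1, 1, '2010-02-20 10:00:00', '10.0.0.1', 'Firefox', 'Linux'),
  (2, 1, '2010-02-24 09:00:00', '10.0.0.2', 'Chrome',  'Windows'),
  (3, 2, '2010-02-23 08:00:00', '10.0.0.3', 'IE',      'Windows'),
  (4, 2, '2010-02-21 07:00:00', '10.0.0.4', 'Safari',  'macOS');
""")

# Answer 4: correlated subquery picks the ID of each user's newest login.
corr_rows = conn.execute("""
SELECT u.name, l1.ip, l1.timestamp, l1.browser, l1.os
FROM tblUsers u
INNER JOIN tblLogin l1
  ON u.id = l1.userid
 AND l1.id = IFNULL((SELECT l2.id
                     FROM tblLogin l2
                     WHERE u.id = l2.userid
                     ORDER BY l2.timestamp DESC
                     LIMIT 1), 0)
ORDER BY u.name
""").fetchall()

# Answer 3's row_number() version, projecting the same five columns.
rn_rows = conn.execute("""
SELECT name, ip, timestamp, browser, os FROM (
  SELECT u.name, l.ip, l.timestamp, l.browser, l.os,
         ROW_NUMBER() OVER (PARTITION BY u.id
                            ORDER BY l.timestamp DESC) AS rn
  FROM tblUsers u
  INNER JOIN tblLogin l ON u.id = l.userid
) sub
WHERE rn = 1
ORDER BY name
""").fetchall()

print(corr_rows)
print("same result:", corr_rows == rn_rows)
```

On SQL Server itself, comparing the actual execution plans of the two queries is usually more informative than wall-clock time alone.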
