views:

113

answers:

4

Today at work we got into a discussion about which is the best way to do a query like this :

For instance lets assume a users table :

tblUsers
ID      = Autoint
Name    = String

and a login table :

tblLogin
ID         = AUtoint
UserID    = Int
IP        = String
Browser   = String
OS        = String
timestamp = DateTime

What would be the most efficient way to list all the users and the last time they logged in (if ever), and provide an output like :

user       | ip     | timestamp | browser | os   |
-------------------------------------------------
Some User  |1.1.1.1 | 12/12/12  | userBA  | win  |
Other User |1.1.1.1 | 12/12/12  | userBA  | win  |
And Other  |null    | null      | null    | null |
Other Yet  |1.1.1.1 | 12/12/12  | userBA  | win  |

Keep in mind that what we want here is to show all users exactly once even if he has never logged in, and only the most recent login (i.e max(timestamp)).

Is there a way to do it in one SQL statement?

We are using MSSQL 2005.

Thanks in advance guys, Jim

+2  A: 
SELECT tblUsers.Name, MAX(tblLogin.timestamp)
FROM 
tblUsers LEFT JOIN tblLogin ON tblUsers.ID = tblLogin.UserID
GROUP BY tblUsers.ID
Alex
Shame it doesn't give the desired output...
gbn
Thank you for your input Alex, but what about the other fields?IP, Browser, OS?When you add them to the group by you will get duplicated users, one for each distinct field value in the tblLogin to be precise, is there a way to return only one row for each user (all the users in tblUsers)?
OldJim
+1  A: 
;WITH cLogins AS
(
  SELECT
     L.ip, M.LastSeen, L.browser, L.os
  FROM
      (SELECT UserID, MAX(timestamp) AS LastSeen FROM tblLogin GROUP BY UserID) M
      LEFT JOIN
      tblLogin L ON M.UserID = L.UserID AND M.LastSeen = L.JOIN 
)
SELECT
  I.Name, L.ip, L.LastSeen, L.browser, L.os
FROM 
  tblUsers U
  LEFT JOIN
  cLogins L ON U.UserID = L.UserID
gbn
Nice solution really appreciate your input, thanks.
OldJim
+1  A: 

Im my opinion, the most readable way uses row_number(). You can use it to number rows, starting with 1 for each user, like:

select *
from (
    select u.name, l.ip, l.timestamp, l.browser, l.os,
      row_number() over (partition by u.id order by timestamp desc) rn 
    from tblUsers u
    inner join tblLogin l on u.id = l.userid
) sub
where rn = 1

A filter on rn = 1 gives the latest row per user. A subquery is required because SQL Server 2005 does not allow you to reference a row_number() in a where clause.

The most efficient way to do this depends on the amount of logins per user. You can find a good explanation of some of the more advanced methods in this blog post.

Andomar
The blog post is really helpful thank you for sharing, Jim.
OldJim
+1  A: 

From experience the following query is usually several times faster

select 
    u.name, 
    l1.ip, 
    l1.timestamp, 
    l1.browser, 
    l1.os
from 
    tblUsers u
inner join 
    tblLogin l1 
on 
    u.id = l1.userid
    and l1.Id = ISNULL(
        (select 
            top 1 l2.id 
        from 
            tblLogin l2 
        where 
            u.id = l2.userid 
        order by 
            timestamp desc), 0)

than this query:

select *
from (
    select u.name, l.ip, l.timestamp, l.browser, l.os,
      row_number() over (partition by u.id order by timestamp desc) rn 
    from tblUsers u
    inner join tblLogin l on u.id = l.userid
) sub
where rn = 1

At one time I was particularly interested in this topic as I have a huge ( several million rows ) tables that I needed to process similar way. So I set up a test doing this both ways and the faster query ran about 20 seconds, while the slower one ran about 3 minutes 15 seconds. (This was on SQL 2005). Your set up of course could be different and this also depends on indices, but if performance is critical for you I would test it both ways and choose one that is performs better.

Usual disclaimer: I didn't actually run the query above, it is there to illustrate the idea, a few syntax errors are possible.

zespri