Guys, I have a query that basically selects the latest browser that each of our users used.

Here is our (simplified) table structure:

HITS_TABLE
----------
USERID
BROWSER
HITSDATE

USER_TABLE
----------
USERID
USERNAME

And here is how I query the latest browser that each user used:

SELECT U.*, H.BROWSER
FROM USER_TABLE U
CROSS APPLY
  (SELECT TOP 1 BROWSER
   FROM HITS_TABLE
   WHERE HITS_TABLE.USERID = U.USERID
   ORDER BY HITS_TABLE.HITSDATE DESC
  ) AS H

The HITS_TABLE was only added several days ago.

So that query only returns users who visited our website after we added the HITS_TABLE, and eliminates the others.

Here is the sample case

USER_TABLE
-------------------
USERID     USERNAME
-------------------
1          'Spolsky'
2          'Atwoord'
3          'Dixon'


HITS_TABLE
------------------------------
USERID     HITSDATE     BROWSER
------------------------------
2          15/8/2009    'Firefox 3.5'
1          16/8/2009    'IE 6'
2          16/8/2009    'Chrome'
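
To reproduce the sample case, a setup script along these lines could be used. It is only a sketch: the question lists column names but no data types, so the types and constraints below are assumptions, and the dates are written as 'YYYYMMDD' literals so they do not depend on the server's date format.

```sql
-- Assumed column types; the question only gives the column names.
CREATE TABLE USER_TABLE (
    USERID   INT PRIMARY KEY,
    USERNAME VARCHAR(50) NOT NULL
);

CREATE TABLE HITS_TABLE (
    USERID   INT NOT NULL,
    BROWSER  VARCHAR(50) NOT NULL,
    HITSDATE DATETIME NOT NULL
);

-- Sample users
INSERT INTO USER_TABLE (USERID, USERNAME) VALUES (1, 'Spolsky');
INSERT INTO USER_TABLE (USERID, USERNAME) VALUES (2, 'Atwoord');
INSERT INTO USER_TABLE (USERID, USERNAME) VALUES (3, 'Dixon');

-- Sample hits; user 3 has no rows here
INSERT INTO HITS_TABLE (USERID, HITSDATE, BROWSER) VALUES (2, '20090815', 'Firefox 3.5');
INSERT INTO HITS_TABLE (USERID, HITSDATE, BROWSER) VALUES (1, '20090816', 'IE 6');
INSERT INTO HITS_TABLE (USERID, HITSDATE, BROWSER) VALUES (2, '20090816', 'Chrome');
```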

Here is the sample result

------------------------------
USERID     USERNAME     BROWSER
------------------------------
1          'Spolsky'    'IE 6'
2          'Atwoord'    'Chrome'

But I want to include the other users as well, with an 'Unknown' browser. Here is my desired result:

------------------------------
USERID     USERNAME     BROWSER
------------------------------
1          'Spolsky'    'IE 6'
2          'Atwoord'    'Chrome'
3          'Dixon'      'Unknown'

I believe it could be achieved with a LEFT OUTER JOIN, but I always end up with this (I DO NOT want this result):

------------------------------
USERID     USERNAME     BROWSER
------------------------------
1          'Spolsky'    'IE 6'
2          'Atwoord'    'Chrome'
2          'Atwoord'    'Firefox 3.5'
3          'Dixon'      'Unknown'

I hope my question is clear.

+1  A: 

Couldn't you use a sub select? Not pretty, but it should work:

SELECT U.*,

ISNULL((SELECT TOP 1 BROWSER 
   FROM HITS_TABLE 
   WHERE HITS_TABLE.USERID = U.USERID
   ORDER BY HITS_TABLE.HITSDATE DESC),'Unknown') AS Browser

FROM USER_TABLE U
Matthew Pelser
If you want to access any column other than BROWSER from the hits table in this query, then the sub select is not for you. In that case I would profile the @rwarren and @gbn solutions to see which performs better. @Mao has an interesting point about non-deterministic results. With your pragmatic hat on, you could probably ignore this edge case by adding a time component to your HITSDATE.
Matthew Pelser
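One way to get at more than one column from the latest hit, while still keeping users who have no hits, is to swap the CROSS APPLY from the question for an OUTER APPLY. This is a sketch rather than something proposed in the thread; the LAST_HIT alias is made up for illustration. OUTER APPLY returns the outer row even when the applied subquery yields nothing, so H.BROWSER and H.HITSDATE come back as NULL for such users and ISNULL can supply the default.

```sql
SELECT U.USERID,
       U.USERNAME,
       ISNULL(H.BROWSER, 'Unknown') AS BROWSER,  -- default for users with no hits
       H.HITSDATE AS LAST_HIT                    -- hypothetical alias; NULL when no hits
FROM USER_TABLE U
OUTER APPLY
  (SELECT TOP 1 BROWSER, HITSDATE
   FROM HITS_TABLE
   WHERE HITS_TABLE.USERID = U.USERID
   ORDER BY HITS_TABLE.HITSDATE DESC
  ) AS H
```

Against the sample data in the question this should return one row per user, with 'Unknown' for Dixon. It is still worth profiling against the other approaches.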
A: 
SELECT U.*, 'BROWSER' =
    CASE
        WHEN (SELECT TOP 1 BROWSER FROM HITS_TABLE WHERE HITS_TABLE.USERID = U.USERID ORDER BY HITS_TABLE.HITSDATE DESC) IS NULL THEN 'Unknown'
        ELSE (SELECT TOP 1 BROWSER FROM HITS_TABLE WHERE HITS_TABLE.USERID = U.USERID ORDER BY HITS_TABLE.HITSDATE DESC)
    END
FROM USER_TABLE U
Wael Dalloul
In your solution, wouldn't the sub select execute twice when the result is not null? First to evaluate the "when" and find out the browser is not null, and then again in the "else" to extract the result?
Matthew Pelser
+2  A: 

Using a GROUP BY on userid against hits_table allows you to get the max() hitsdate for each userid. I've called this latest_hits in the code below.

Selecting from user_table with a left join to latest_hits allows you to pull records for every user.

Joining back onto hits_table then allows you to pull the browser record associated with that date, or a null for users with no record in there.

select
   user_table.userid,
   user_table.username,
   isnull(hits_table.browser, 'unknown') as browser
from
  user_table
left join
(
  -- latest hit date per user
  select
    userid,
    max(hitsdate) hitsdate
  from
    hits_table
  group by
    userid
) latest_hits
on
  user_table.userid = latest_hits.userid
left join
  hits_table
on hits_table.userid = latest_hits.userid
and hits_table.hitsdate = latest_hits.hitsdate
Robin
This solution takes one important fact into account that the others are missing: what if the combination of USERID and HITSDATE is ambiguous, e.g. an additional row (2, 16/8/2009, 'Safari') exists? Using ranking functions you would get a non-deterministic result. Can you tell which one is selected? This solution would provide both combinations, which is IMHO much better.
The Chairman
Additional Info: For information on SQL Server Ranking see http://msdn.microsoft.com/en-us/library/ms189798%28SQL.90%29.aspx
The Chairman
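If exactly one row per user is wanted even when two hits share the same USERID and HITSDATE, a ranking function can be made deterministic by adding a tiebreaker to its ORDER BY. The sketch below is an assumption-laden illustration, not part of any answer above: it breaks ties on BROWSER, which is an arbitrary choice, and the RN alias is made up. A time component on HITSDATE or a unique hit id would be a more meaningful tiebreaker if the real table has one.

```sql
SELECT U.USERID,
       U.USERNAME,
       ISNULL(H.BROWSER, 'Unknown') AS BROWSER
FROM USER_TABLE U
LEFT JOIN
  (SELECT USERID,
          BROWSER,
          -- Number each user's hits from newest to oldest;
          -- BROWSER is an arbitrary tiebreaker for equal dates (assumption).
          ROW_NUMBER() OVER (PARTITION BY USERID
                             ORDER BY HITSDATE DESC, BROWSER) AS RN
   FROM HITS_TABLE
  ) AS H
  ON H.USERID = U.USERID
 AND H.RN = 1   -- keep only the latest hit; users without hits still appear
```

Keeping the RN = 1 condition in the ON clause, rather than in a WHERE clause, is what preserves the users who have no hits at all.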
You're right, the max() function is very useful for this. Thank you. But I think it should be a left outer join.
Anwar Chandra