ansaurus

Question

Answer 1

+2 A:

If i understand correctly you can try this

DECLARE @User TABLE(
     UserID INT
)

DECLARE @Matches TABLE(
     MatchID INT
)

DECLARE @UserMatches TABLE(
     UserMatchID INT,
     UserID INT,
     MatchID INT
)

INSERT INTO @User SELECt 1
INSERT INTO @User SELECt 2

INSERT INTO @Matches SELECt 1
INSERT INTO @Matches SELECt 2

INSERT INTO @UserMatches SELECt 1, 1, 1
INSERT INTO @UserMatches SELECt 2, 1, 2
INSERT INTO @UserMatches SELECt 3, 2, 2

SELECT  u.*,
     m.*
FROm    @UserMatches um INNER JOIN
     (
      SELECT UserID,
        MAX(UserMatchID) MaxID
      FROM @UserMatches um
      GROUP BY UserID
     ) MaxIdsPerUser ON um.UserID = MaxIdsPerUser.UserID
         AND um.UserMatchID = MaxID INNER JOIN 
     @User u ON um.UserID = u.UserID INNER JOIN
     @Matches m ON um.MatchID = m.MatchID

It is possible that if you have a DateTime to determine the most recent match, you can use that for the max.

astander 2009-11-19 16:33:21

ah you beat me to it :) I was going to go with the derived table join too

Scott Anderson 2009-11-19 16:35:48

This assumes a `UserMatchID` that is strictly monotonous. Might not be the case.

Tomalak 2009-11-19 17:11:13

I asked the OP to clarify whether UserMatchID (AppearanceID in the OP's schema) is given out in the same order as MatchDateTime. FWIW, there are many reasons why this assumption may not hold. For example, if matches are variable-length and records are only added at the end of the match, then a match that starts later may end earlier than another which started later. Similarly, if matches are scheduled in advance of the match, the assumption may not hold either. Funny, in my answer I assumed the conservative assumption-- you assumed the best-performing one. I like your attitude better! :-)

Justin Grant 2009-11-19 17:41:59

@astander: Also, if I'm understanding the OP's query correctly, his LEFT JOIN implies that rows are also expected to be returned for players who have played no matches yet. So (if my understanding is correct) you'd need a UNION and a join for your query to be accurate since you'll be missing all rows for users with no matches. I asked for clarification on this one too.

Justin Grant 2009-11-19 18:12:02

... never mind about the UNION, you could also use an OUTER JOIN here, e.g. ON (um.UserID = u.UserID OR um.UserID IS NULL)

Justin Grant 2009-11-19 18:18:22

Answer 2

A:

If the MatchID is an incremental number, you could base the most recent match for each player being the MAX number

kevchadders 2009-11-19 16:34:00

Answer 3

+1 A:

Instead of using tblAppearance in join, use a derived table that selects just the last appearance of the player, using APPLY, and join on this derived table:

SELECT ...
FROM tblPlayer
OUTER APPLY (
  SELECT TOP (1) *
  FROM tblAppearance
  WHERE tblPlayer.PlayerId = tblAppearance.PlayerId
  ORDER BY MatchDateTime DESC) AS lastAppearance
LEFT JOIN tblMatch ON tblMatch.MatchId = lastAppearance.MatchId
WHERE ...

Remus Rusanu 2009-11-19 16:36:13

FWIW, my experience with APPLY queries has not been great performance-wise. SQL seems to often select worse query plans for APPLY solutions vs. equivalent solutions (see my answer for an example) using ROW_NUMBER() or subqueries. So I usually avoid APPLY, unless the table is small or performance is not a big deal. That said, everyone's mileage may vary-- but the next time you're tempted to use APPLY, try the same thing using ROW_NUMBER() and see which one finishes faster. :-)

Justin Grant 2009-11-19 17:23:18

@Justin: APPLY is by definition a nested loop, ordinary joins can use hash or merge operators. Also APPLY has no parallelism potential. So yes, it may be outperformed in some situations. There are also scenarios when APPLY is the *only* solution (XML nodes, table value functions). In these specific situation (first row in a group) is in interesting comparison though, not sure which one would win in a head to head `APPLY (TOP 1 ORDER BY...)` vs. `JOIN (ROW_NUMBER() OVER (PARTITION BY...)`

Remus Rusanu 2009-11-19 17:41:09

I agree 100%, APPLY is hugely useful because there's many cases where there's no other practical option. But, in cases where there's is another option, I've yet to find a case (so far at least) where APPLY was the fastest query plan. I remembered the join-type limitation, but I'd totally forgotten about the loss of parallelism too. I once tried APPLY on a billion-row query on a 8-proc monster server... and ended up waiting a long time for my answer. :-)

Justin Grant 2009-11-19 17:55:13

Answer 4

+2 A:

For these kinds of problems (get top row in a group) I've had the most success, in both performance and maintainability, using ROW_NUMBER() and a CTE. The pattern is simple: the CTE selects the columns you want, and adds an additional column for the ROW_NUMBER() within each group (ordered by your desired order, of course). Then the post-CTE part of the query restricts results to those where the ROW_NUMBER() is 1.

Like this:

WITH cte AS
(
    SELECT tblPlayer.PlayerId, Surname, Forename, HomeClub, OpponentClub, AgeGroup, MatchDateTime, DOB,
     ROW_NUMBER () OVER (PARTITION BY tblPlayer.PlayerId ORDER BY MatchDateTime DESC) AS RowNum
    FROM tblPlayer
     LEFT JOIN tblAppearance ON tblAppearance.PlayerId = tblPlayer.PlayerId
     LEFT JOIN tblMatch ON tblMatch.MatchId = tblAppearance.MatchId
    WHERE Forename LIKE '%rob%' AND Surname LIKE '%white%'
)
SELECT PlayerId, Surname, Forename, HomeClub, OpponentClub, AgeGroup, MatchDateTime, DOB
FROM cte
WHERE RowNum = 1
ORDER BY Surname, Forename, DOB, MatchDateTime

Note that I'm not assuming that any IDs are ordered in the same way that the MatchDateTime's are-- there are lots of reasons (e.g. advance scheduling) why that assumption may not hold. If, however, appearance IDs are ordered identically to dates, then the query above can be made much more efficient because you won't have to do any joins to find the MatchID's you're looking for.

Note that if you have a very large number of players (100,000+) and you run this query often, there are optimizations you'll want to do here, since every time you run this query you'll do a table scan of the Player table to support your LIKE filter. If this is the case, you'll probably want to creat a covering index on Surname, Forename and get SQL to run your query in stages: first filter Player records using the covering index, then do your join, and finally pull out other Player columns from the clustered index. It's usually hard to get SQL to perform plans like this (you may need a temp table for the intermediate results), but for very large tables the perf win is worth it. If you have a small number of players, ignore the preceding paragraph. :-)

Justin Grant 2009-11-19 17:02:41

ansaurus

tags:

views:

answers:

SELECT TOP 1 in a many to many query

related questions