views:

117

answers:

4

I have a faily simple many to many schema, which looks like this:alt text

What I'm trying to do is select all players with some arbitary conditions and I also want to select their most recent match, if they've played in one.

What I've managed to do is:

SELECT tblPlayer.PlayerId, tblPlayer.Surname, tblPlayer.Forename,
(SELECT TOP 1 tblMatch.HomeClub + ' v ' + tblMatch.OpponentClub + ' ' + tblMatch.AgeGroup + ' (' + CONVERT(VARCHAR, tblMatch.MatchDateTime, 103) + ')'
 FROM tblAppearance
 INNER JOIN tblMatch ON tblAppearance.MatchID = tblMatch.MatchID
 WHERE tblAppearance.PlayerID = tblPlayer.PlayerID 
 ORDER BY MatchDateTime DESC) AS Match
FROM tblPlayer
LEFT JOIN tblAppearance ON tblAppearance.PlayerId = tblPlayer.PlayerId
LEFT JOIN tblMatch ON tblMatch.MatchId = tblAppearance.MatchId
WHERE tblPlayer.Forename LIKE '%rob%' AND tblPlayer.Surname LIKE '%white%'
ORDER BY tblPlayer.Surname, tblPlayer.Forename, tblPlayer.DOB, tblMatch.MatchDateTime DESC

The problem is that this selects all the matches that a player has been in, not just their most recent one. I know this should be simple, but I can't seem to get the right syntax for it.

Also, I'd rather return the separate columns from the match table as separate columns and not as a formatted lump.

Answers to requests for more info:

Yes there is an MatchDateTime column, which I intend to use for sorting.

Yes I do want players who haven't yet played any matches, the left join is deliberate.

+2  A: 

If i understand correctly you can try this

DECLARE @User TABLE(
     UserID INT
)

DECLARE @Matches TABLE(
     MatchID INT
)

DECLARE @UserMatches TABLE(
     UserMatchID INT,
     UserID INT,
     MatchID INT
)

INSERT INTO @User SELECt 1
INSERT INTO @User SELECt 2

INSERT INTO @Matches SELECt 1
INSERT INTO @Matches SELECt 2

INSERT INTO @UserMatches SELECt 1, 1, 1
INSERT INTO @UserMatches SELECt 2, 1, 2
INSERT INTO @UserMatches SELECt 3, 2, 2

SELECT  u.*,
     m.*
FROm    @UserMatches um INNER JOIN
     (
      SELECT UserID,
        MAX(UserMatchID) MaxID
      FROM @UserMatches um
      GROUP BY UserID
     ) MaxIdsPerUser ON um.UserID = MaxIdsPerUser.UserID
         AND um.UserMatchID = MaxID INNER JOIN 
     @User u ON um.UserID = u.UserID INNER JOIN
     @Matches m ON um.MatchID = m.MatchID

It is possible that if you have a DateTime to determine the most recent match, you can use that for the max.

astander
ah you beat me to it :) I was going to go with the derived table join too
Scott Anderson
This assumes a `UserMatchID` that is strictly monotonous. Might not be the case.
Tomalak
I asked the OP to clarify whether UserMatchID (AppearanceID in the OP's schema) is given out in the same order as MatchDateTime. FWIW, there are many reasons why this assumption may not hold. For example, if matches are variable-length and records are only added at the end of the match, then a match that starts later may end earlier than another which started later. Similarly, if matches are scheduled in advance of the match, the assumption may not hold either. Funny, in my answer I assumed the conservative assumption-- you assumed the best-performing one. I like your attitude better! :-)
Justin Grant
@astander: Also, if I'm understanding the OP's query correctly, his LEFT JOIN implies that rows are also expected to be returned for players who have played no matches yet. So (if my understanding is correct) you'd need a UNION and a join for your query to be accurate since you'll be missing all rows for users with no matches. I asked for clarification on this one too.
Justin Grant
... never mind about the UNION, you could also use an OUTER JOIN here, e.g. ON (um.UserID = u.UserID OR um.UserID IS NULL)
Justin Grant
A: 

If the MatchID is an incremental number, you could base the most recent match for each player being the MAX number

kevchadders
+1  A: 

Instead of using tblAppearance in join, use a derived table that selects just the last appearance of the player, using APPLY, and join on this derived table:

SELECT ...
FROM tblPlayer
OUTER APPLY (
  SELECT TOP (1) *
  FROM tblAppearance
  WHERE tblPlayer.PlayerId = tblAppearance.PlayerId
  ORDER BY MatchDateTime DESC) AS lastAppearance
LEFT JOIN tblMatch ON tblMatch.MatchId = lastAppearance.MatchId
WHERE ...
Remus Rusanu
FWIW, my experience with APPLY queries has not been great performance-wise. SQL seems to often select worse query plans for APPLY solutions vs. equivalent solutions (see my answer for an example) using ROW_NUMBER() or subqueries. So I usually avoid APPLY, unless the table is small or performance is not a big deal. That said, everyone's mileage may vary-- but the next time you're tempted to use APPLY, try the same thing using ROW_NUMBER() and see which one finishes faster. :-)
Justin Grant
@Justin: APPLY is by definition a nested loop, ordinary joins can use hash or merge operators. Also APPLY has no parallelism potential. So yes, it may be outperformed in some situations. There are also scenarios when APPLY is the *only* solution (XML nodes, table value functions). In these specific situation (first row in a group) is in interesting comparison though, not sure which one would win in a head to head `APPLY (TOP 1 ORDER BY...)` vs. `JOIN (ROW_NUMBER() OVER (PARTITION BY...)`
Remus Rusanu
I agree 100%, APPLY is hugely useful because there's many cases where there's no other practical option. But, in cases where there's is another option, I've yet to find a case (so far at least) where APPLY was the fastest query plan. I remembered the join-type limitation, but I'd totally forgotten about the loss of parallelism too. I once tried APPLY on a billion-row query on a 8-proc monster server... and ended up waiting a long time for my answer. :-)
Justin Grant
+2  A: 

For these kinds of problems (get top row in a group) I've had the most success, in both performance and maintainability, using ROW_NUMBER() and a CTE. The pattern is simple: the CTE selects the columns you want, and adds an additional column for the ROW_NUMBER() within each group (ordered by your desired order, of course). Then the post-CTE part of the query restricts results to those where the ROW_NUMBER() is 1.

Like this:

WITH cte AS
(
    SELECT tblPlayer.PlayerId, Surname, Forename, HomeClub, OpponentClub, AgeGroup, MatchDateTime, DOB,
     ROW_NUMBER () OVER (PARTITION BY tblPlayer.PlayerId ORDER BY MatchDateTime DESC) AS RowNum
    FROM tblPlayer
     LEFT JOIN tblAppearance ON tblAppearance.PlayerId = tblPlayer.PlayerId
     LEFT JOIN tblMatch ON tblMatch.MatchId = tblAppearance.MatchId
    WHERE Forename LIKE '%rob%' AND Surname LIKE '%white%'
)
SELECT PlayerId, Surname, Forename, HomeClub, OpponentClub, AgeGroup, MatchDateTime, DOB
FROM cte
WHERE RowNum = 1
ORDER BY Surname, Forename, DOB, MatchDateTime

Note that I'm not assuming that any IDs are ordered in the same way that the MatchDateTime's are-- there are lots of reasons (e.g. advance scheduling) why that assumption may not hold. If, however, appearance IDs are ordered identically to dates, then the query above can be made much more efficient because you won't have to do any joins to find the MatchID's you're looking for.

Note that if you have a very large number of players (100,000+) and you run this query often, there are optimizations you'll want to do here, since every time you run this query you'll do a table scan of the Player table to support your LIKE filter. If this is the case, you'll probably want to creat a covering index on Surname, Forename and get SQL to run your query in stages: first filter Player records using the covering index, then do your join, and finally pull out other Player columns from the clustered index. It's usually hard to get SQL to perform plans like this (you may need a temp table for the intermediate results), but for very large tables the perf win is worth it. If you have a small number of players, ignore the preceding paragraph. :-)

Justin Grant