+2  Q: 

Limited T-SQL Join

This should be simple enough, but somehow my brain stopped working.

I have two related tables:

Table 1:

ID (PK), Value1

Table 2:

BatchID, Table1ID (FK to Table 1 ID), Value2

Example data:

Table 1:

ID  Value1
1   A
2   B

Table 2:

BatchID  Table1ID  Value2
1        1         100
2        1         101
3        1         102
1        2         200
2        2         201

Now, for each record in Table 1, I'd like to find the matching record in Table 2, but only the most recent one (BatchID is sequential). The result for the above example would be:

Table1.ID  Table1.Value1  Table2.Value2
1          A              102
2          B              201
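To pin down the requirement, here is a minimal plain-Python sketch (not T-SQL) that derives the expected result from the sample rows: for each Table 1 row, keep the Table 2 row with the highest BatchID. The variable names are illustrative only.

```python
# Sample data from the question: (ID, Value1) and (BatchID, Table1ID, Value2).
table1 = [(1, "A"), (2, "B")]
table2 = [(1, 1, 100), (2, 1, 101), (3, 1, 102), (1, 2, 200), (2, 2, 201)]

# For each Table1ID, remember the row with the highest BatchID seen so far.
latest = {}
for batch_id, table1_id, value2 in table2:
    if table1_id not in latest or batch_id > latest[table1_id][0]:
        latest[table1_id] = (batch_id, value2)

# Inner-join semantics: Table 1 rows without any Table 2 rows are dropped.
result = [(id_, value1, latest[id_][1]) for id_, value1 in table1 if id_ in latest]
print(result)  # [(1, 'A', 102), (2, 'B', 201)]
```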

The problem is simply how to limit the join result with Table 2. There are similar questions on SO, but I can't find one quite like mine. Here's one for MySQL that looks similar: http://stackoverflow.com/questions/494974/limiting-an-sql-join

I'm open to any approach, although speed is still the main priority since it will be a big dataset.

A: 

Either a GROUP BY or a WHERE clause that filters on the most recent batch:

SELECT * FROM Table1 a
INNER JOIN Table2 b ON (a.id = b.Table1ID)
WHERE NOT EXISTS(
      SELECT 1 FROM Table2 c WHERE c.Table1ID = a.id AND c.BatchID > b.BatchID
)
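A quick way to check the NOT EXISTS approach is to run it against the sample data. This sketch uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (an assumption; the query itself is plain SQL that both engines accept).

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server; schema follows the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Value1 TEXT);
    CREATE TABLE Table2 (BatchID INTEGER, Table1ID INTEGER REFERENCES Table1(ID), Value2 INTEGER);
    INSERT INTO Table1 VALUES (1, 'A'), (2, 'B');
    INSERT INTO Table2 VALUES (1, 1, 100), (2, 1, 101), (3, 1, 102), (1, 2, 200), (2, 2, 201);
""")

# Keep a Table2 row only if no other row for the same Table1ID has a higher BatchID.
rows = conn.execute("""
    SELECT a.ID, a.Value1, b.Value2
    FROM Table1 a
    INNER JOIN Table2 b ON a.ID = b.Table1ID
    WHERE NOT EXISTS (
        SELECT 1 FROM Table2 c
        WHERE c.Table1ID = a.ID AND c.BatchID > b.BatchID
    )
    ORDER BY a.ID
""").fetchall()
print(rows)  # [(1, 'A', 102), (2, 'B', 201)]
```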
streetpc
The problem is that "the most recent" may differ for each record, so you can't use a single number for the whole table.
Adrian Godong
-1 because it was just a generic passing attempt at an answer. Also, you can see that a pure GROUP BY/WHERE wouldn't work here. You need a CTE, like what Cade's done, or a subquery.
Eric
I was adding an example. I agree I should have been clearer that the subquery is the "WHERE clause that filters on the most recent".
streetpc
+8  A: 
WITH Latest AS (
    SELECT Table1ID
        ,MAX(BatchID) AS BatchID
    FROM Table2
    GROUP BY Table1ID
)
SELECT *
FROM Table1
INNER JOIN Latest
    ON Latest.Table1ID = Table1.ID
INNER JOIN Table2
    ON Table2.BatchID = Latest.BatchID
    AND Table2.Table1ID = Latest.Table1ID -- BatchID alone is not unique across Table1 rows
Cade Roux
Wouldn't the tag SQL-Server indicate MS SQL and rule out using the Oracle 'With' clause?
madcolor
Common Table Expressions (CTEs) were introduced in SQL Server 2005, this answer is correct.
TheTXI
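The CTE approach can be sketched against the sample data the same way, again using SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption). The join back to Table2 matches on both BatchID and Table1ID, because batch numbers repeat across Table1 rows (batch 2 exists for both ID 1 and ID 2).

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server; schema follows the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Value1 TEXT);
    CREATE TABLE Table2 (BatchID INTEGER, Table1ID INTEGER REFERENCES Table1(ID), Value2 INTEGER);
    INSERT INTO Table1 VALUES (1, 'A'), (2, 'B');
    INSERT INTO Table2 VALUES (1, 1, 100), (2, 1, 101), (3, 1, 102), (1, 2, 200), (2, 2, 201);
""")

# Latest batch per Table1ID, then join back to Table2 on both keys.
query = """
    WITH Latest AS (
        SELECT Table1ID, MAX(BatchID) AS BatchID
        FROM Table2
        GROUP BY Table1ID
    )
    SELECT Table1.ID, Table1.Value1, Table2.Value2
    FROM Table1
    INNER JOIN Latest ON Latest.Table1ID = Table1.ID
    INNER JOIN Table2
        ON Table2.BatchID = Latest.BatchID
       AND Table2.Table1ID = Latest.Table1ID
    ORDER BY Table1.ID
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'A', 102), (2, 'B', 201)]

# Dropping the Table1ID condition also pairs B with batch 2 of Table1ID 1.
loose_rows = conn.execute(
    query.replace("AND Table2.Table1ID = Latest.Table1ID", "")
).fetchall()
print(len(loose_rows))  # 3
```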
+2  A: 
SELECT  id, value1, value2
FROM    (
        SELECT  t1.id, t1.value1, t2.value2, ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t2.BatchID DESC) AS rn
        FROM    table1 t1
        JOIN    table2 t2
        ON      t2.table1id = t1.id
        ) q
WHERE   rn = 1
Quassnoi
Hmm... how does the performance of the subquery compare to the CTE?
Adrian Godong
@Adrian: it's the same.
Quassnoi
You should simply run both statements in the same window with "Include Execution Plan". You will then get a % cost for each statement.
Joel Mansford
Touche! And the results are identical.
Adrian Godong
If a CTE is not used and a subquery has to be repeated identically, the query plan will still be just as good as with the CTE. The main benefit of the CTE is for more complex stacking, DRY and maintenance.
Cade Roux
And BTW @Adrian, this is a derived table not a subquery.
HLGEM
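The ROW_NUMBER() approach can also be tried on the sample data. This sketch assumes SQLite 3.25 or later (the first version with window functions) as a stand-in for SQL Server, accessed through Python's sqlite3 module; the derived table numbers each Table1 row's batches from newest to oldest and keeps only number 1.

```python
import sqlite3

# Window functions such as ROW_NUMBER() need SQLite 3.25+.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("SQLite 3.25+ is required for ROW_NUMBER()")

# In-memory SQLite database standing in for SQL Server; schema follows the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Value1 TEXT);
    CREATE TABLE Table2 (BatchID INTEGER, Table1ID INTEGER REFERENCES Table1(ID), Value2 INTEGER);
    INSERT INTO Table1 VALUES (1, 'A'), (2, 'B');
    INSERT INTO Table2 VALUES (1, 1, 100), (2, 1, 101), (3, 1, 102), (1, 2, 200), (2, 2, 201);
""")

# rn = 1 marks the highest BatchID within each Table1 ID partition.
rows = conn.execute("""
    SELECT id, value1, value2
    FROM (
        SELECT t1.ID AS id, t1.Value1 AS value1, t2.Value2 AS value2,
               ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t2.BatchID DESC) AS rn
        FROM Table1 t1
        JOIN Table2 t2 ON t2.Table1ID = t1.ID
    ) q
    WHERE rn = 1
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'A', 102), (2, 'B', 201)]
```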
A: 

Try

select t1.*,t2.Value2
from(
select Table1ID,max(Value2) as Value2
from [Table 2]
group by Table1ID) t2
join [Table 1] t1 on t2.Table1ID = t1.id
SQLMenace
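On the sample data, MAX(Value2) happens to pick the latest batch, but only because Value2 grows with BatchID there. This sketch (SQLite through Python's sqlite3 as a stand-in for SQL Server, plus one hypothetical extra row) compares it with a correlated subquery on MAX(BatchID) once a newer batch carries a smaller Value2.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server; schema follows the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Value1 TEXT);
    CREATE TABLE Table2 (BatchID INTEGER, Table1ID INTEGER REFERENCES Table1(ID), Value2 INTEGER);
    INSERT INTO Table1 VALUES (1, 'A'), (2, 'B');
    INSERT INTO Table2 VALUES (1, 1, 100), (2, 1, 101), (3, 1, 102), (1, 2, 200), (2, 2, 201);
""")

# Hypothetical extra row: a newer batch (4) for Table1ID 1 with a smaller Value2.
conn.execute("INSERT INTO Table2 VALUES (4, 1, 50)")

# Largest Value2 per Table1ID, as in the answer above.
max_value = conn.execute("""
    SELECT t1.ID, t1.Value1, t2.Value2
    FROM (SELECT Table1ID, MAX(Value2) AS Value2 FROM Table2 GROUP BY Table1ID) t2
    JOIN Table1 t1 ON t2.Table1ID = t1.ID
    ORDER BY t1.ID
""").fetchall()

# Value2 of the highest BatchID per Table1ID, for comparison.
latest_batch = conn.execute("""
    SELECT a.ID, a.Value1, b.Value2
    FROM Table1 a
    JOIN Table2 b ON b.Table1ID = a.ID
    WHERE b.BatchID = (SELECT MAX(BatchID) FROM Table2 c WHERE c.Table1ID = a.ID)
    ORDER BY a.ID
""").fetchall()

print(max_value)     # [(1, 'A', 102), (2, 'B', 201)]
print(latest_batch)  # [(1, 'A', 50), (2, 'B', 201)]
```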