views:

960

answers:

5

I have three related tables "A(id, val)", "B(id, val)", and a link table with a value "AB(aid, bid, val)"

I am querying against B to bring back A values, for example:

SELECT A.* 
FROM A INNER JOIN AB ON A.id = AB.aid INNER JOIN B ON AB.bid = B.id
WHERE B.val = 'foo';

Every A has many B's and every B has many A's.

And the catch that I'm falling apart on is the need to filter the set so that the query returns rows only when AB.val is a max for any given A/B pair

E.g. if I have the data:

A

id   val
1    something
2    somethingelse

B

id   val
1    foo
2    bar

AB

aid  bid  val
1    1    3
1    2    2
2    1    1
2    2    4

I would want to select only the first and last rows of AB since they are the max values for each of the A's and then be able to query against B.val = 'foo' to return only the first row. I don't have a clue on how I can constrain against only the max val row in the AB table.

The best I've been able to get is

SELECT * 
FROM A 
INNER JOIN 
  (SELECT aid, bid, MAX(val) AS val FROM AB GROUP BY aid) as AB
  ON A.id = AB.aid 
INNER JOIN B ON AB.id = B.id
WHERE B.val = 'foo'

but this doesn't quite work. First, it just feels to be the wrong approach, second, it returns bad bid values. That is, the bid returned from the subquery is not necessarily from the same row as the max(val). I believe this is a known group by issue where selection of values to return when the column is not specified for either collation or grouping is undefined.

I hope that some of the above makes sense, I've been banging my head against a wall for the past few hours over this and any help at all would be hugely appreciated. Thanks.

(For those wondering, the actual use of this is for a Dictionary backend where A is the Word Table and B is the Phoneme Table. AB is the WordPhoneme table with a 'position' column. The query is to find all words that end with a specified phoneme. (a phoneme is a word sound, similar in usage to the international phonetic alphabet )

+2  A: 

I think you need to do another join to get ab's max val for each a.id first.

Something like this:

select a.*
from a
left join (
    select aid, max(val) as val 
    from ab 
    group by aid
) abmax on abmax.aid=a.id
inner join ab on ab.aid=abmax.aid and ab.val=abmax.val
inner join b on b.id=ab.bid
where b.val='foo'
Blorgbeard
A: 

Here's another way that I just tested out:

select a.*
from ab
   inner join b on(ab.bid=b.id)
   inner join a on (ab.aid=a.id)
where ab.val = (select max(val) from ab AS ab2 where ab2.aid = ab.aid)
   and b.val='foo'
patmortech
+1  A: 

I often use the following trick to get the greatest per group:

SELECT a.*
FROM ab AS ab1
  LEFT OUTER JOIN ab AS ab2 ON (ab1.aid = ab2.aid AND ab1.val < ab2.val)
  JOIN a ON (ab1.aid = a.id)
  JOIN b ON (ab1.bid = b.id)
WHERE ab2.aid IS NULL
  AND b.val = 'foo';

The trick is to join to the AB table to itself in an outer join. Return ab1 where no rows exist with the same value for aid and a greater value for val. Therefore ab1 has the greatest val per group of rows with a given value for aid.

Bill Karwin
That join can get really expensive in situations when there are multiple rows in ab with the same aid. If there are just 5 rows with the same in ab, those five will produce 10 result rows before filtering. Six will result in 16 rows, and so on. Using max() is far more efficient.
SquareCog
Yes, different techniques are appropriate for different situations.
Bill Karwin
A: 

I am not sure which sql you are using but in MS SQL I create a table-valued database function to return the max values from table A then join this to table B. I find this much easier to understand than complex joins when I look back on my queries at a later stage.

MBoy
A: 
SELECT *
FROM
(
  SELECT
    A.*,
    (SELECT top 1 AB.BID FROM AB WHERE A.AID = AB.AID ORDER BY AB.val desc) as BID
  FROM A
) as Aplus
JOIN B ON Aplus.BID = B.BID
David B