ansaurus

Question

Matching only one specific row in a JOIN where many exist

Answer 1

A:

Perhaps use a subquery in place of the second table:

SELECT ... FROM service s
INNER JOIN (SELECT ..., Max(EndDate) FROM
   provider lu WHERE lu.ProvID = s.ProvID GROUP BY ...) p ON p.ProvID = s.ProvID

This is assuming you'd get NULL back if there is no max enddate.

R. Bemrose 2009-09-24 15:51:16

Answer 2

A:

What you are referring to is a Type-2 Dimension of a Data Warehouse.

You have to join by the ID and by the StartDate and EndDate to get the proper data.

OTTOMH code

SELECT TransactionId, TransactionType
FROM TransactionList Tx
    INNER JOIN TransactionType TxType
     ON Tx.TransactionTypeId = TxType.TxTypeId
     AND Tx.TransactionDate Between TxType.StartDate and TxType.EndDate

Raj More 2009-09-24 15:53:08

As I mentioned, I can't do that because I'm using an aggregate function and a GROUP BY. The transaction date isn't available because of the aggregation.

Ken White 2009-09-24 16:28:51

Apologies. I had obviously glossed over that part.

Raj More 2009-09-24 16:58:45

No problem. Thanks for trying, Raj.

Ken White 2009-09-24 17:06:54

Answer 3

+2 A:

in the 2nd condition, you have to get the max only if there is no NULL EndDate

SELECT ... FROM service s
INNER JOIN provider p ON p.ProvID = s.ProvID
AND (   p.EndDate IS NULL 
     or (p.EndDate = (SELECT Max(EndDate) 
                        FROM provider lu 
                       WHERE lu.ProvID = s.ProvID)
         AND NOT EXISTS (SELECT NULL 
                           FROM provider lu 
                          WHERE lu.ProvID = s.ProvID 
                            AND lu.EndDate IS NULL)
        )
    )

najmeddine 2009-09-24 15:53:13

@najmeddine: I accepted Kip's answer because it was **slightly** quicker than yours (because his used two COALESCE() function calls instead of adding another subselect for the NOT EXISTS test). Yours worked well also, though, so I'm upvoting it as well. Thanks!

Ken White 2009-09-24 16:59:34

Answer 4

+1 A:

So I guess if there is a row with NULL end date, you want that one, otherwise you want the one with the largest end date?

I'm not sure about ADS, but the following would work on SQL Server:

SELECT ... FROM service s
INNER JOIN provider p ON p.ProvID = s.ProvID
AND (COALESCE(p.EndDate, '2037-01-01') = (
   SELECT Max(COALESCE(EndDate, '2037-01-01')) FROM
   provider lu WHERE lu.ProvID = s.ProvID)
)

The COALESCE operator returns the first non-null parameter, so this is basically just setting the nulls to a time far in the future, so that SELECT MAX will give you the one with the NULL end date if there is one.

Kip 2009-09-24 15:55:17

I'm accepting this answer because it's pretty slick, and it's slightly faster than najmeddine's NOT EXISTS answer (but only slightly because of good index selection). Thanks, Kip. :-)

Ken White 2009-09-24 16:58:08

Just merged into my codebase and tested against actual data. Works 100%, with no notable performance loss. Thanks again, Kip! NOTE FOR OTHER ADS USERS: The COALESCE() calls require one addition: Change both of them to COALESCE(EndDate, CAST('2037-01-01' AS SQL_DATE)), as ADS doesn't automatically do the conversion from the date literals like other databases do.

Ken White 2009-09-24 17:50:47

@Ken White: If you have more than one row with NULL end date, you'll get more than one row as result. I suppose that would be considered corrupt data in your schema though.

Kip 2009-09-24 18:04:37

Answer 5

A:

What in your provider table denotes the current date? EndDate=NULL, EndDate=Max(EndDate) or EndDate='9999-01-01'? All three are valid choices, but this should really be unambiguous, since if it's not you're going to end up with duplicate rows in queries all the time, no matter how cleverly you craft this particular query. So i suggest fixing that in the provider table, and then something like this should work:

select p.name, p.address, p.id, sum(s.amount)
  from provider p
  join service s on p.id=s.provider_id
where p.endDate is NULL
group by p.name, p.address, p.id

wallenborn 2009-09-24 16:03:00

Can't fix the provider table. It's historical as well as current data, and the auditors won't allow us to change that for whatever reason. If restructuring the data were an option, I wouldn't have needed to post here. Thanks, though.

Ken White 2009-09-24 16:27:44

Thinking a little more about it, you _do_ have a unambiguous condition. You want to ship to the _current_ address, right? Then you don't need the s.ServiceDate. Just use "where p.EndDate is null or (sysdate between p.StartDate and p.EndDate)". Does this help?

wallenborn 2009-09-24 16:39:03

Unfortunately, no. See my edit to the original post about the terminated provider with outstanding services prior to the term date, but being paid after the term date. SysDate wouldn't fall in between start and end, nor would there be a NULL end date row at the time payment was processed.

Ken White 2009-09-24 16:49:27

ansaurus

tags:

views:

answers:

Matching only one specific row in a JOIN where many exist

related questions