views:

1698

answers:

6

Sorry for a long question and not a very descriptive title, but my problem is very difficult to explain briefly.

I have three database tables:

TABLE A:  
AID PK  
STATUS VARCHAR

TABLE B:  
BID PK  
AID FK  
CID FK

TABLE C:  
CID PK  
CREATIONTIME DATE

For each STATUS = 'OK' row in table A I want to find the corresponding row in C which has the latest creation time.

First I can to fetch all rows from table A where STATUS = 'OK'.
Next I can to fetch all corresponding rows from table B.
But how to continue from there?

For example:

select AID, CID from B where AID in (select AID from A where STATUS = 'OK')

could return something like:

AID, CID  
1    1  
2    2  
2    3  
3    4  
4    5  
4    6

Let's say that CID 2 has later creation time than CID 3 and CID 6 is newer than CID 5. This means that the correct result would be rows 1, 2, 4 and 6 in table C.

Is there a way to express this with a query?

EDIT: Sorry that I wasn't specific enough. What I want to get is the CIDs from table C.

EDIT: I counted returned rows with the different solutions. Results were very interesting - and diversified:
HAINSTECH: 298 473 rows
JMUCCHIELLO: 298 473 rows
RUSS CAM: 290 121 rows
CHRIS: 344 093 rows
TYRANNOSAURS: 290 119 rows

I have not yet had the time to analyse returned rows in depth, but I'd really appreciate views on which of the queries are "broken" and why.

+3  A: 

Something like this, if I've understood you correctly

SELECT
    MAX(CREATIONTIME),
    A.AID
FROM
    A
INNER JOIN
    B
    ON 
    A.AID = B.AID
INNER JOIN
    C
    ON 
    B.CID = C.CID
WHERE
    A.STATUS = 'OK'
GROUP BY
    A.AID

EDIT:

I have now checked the following in SQL Server (I would epxect the same outcome in Oracle) and it returns the CID for the C record with the Maximum CREATIONTIME where the STATUS for the related record in A id 'OK'.

SELECT C.CID
FROM 
C C
INNER JOIN
B B
ON 
C.CID = B.CID
INNER JOIN
(
    SELECT
        MAX(C.CREATIONTIME) CREATIONTIME,
        A.AID
    FROM
        A A
    INNER JOIN
        B B
        ON 
        A.AID = B.AID
    INNER JOIN
        C C
        ON 
        B.CID = C.CID
    WHERE
        A.STATUS = 'OK'
    GROUP BY
        A.AID
) ABC
ON B.AID = ABC.AID
AND C.CREATIONTIME = ABC.CREATIONTIME

Demonstrated with the following T-SQL

DECLARE @A TABLE(AID INT IDENTITY(1,1), STATUS VARCHAR(10))
DECLARE @B TABLE(BID INT IDENTITY(1,1), AID INT, CID INT)
DECLARE @C TABLE(CID INT IDENTITY(1,1), CREATIONTIME DATETIME)

INSERT INTO @A VALUES ('OK')
INSERT INTO @A VALUES ('OK')
INSERT INTO @A VALUES ('NOT OK')
INSERT INTO @A VALUES ('OK')
INSERT INTO @A VALUES ('NOT OK')

INSERT INTO @C VALUES ('10 MAR 2008')
INSERT INTO @C VALUES ('13 MAR 2008')
INSERT INTO @C VALUES ('15 MAR 2008')
INSERT INTO @C VALUES ('17 MAR 2008')
INSERT INTO @C VALUES ('21 MAR 2008')

INSERT INTO @B VALUES (1,1)
INSERT INTO @B VALUES (1,2)
INSERT INTO @B VALUES (1,3)
INSERT INTO @B VALUES (2,2)
INSERT INTO @B VALUES (2,3)
INSERT INTO @B VALUES (2,4)
INSERT INTO @B VALUES (3,3)
INSERT INTO @B VALUES (3,4)
INSERT INTO @B VALUES (3,5)
INSERT INTO @B VALUES (4,5)
INSERT INTO @B VALUES (4,1)
INSERT INTO @B VALUES (4,2)


SELECT C.CID
FROM 
@C C
INNER JOIN
@B B
ON 
C.CID = B.CID
INNER JOIN
(
SELECT
    MAX(C.CREATIONTIME) CREATIONTIME,
    A.AID
FROM
    @A A
INNER JOIN
    @B B
    ON 
    A.AID = B.AID
INNER JOIN
    @C C
    ON 
    B.CID = C.CID
WHERE
    A.STATUS = 'OK'
GROUP BY
    A.AID
) ABC
ON B.AID = ABC.AID
AND C.CREATIONTIME = ABC.CREATIONTIME

Results in the following

CID
-----------
3
4
5

EDIT 2:

In response to your comment about each of the statements giving different results, I have ran some of the different answers here through SQL Server 2005 using my test data above (I appreciate you are using Oracle). Here are the results

--Expected results for CIDs would be

--CID
-----------
--3
--4
--5

--As indicated in the comments next to the insert statements

DECLARE @A TABLE(AID INT IDENTITY(1,1), STATUS VARCHAR(10))
DECLARE @B TABLE(BID INT IDENTITY(1,1), AID INT, CID INT)
DECLARE @C TABLE(CID INT IDENTITY(1,1), CREATIONTIME DATETIME)

INSERT INTO @A VALUES ('OK') -- AID 1
INSERT INTO @A VALUES ('OK') -- AID 2
INSERT INTO @A VALUES ('NOT OK')
INSERT INTO @A VALUES ('OK') -- AID 4
INSERT INTO @A VALUES ('NOT OK')

INSERT INTO @C VALUES ('10 MAR 2008')
INSERT INTO @C VALUES ('13 MAR 2008')
INSERT INTO @C VALUES ('15 MAR 2008')
INSERT INTO @C VALUES ('17 MAR 2008')
INSERT INTO @C VALUES ('21 MAR 2008')

INSERT INTO @B VALUES (1,1)
INSERT INTO @B VALUES (1,2)
INSERT INTO @B VALUES (1,3) -- Will be CID 3 For AID 1
INSERT INTO @B VALUES (2,2)
INSERT INTO @B VALUES (2,3)
INSERT INTO @B VALUES (2,4) -- Will be CID 4 For AID 2
INSERT INTO @B VALUES (3,3)
INSERT INTO @B VALUES (3,4)
INSERT INTO @B VALUES (3,5)
INSERT INTO @B VALUES (4,5) -- Will be CID 5 FOR AID 4
INSERT INTO @B VALUES (4,1)
INSERT INTO @B VALUES (4,2)

-- Russ Cam
SELECT C.CID, ABC.CREATIONTIME
FROM 
@C C
INNER JOIN
@B B
ON 
C.CID = B.CID
INNER JOIN
(
SELECT
    MAX(C.CREATIONTIME) CREATIONTIME,
    A.AID
FROM
    @A A
INNER JOIN
    @B B
    ON 
    A.AID = B.AID
INNER JOIN
    @C C
    ON 
    B.CID = C.CID
WHERE
    A.STATUS = 'OK'
GROUP BY
    A.AID
) ABC
ON B.AID = ABC.AID
AND C.CREATIONTIME = ABC.CREATIONTIME

-- Tyrannosaurs
select   A.AID,  
         max(AggC.CREATIONTIME)  
from    @A A,  
         @B B,  
         (  select  C.CID,  
             max(C.CREATIONTIME) CREATIONTIME  
      from @C C  
      group by CID
       ) AggC  
where    A.AID = B.AID  
and    B.CID = AggC.CID  
and    A.Status = 'OK'  
group by A.AID

-- jmucchiello
SELECT c.cid, max(c.creationtime)
FROM @B b, @C c
WHERE b.cid = c.cid
 AND b.aid IN (SELECT a.aid FROM @A a WHERE status = 'OK')
GROUP BY c.cid

-- hainstech
SELECT agg.aid, agg.cid
FROM (
    SELECT a.aid
        ,c.cid
        ,max(c.creationtime) as maxcCreationTime
    FROM @C c INNER JOIN @B b ON b.cid = c.cid
        INNER JOIN @A a on a.aid = b.aid
    WHERE a.status = 'OK'
    GROUP BY a.aid, c.cid
) as agg

--chris
SELECT A.AID, C.CID, C.CREATIONTIME
FROM @A A, @B B, @C C
WHERE A.STATUS = 'OK'
AND A.AID = B.AID
AND B.CID = C.CID
AND C.CREATIONTIME = 
(SELECT MAX(C2.CREATIONTIME) 
FROM @C C2, @B B2 
WHERE B2.AID = A.AID
AND C2.CID = B2.CID);

the results are as follows

--Russ Cam - Correct CIDs (I have added in the CREATIONTIME for reference)
CID         CREATIONTIME
----------- -----------------------
3           2008-03-15 00:00:00.000
4           2008-03-17 00:00:00.000
5           2008-03-21 00:00:00.000

--Tyrannosaurs - No CIDs in the resultset
AID         
----------- -----------------------
1           2008-03-15 00:00:00.000
2           2008-03-17 00:00:00.000
4           2008-03-21 00:00:00.000


--jmucchiello - Incorrect CIDs in the resultset
cid         
----------- -----------------------
1           2008-03-10 00:00:00.000
2           2008-03-13 00:00:00.000
3           2008-03-15 00:00:00.000
4           2008-03-17 00:00:00.000
5           2008-03-21 00:00:00.000

--hainstech - Too many CIDs in the resultset, which CID has the MAX(CREATIONTIME) for each AID?
aid         cid
----------- -----------
1           1
1           2
1           3
2           2
2           3
2           4
4           1
4           2
4           5

--chris - Correct CIDs, it is the same SQL as mine
AID         CID         CREATIONTIME
----------- ----------- -----------------------
1           3           2008-03-15 00:00:00.000
2           4           2008-03-17 00:00:00.000
4           5           2008-03-21 00:00:00.000

I would recommend running each of the given answers against a smaller number of records, so that you can ascertain whether the resultset returned is the expected one.

Russ Cam
That's very close, but it does not return the C.CID which is the only column I'm interested in.
tputkonen
Thank you for your effort! I will try it soon, but I don't understand how the select statement in inner join work in this case? How does it take into account the indirect relationship between tables A and C? I just don't get it.
tputkonen
Russ, I really appreciate your effort! It's a national holiday today so I won't verify it thoroughly, but I studied quite closely the query and it made sense, and I was able to get it running on Oracle. Also the number of rows it returns in our relatively big database seems very reasonable.
tputkonen
Please read the comment I added to my original question.
tputkonen
Having read through the various answers I think this is correct (I redid my one and they use the same logic)...
Jon Hopkins
+1  A: 

EDIT: My previous answer was nonsense. This is now a complete rewrite

This is actually a problem which has bugged me throughout my SQL life. The solution I'm going to give you is messy as hell but it works and I'd appreciate anyone either saying "yes this is messy as hell but it's the only way to do it" or say "no, do this...".

I think the unease comes from joining two dates. The way it happens here it's not an issue as they will be an exact match (they have exactly the same root data) but it still feels wrong...

Anyway, breaking this down, you need to do this in two stages.

1) The first is to return a results set [AID], [earliest CreationTime] giving you the earliest creationtime for each AID.

2) You can then use latestCreationTime to pull the CID you want.

So for part (1), I'd personally create a view to do it just to keep things neat. It allows you to test this part and get it working before you merge it with the other stuff.

create view LatestCreationTimes
as
select b.AID,
       max(c.CreationTime) LatestCreationTime
from   TableB b,
       TableC c
where  b.CID = c.CID
group by b.AID

Note, we've not taken into account the status at this point.

You then need to join that to TableA (to get the status) and TableB and TableC (to get the CID). You need to do all the obvious links (AID, CID) and also join the LatestCreationTime column in the view to the CreationTime column in TableC. Don't also forget to join the view on AID otherwise where two records have been created at the same time for different A records you'll get issues.

select A.AID,
       C.CID
from   TableA a,
       TableB b,
       TableC c,
       LatestCreationTimes lct
where  a.AID = b.AID
and    b.CID = c.CID
and    a.AID = lct.AID
and    c.CreationTime = lct.LatestCreationTime
and    a.STATUS = 'OK'

I'm certain that works - I've tested it, tweaked data, retested it and it behaves. At least it does what I believe it's meant to do.

It doesn't however deal with the possibility of two identical CreationTimes in table C for the same record. I'm guessing that this shouldn't happen however unless you've written sometime that absolutely constrains it it needs to be accounted for.

To do this I need to make an assumption about which one you'd prefer. In this case I'm going to say that if there are two CIDs which match, you'd rather have the higher one (it's most likely more up to date).

select A.AID,
       max(C.CID) CID
from   TableA a,
       TableB b,
       TableC c,
       LatestCreationTimes lct
where  a.AID = b.AID
and    b.CID = c.CID
and    c.CreationTime = lct.LatestCreationTime
and    a.STATUS = 'OK'
group by A.AID

And that, I believe should work for you. If you want it as one query rather than with the view then:

select A.AID,
       max(C.CID) CID
from   TableA a,
       TableB b,
       TableC c,
       (select b.AID,
               max(c.CreationTime) LatestCreationTime
        from   TableB b,
               TableC c
        where  b.CID = c.CID
        group by b.AID) lct
where  a.AID = b.AID
and    b.CID = c.CID
and    c.CreationTime = lct.LatestCreationTime
and    a.STATUS = 'OK'
group by A.AID

(I've just embedded the view in the query, otherwise the principal is exactly the same).

Jon Hopkins
Sorry that the question was not exact at first on what I am looking for. I need to get CIDs.
tputkonen
Not a problem, you can just add C.CID to the select list and it will still work.
Jon Hopkins
Please read the comment I added to my original question.
tputkonen
Completely rewritten my answer... Go have another look.
Jon Hopkins
I'm really astonished by the effort from you guys. I'd really like to pick two accepted answers, however system only allows one and Russ was the first one. Really appreciate your effort and good explanations.
tputkonen
+1  A: 
SQL> create table a (aid,status)
  2  as
  3  select 1, 'OK' from dual union all
  4  select 2, 'OK' from dual union all
  5  select 3, 'OK' from dual union all
  6  select 4, 'OK' from dual union all
  7  select 5, 'NOK' from dual
  8  /

Tabel is aangemaakt.

SQL> create table c (cid,creationtime)
  2  as
  3  select 1, sysdate - 1 from dual union all
  4  select 2, sysdate - 2 from dual union all
  5  select 3, sysdate - 3 from dual union all
  6  select 4, sysdate - 4 from dual union all
  7  select 5, sysdate - 6 from dual union all
  8  select 6, sysdate - 5 from dual
  9  /

Tabel is aangemaakt.

SQL> create table b (bid,aid,cid)
  2  as
  3  select 1, 1, 1 from dual union all
  4  select 2, 2, 2 from dual union all
  5  select 3, 2, 3 from dual union all
  6  select 4, 3, 4 from dual union all
  7  select 5, 4, 5 from dual union all
  8  select 6, 4, 6 from dual union all
  9  select 7, 5, 6 from dual
 10  /

Tabel is aangemaakt.

SQL> select a.aid
  2       , max(c.cid) keep (dense_rank last order by c.creationtime) cid
  3       , max(c.creationtime) creationtime
  4    from a
  5       , b
  6       , c
  7   where b.aid = a.aid
  8     and b.cid = c.cid
  9     and a.status = 'OK'
 10   group by a.aid
 11  /

       AID        CID CREATIONTIME
---------- ---------- -------------------
         1          1 30-04-2009 09:26:00
         2          2 29-04-2009 09:26:00
         3          4 27-04-2009 09:26:00
         4          6 26-04-2009 09:26:00

4 rijen zijn geselecteerd.
Rob van Wijk
+1  A: 

Select the field you are looking for using a join of all 3 tables and then limit the results to the ones where the CREATIONDATE is the most recent.

SELECT A.AID, C.CID, C.CREATIONTIME
FROM A A, B B, C C
WHERE A.STATUS = 'OK'
AND A.AID = B.AID
AND B.CID = C.CID
AND C.CREATIONTIME = 
(SELECT MAX(C2.CREATIONTIME) 
FROM C C2, B B2 
WHERE B2.AID = A.AID
AND C2.CID = B2.CID);
Chris
Please read the comment I added to my original question.
tputkonen
A: 

Am I missing something? What is wrong with:

EDIT: Okay, I see you actually want to group by aid.

SELECT c.cid FROM b, c,
    (SELECT b.aid as aid, max(c.creationtime) as creationtime
     FROM b, c
     WHERE b.cid = c.cid
       AND b.aid IN (SELECT a.aid FROM a WHERE status = 'OK')
     GROUP BY b.aid) as z
WHERE b.cid = c.cid
  AND z.aid = b.aid
  AND z.creationtime = c.creationtime
jmucchiello
Please read the comment I added to my original question.
tputkonen
That is the most unhelpful comment ever. You could tell people what part of the original question is not being followed. It is not the same for everyone.
jmucchiello
I apologize, I thought to test it with less data tomorrow and post the results but Russ was kind enough to do that already.
tputkonen
A: 

There is no need for a subquery, the aggregation to determine the latest cid creation time is straightforward:

SELECT a.aid
    ,c.cid
    ,max(c.creationtime) as maxcCreationTime
FROM c INNER JOIN b ON b.cid = c.cid
    INNER JOIN a on a.aid = b.aid
WHERE a.status = 'OK'
GROUP BY a.aid, c.cid

If you really don't want the creationtime in your row set, you can just wrap it in a subquery and drop it from the projection:

SELECT agg.aid, agg.cid
FROM (
    SELECT a.aid
     ,c.cid
     ,max(c.creationtime) as maxcCreationTime
    FROM c INNER JOIN b ON b.cid = c.cid
     INNER JOIN a on a.aid = b.aid
    WHERE a.status = 'OK'
    GROUP BY a.aid, c.cid
) as agg

Coding in the web page, please excuse any syntax mistakes. Also, I'm an mssql guy so I hope there is nothing different in the Oracle world for this..

Note that the schema you have provided does not enforce uniqueness of CREATIONTIME per cid. If there are ever two cid values that map to a given aid value with the same creationtime, they will both be outputted. If you rely on the pair of cid,creationtime to be unique, you should enforce it declaratively with a constraint.

Please read the comment I added to my original question.
tputkonen
This won't work. Because you're grouping by AID and CID at the same time it will return multiple rows from TableC. Say you've got one row in table A (ID 1, Status OK), two in table C (ID 1 and 2, dates today and tomorrow respectively) and two in table C (linking both table C records to the one value in table A). Your query will return both table C records as today is the maximum for AID = 1 and CID = 1 and tomorrow is the maximum value for AID = 1 and CID = 2
Jon Hopkins