I just tried to do a correlated subquery in the FROM clause of a SELECT statement in Oracle, but I was given an error indicating that I couldn't do the correlation (something to the effect that Obs.pID was not recognized). Should this work?

FROM ml.Person Person 
 JOIN ml.Obs ON Person.pID = Obs.pId
  JOIN (SELECT ObsMax2.pId, ObsMax2.hdId
    , MAX(ObsMax2.obsDate) as maxDate
    FROM ml.Obs ObsMax2
    WHERE ObsMax2.pId = Obs.pId
     AND ObsMax2.obsDate < {?EndDate}
    GROUP BY ObsMax2.pId, ObsMax2.hdId) ObsMax 
   ON Obs.pId = ObsMax.pId
    AND Obs.hdId = ObsMax.hdId
    AND Obs.obsDate = ObsMax.maxDate

My workaround would appear to be to make it a non-correlated subquery, and add criteria to the subquery that keeps it from running completely amuck, amuck, amu--oof Sorry. I'd rather figure out how to properly correlate it, though, if possible -- the view that works like that subquery takes forever to build. Thanks.

+2  A: 

Sub-queries within a FROM clause cannot refer to other tables from the same FROM clause. Removing the ObsMax2.pId = Obs.pId clause should resolve the problem and, from what I can tell, will give you exactly the same result, since the same condition is already in the join. However, as you mention, you may run into performance issues from having the GROUP BY in the sub-query.
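To check that dropping the correlation leaves the result unchanged, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle. The sample rows, the trimmed-down Obs table (no ml. schema, no Person join), and the literal cutoff date in place of {?EndDate} are all made up for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Obs (pId INTEGER, hdId INTEGER, obsDate TEXT);
    INSERT INTO Obs VALUES
        (1, 10, '2009-01-05'),
        (1, 10, '2009-02-10'),
        (1, 20, '2009-01-20'),
        (2, 10, '2009-03-15'),
        (2, 10, '2009-05-01');  -- after the cutoff, must be ignored
""")

end_date = "2009-04-01"  # plays the role of {?EndDate}

# Non-correlated derived table: the pId match lives only in the ON clause.
joined_rows = conn.execute("""
    SELECT Obs.pId, Obs.hdId, Obs.obsDate
    FROM Obs
    JOIN (SELECT ObsMax2.pId, ObsMax2.hdId, MAX(ObsMax2.obsDate) AS maxDate
          FROM Obs ObsMax2
          WHERE ObsMax2.obsDate < ?
          GROUP BY ObsMax2.pId, ObsMax2.hdId) ObsMax
      ON Obs.pId = ObsMax.pId
     AND Obs.hdId = ObsMax.hdId
     AND Obs.obsDate = ObsMax.maxDate
    ORDER BY Obs.pId, Obs.hdId
""", (end_date,)).fetchall()

for row in joined_rows:
    print(row)
```

Each (pId, hdId) pair comes back once, with its latest obsDate before the cutoff.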

From what I can tell, you're trying to get the individual pId/hdId records from ml.Obs with the largest obsDate that's less than {?EndDate}. In that case, what about moving the sub-query into the WHERE clause, where you can correlate it? E.g.:

select ...
from
  ml.Person Person
  join ml.Obs on Person.PID = Obs.pId
where Obs.obsDate = (
    select max(obsDate)
    from ml.Obs Obs2
    where Obs2.pId = Obs.pId
      and Obs2.hdId = Obs.hdId
      and Obs2.obsDate < {?EndDate})
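
For anyone who wants to try the correlated WHERE-clause version without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in; the sample rows, the obsValue column, and the literal cutoff date in place of {?EndDate} are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (pId INTEGER PRIMARY KEY);
    CREATE TABLE Obs (pId INTEGER, hdId INTEGER, obsDate TEXT, obsValue TEXT);
    INSERT INTO Person VALUES (1), (2);
    INSERT INTO Obs VALUES
        (1, 10, '2009-01-05', '120/80'),
        (1, 10, '2009-02-10', '118/76'),
        (1, 10, '2009-04-01', '125/82'),  -- not before the cutoff, must be ignored
        (1, 20, '2009-01-20', '72'),
        (2, 10, '2009-03-15', '130/85');
""")

end_date = "2009-04-01"  # plays the role of {?EndDate}

# The sub-query is correlated on both pId and hdId of the outer Obs row.
latest_rows = conn.execute("""
    SELECT Obs.pId, Obs.hdId, Obs.obsDate, Obs.obsValue
    FROM Person
    JOIN Obs ON Person.pId = Obs.pId
    WHERE Obs.obsDate = (
        SELECT MAX(Obs2.obsDate)
        FROM Obs Obs2
        WHERE Obs2.pId = Obs.pId
          AND Obs2.hdId = Obs.hdId
          AND Obs2.obsDate < ?)
    ORDER BY Obs.pId, Obs.hdId
""", (end_date,)).fetchall()

for row in latest_rows:
    print(row)
```

The outer query needs no date filter of its own: a row dated on or after the cutoff can never equal the sub-query's maximum, so it drops out.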
Shawn Loewen
If this is the case, and especially if the field is merely of type date, expect that more than one row may be returned (because of duplicate dates).
Alkini
Yes, if the (pId, hdId, obsDate) combination is not unique, then you could get multiple records per (pId, hdId) pair. I believe the original query would have the same issue though.
Shawn Loewen
This is a query for an EMR (medical records) system. pId is PersonId, and hdId identifies a medical observation (BP, Pulse, LDL, etc.). Therefore, you wouldn't have to worry (much) about duplicating pId/hdId/date: you have one value per document.
SarekOfVulcan
But I don't think this would work - you might need observations from two different days (BMI percentile at one visit, asthma monitoring on another). Unless there's something about the correlation that I'm missing, which is always a possibility...
SarekOfVulcan
The correlation is done on all three columns. The base query would retrieve all the rows from ml.Obs for a person. The subquery then selects the rows where that obsdate is a max() for a given pId,hdId. See David Aldridge's answer for a variation on this (performance may differ).
Shawn Loewen
I just tried asking for the Explain Plan on this one, and it was an order of magnitude or two higher than the version with the non-correlated subquery. Still need to check David's version, since I don't understand that one at all. :-)
SarekOfVulcan
A: 

You've prefixed many of your tables with "ml." but not everywhere (the first join, for example). Assuming you need that (for user/permissions/whatever):

JOIN ml.Obs ON Person.pID = ml.Obs.pId

or

JOIN ml.Obs Obs ON Person.pID = Obs.pId

There are other places where this would be needed too.

If this isn't the case, remove them from your query because they're irrelevant and distracting.

Alkini
+4  A: 

You can achieve the intent of this part of the query by using an analytic function to identify the maximum obsDate for each pid and hdid.

It would be something like:

select ...
from   (
       SELECT pId,
              hdId,
              obsDate,
              MAX(obsDate) over (partition by pId, hdId) maxDate
       FROM   ml.Obs
       WHERE  obsDate < {?EndDate}
       )
where  obsDate = maxDate
/
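
A runnable sketch of the same analytic-function pattern, using Python's built-in sqlite3 module as a stand-in for Oracle. It assumes SQLite 3.25 or later (the first release with window functions); the sample rows and the literal cutoff date in place of {?EndDate} are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite (3.25+) stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Obs (pId INTEGER, hdId INTEGER, obsDate TEXT);
    INSERT INTO Obs VALUES
        (1, 10, '2009-01-05'),
        (1, 10, '2009-02-10'),
        (1, 20, '2009-01-20'),
        (2, 10, '2009-03-15'),
        (2, 10, '2009-05-01');  -- after the cutoff, filtered out first
""")

end_date = "2009-04-01"  # plays the role of {?EndDate}

# MAX(...) OVER (PARTITION BY ...) attaches each group's maximum to every
# row in that group; the outer WHERE keeps only the rows that match it.
analytic_rows = conn.execute("""
    SELECT pId, hdId, obsDate
    FROM (
        SELECT pId,
               hdId,
               obsDate,
               MAX(obsDate) OVER (PARTITION BY pId, hdId) AS maxDate
        FROM Obs
        WHERE obsDate < ?
    ) AS latest
    WHERE obsDate = maxDate
    ORDER BY pId, hdId
""", (end_date,)).fetchall()

for row in analytic_rows:
    print(row)
```

Unlike the self-join and correlated versions, this reads Obs only once, which is a plausible reason for a cheaper plan, though the actual cost depends on the data and indexes.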
David Aldridge
The explain plan for this version was a bit better than the one I came up with, and it feels faster, too. I hadn't run into these before -- thanks!
SarekOfVulcan
no problemos. Analytic functions are very cool.
David Aldridge
You just showed me how to use PARTITION BY, and solved my problem.
Erik Olson