I have two tables:

Patient

  • pkPatientId
  • FirstName
  • Surname

PatientStatus

  • pkPatientStatusId
  • fkPatientId
  • StatusCode
  • StartDate
  • EndDate

Patient -> PatientStatus is a one to many relationship.

I am wondering if it's possible in SQL to do a join that returns only the first two PatientStatus records for each Patient. If only one PatientStatus record exists, it should not be returned in the results.

The normal join of my query is:

SELECT * FROM Patient p INNER JOIN PatientStatus ps ON p.pkPatientId = ps.fkPatientId
ORDER BY ps.fkPatientId, ps.StartDate
+1  A: 

I have not tried it, but this could work:

SELECT * FROM (
  SELECT /*(your select columns here)*/, row_number() over(PARTITION BY ps.fkPatientId ORDER BY ps.StartDate) as rownumber FROM Patient p INNER JOIN PatientStatus ps ON p.pkPatientId = ps.fkPatientId
) t
where rownumber between 1 and 2

If this does not work, see this link.

yapiskan
+2  A: 

EDIT: Both of the following solutions require that PatientStatus.StartDate is unique within each patient.

The traditional way (SQL Server 2000 compatible):

SELECT 
  p.pkPatientId,
  p.FirstName,
  p.Surname,
  ps.StatusCode,
  ps.StartDate,
  ps.EndDate
FROM 
  Patient p 
  INNER JOIN PatientStatus ps ON 
    p.pkPatientId = ps.fkPatientId
    AND ps.StartDate IN (
      SELECT TOP 2 StartDate 
      FROM     PatientStatus 
      WHERE    fkPatientId = ps.fkPatientId
      ORDER BY StartDate  /* DESC (to switch between first/last records) */
    )
WHERE 
  EXISTS (
    SELECT   1 
    FROM     PatientStatus
    WHERE    fkPatientId = p.pkPatientId
    GROUP BY fkPatientId
    HAVING   COUNT(*) >= 2
  )
ORDER BY 
  ps.fkPatientId, 
  ps.StartDate

A more interesting alternative (you'd have to test how well it performs in comparison):

SELECT 
  p.pkPatientId,
  p.FirstName,
  p.Surname,
  ps.StatusCode,
  ps.StartDate,
  ps.EndDate
FROM 
  Patient p 
  INNER JOIN PatientStatus ps ON p.pkPatientId = ps.fkPatientId
WHERE
  /* the "2" is the maximum number of rows returned */
  2 > (
    SELECT 
      COUNT(*)
    FROM 
      Patient p_i 
      INNER JOIN PatientStatus ps_i ON p_i.pkPatientId = ps_i.fkPatientId
    WHERE
      ps_i.fkPatientId = ps.fkPatientId
      AND ps_i.StartDate < ps.StartDate
      /* switch between "<" and ">" to get the first/last rows */
  )
  AND EXISTS (
    SELECT   1 
    FROM     PatientStatus
    WHERE    fkPatientId = p.pkPatientId
    GROUP BY fkPatientId
    HAVING   COUNT(*) >= 2
  )
ORDER BY 
  ps.fkPatientId, 
  ps.StartDate

Side note: For MySQL the latter query might be the only alternative - until LIMIT is supported in sub-queries.

EDIT: I added a condition that excludes patients with only one PatientStatus record. (Thanks for the tip, Ryan!)
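
The uniqueness requirement can be checked on a small data set. The following is only a sketch, not the exact query above: it runs the counting variant through Python's sqlite3 module as a stand-in for SQL Server, and the sample rows and the name first_two_by_count are invented for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Patient (pkPatientId INTEGER PRIMARY KEY, FirstName TEXT, Surname TEXT);
CREATE TABLE PatientStatus (
    pkPatientStatusId INTEGER PRIMARY KEY,
    fkPatientId INTEGER, StatusCode TEXT, StartDate TEXT, EndDate TEXT);
"""

# A status row is kept when fewer than two rows of the same patient start
# earlier, and only for patients that have at least two status rows.
QUERY = """
SELECT p.pkPatientId, ps.StatusCode, ps.StartDate
FROM Patient p
INNER JOIN PatientStatus ps ON p.pkPatientId = ps.fkPatientId
WHERE 2 > (SELECT COUNT(*) FROM PatientStatus ps_i
           WHERE ps_i.fkPatientId = ps.fkPatientId
             AND ps_i.StartDate < ps.StartDate)
  AND (SELECT COUNT(*) FROM PatientStatus ps_c
       WHERE ps_c.fkPatientId = p.pkPatientId) >= 2
ORDER BY ps.fkPatientId, ps.StartDate
"""


def first_two_by_count(patients, statuses):
    """Load the rows into an in-memory database and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO Patient VALUES (?, ?, ?)", patients)
        con.executemany("INSERT INTO PatientStatus VALUES (?, ?, ?, ?, ?)", statuses)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# Hypothetical sample data: ISO date strings sort chronologically as text.
PATIENTS = [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Cy", "Fox")]
UNIQUE_DATES = [
    (1, 1, "A", "2009-01-01", None),
    (2, 1, "B", "2009-02-01", None),
    (3, 1, "C", "2009-03-01", None),
    (4, 2, "A", "2009-01-15", None),  # Bob has a single status: excluded
    (5, 3, "X", "2009-01-05", None),
    (6, 3, "Y", "2009-03-01", None),
]
TIED_DATES = [
    (1, 1, "A", "2009-01-01", None),
    (2, 1, "B", "2009-02-01", None),
    (3, 1, "C", "2009-02-01", None),  # same StartDate as the row above
]
```

With UNIQUE_DATES the function returns two rows each for patients 1 and 3. With TIED_DATES it returns three rows for patient 1, because both tied rows have exactly one earlier row, which is why StartDate has to be unique within a patient.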

Tomalak
A: 

Here is how I would approach this:

-- Patients with at least 2 status records
with PatientsWithEnoughRecords as (
    select fkPatientId
     from PatientStatus as ps
     group by 
      fkPatientId
     having
      count(*) >= 2
)
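select ps.*
    from PatientsWithEnoughRecords as er 
     inner join PatientStatus as ps on
      er.fkPatientId = ps.fkPatientId
    -- keep only the two earliest status rows of each patient
    where ps.pkPatientStatusId in (
      select top 2 pkPatientStatusId
       from PatientStatus
       where fkPatientId = er.fkPatientId
       order by StartDate asc
    )
    order by ps.fkPatientId, ps.StartDate asc

I am not sure what determines the "first" two status records in your case, so I assumed you want the earliest two StartDates. Modify the order by clause inside the subquery to get the records that you are interested in.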

Edit: SQL Server 2000 doesn't support CTEs, so this solution will indeed only work directly on 2005 and later.

Tiberiu Ana
A: 

Ugly, but this one does not rely on uniqueness of StartDate and works on SQL 2000

select * 
from Patient p 
join PatientStatus ps on p.pkPatientId=ps.fkPatientId
where pkPatientStatusId in (
 select top 2 pkPatientStatusId 
 from PatientStatus 
 where fkPatientId=ps.fkPatientId 
 order by StartDate
) and pkPatientId in (
 select fkPatientId
 from PatientStatus
 group by fkPatientId
 having count(*)>=2
)
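
A rough way to see why filtering on the primary key tolerates duplicate StartDates: the sketch below runs the same idea through Python's sqlite3 module, so TOP 2 becomes LIMIT 2. The extra tie-breaker on pkPatientStatusId, the sample rows and the name first_two_by_key are assumptions added for illustration, not part of the query above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Patient (pkPatientId INTEGER PRIMARY KEY, FirstName TEXT, Surname TEXT);
CREATE TABLE PatientStatus (
    pkPatientStatusId INTEGER PRIMARY KEY,
    fkPatientId INTEGER, StatusCode TEXT, StartDate TEXT, EndDate TEXT);
"""

# The inner query picks at most two key values per patient, so tied
# StartDates cannot push the result past two rows.
QUERY = """
SELECT p.pkPatientId, ps.pkPatientStatusId, ps.StatusCode
FROM Patient p
JOIN PatientStatus ps ON p.pkPatientId = ps.fkPatientId
WHERE ps.pkPatientStatusId IN (
        SELECT ps_i.pkPatientStatusId
        FROM PatientStatus ps_i
        WHERE ps_i.fkPatientId = ps.fkPatientId
        ORDER BY ps_i.StartDate, ps_i.pkPatientStatusId  -- tie-breaker is an assumption
        LIMIT 2)
  AND p.pkPatientId IN (
        SELECT fkPatientId FROM PatientStatus
        GROUP BY fkPatientId HAVING COUNT(*) >= 2)
ORDER BY p.pkPatientId, ps.StartDate, ps.pkPatientStatusId
"""


def first_two_by_key(patients, statuses):
    """Load the rows into an in-memory database and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO Patient VALUES (?, ?, ?)", patients)
        con.executemany("INSERT INTO PatientStatus VALUES (?, ?, ?, ?, ?)", statuses)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# Hypothetical sample data with a tie on StartDate for patient 1.
PATIENTS = [(1, "Ann", "Lee"), (2, "Bob", "Ray")]
STATUSES = [
    (1, 1, "A", "2009-01-01", None),
    (2, 1, "B", "2009-02-01", None),
    (3, 1, "C", "2009-02-01", None),  # tied with row 2
    (4, 2, "A", "2009-01-15", None),  # single status: patient 2 is excluded
]
```

Calling first_two_by_key(PATIENTS, STATUSES) returns rows 1 and 2 for patient 1 and nothing for patient 2. Without the tie-breaker, which of the two tied rows comes back would be up to the database.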
Hafthor
This is the best solution for both 2000 and 2005; but you want to put the two sub-queries in temp tables for 2000 and a CTE for 2005+
Hogan
(If you don't, you get O(N1*N2*N2) as opposed to O(N1+N2+N2), where N1 is the number of patients and N2 is the number of status records.)
Hogan
+1  A: 

Adding this WHERE clause to the outer query of Tomalak's first solution will prevent Patients with fewer than 2 status records from being returned. You can also "and" it in the WHERE clause of the second query for the same results.

WHERE pkPatientId IN (
    SELECT pkPatientID 
    FROM Patient JOIN PatientStatus ON pkPatientId = fkPatientId
    GROUP BY pkPatientID HAVING Count(*) >= 2
)
Ryan
Thanks for the hint, I overlooked this particular requirement of the question. I made a different condition than the one you have, though.
Tomalak
Np. I figured it was just an oversight. Cheers.
Ryan
+4  A: 

Here is my attempt - It should work on SQL Server 2005 and SQL Server 2008 (Tested on SQL Server 2008) owing to the use of a common table expression:

WITH CTE AS
(
    SELECT  fkPatientId
          , StatusCode
          -- add more columns here
          , ROW_NUMBER() OVER
            (PARTITION BY fkPatientId ORDER BY StartDate) AS [Row_Number] 
    from PatientStatus
    where fkPatientId in
    (
     select fkPatientId
     from PatientStatus
     group by fkPatientId
     having COUNT(*) >= 2
    )
)
SELECT p.pkPatientId,
    p.FirstName,
    CTE.StatusCode  
FROM [Patient] as p
    INNER JOIN CTE
        ON p.[pkPatientId] = CTE.fkPatientId
WHERE CTE.[Row_Number] = 1 
or CTE.[Row_Number] = 2
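
As a hedged illustration of the ROW_NUMBER approach, the sketch below runs a similar CTE through Python's sqlite3 module (window functions need SQLite 3.25 or later). Two things differ from the query above and are assumptions of this sketch: rows are numbered by StartDate, and the "at least two records" test uses COUNT(*) OVER in the same pass instead of an IN subquery. The sample rows and the name first_two_by_row_number are invented.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Patient (pkPatientId INTEGER PRIMARY KEY, FirstName TEXT, Surname TEXT);
CREATE TABLE PatientStatus (
    pkPatientStatusId INTEGER PRIMARY KEY,
    fkPatientId INTEGER, StatusCode TEXT, StartDate TEXT, EndDate TEXT);
"""

# rn numbers each patient's statuses from the earliest StartDate;
# cnt is the total number of statuses for that patient.
QUERY = """
WITH Ranked AS (
    SELECT fkPatientId, StatusCode,
           ROW_NUMBER() OVER (PARTITION BY fkPatientId
                              ORDER BY StartDate, pkPatientStatusId) AS rn,
           COUNT(*) OVER (PARTITION BY fkPatientId) AS cnt
    FROM PatientStatus
)
SELECT p.pkPatientId, p.FirstName, r.StatusCode, r.rn
FROM Patient p
INNER JOIN Ranked r ON p.pkPatientId = r.fkPatientId
WHERE r.rn <= 2 AND r.cnt >= 2
ORDER BY p.pkPatientId, r.rn
"""


def first_two_by_row_number(patients, statuses):
    """Load the rows into an in-memory database and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO Patient VALUES (?, ?, ?)", patients)
        con.executemany("INSERT INTO PatientStatus VALUES (?, ?, ?, ?, ?)", statuses)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# Hypothetical sample data.
PATIENTS = [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Cy", "Fox")]
STATUSES = [
    (1, 1, "A", "2009-01-01", None),
    (2, 1, "B", "2009-02-01", None),
    (3, 1, "C", "2009-03-01", None),
    (4, 2, "A", "2009-01-15", None),  # single status: patient 2 is excluded
    (5, 3, "X", "2009-01-05", None),
    (6, 3, "Y", "2009-03-01", None),
]
```

Whether the windowed count is faster than the IN subquery on SQL Server is something to measure rather than assume.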
RobS
Will SQL Server know well enough to run your subquery only once grouping by fkPatientID? Otherwise you might get better performance by putting the constraint on fkPatientID within it.
Tom H.
Hi Tom, I might give that a try actually, thanks
RobS
+1  A: 

Check if your server supports windowed functions and the QUALIFY clause (available in Teradata, for example, but not in SQL Server):

SELECT * 
FROM Patient p
LEFT JOIN PatientStatus ps ON p.pkPatientId = ps.fkPatientId
QUALIFY ROW_NUMBER() OVER (PARTITION BY ps.fkPatientId ORDER BY ps.StartDate) < 3

Another possibility, which should work with SQL Server 2005:

SELECT * FROM Patient p
LEFT JOIN ( 
    SELECT *, ROW_NUMBER() OVER (PARTITION BY fkPatientId ORDER BY StartDate) rn
    FROM PatientStatus) ps
ON p.pkPatientId = ps.fkPatientID 
and ps.rn < 3
+5  A: 

A CTE is probably your best bet if you're in SQL Server 2005 or greater, but if you want something a little more compatible with other platforms, this should work:

SELECT
     P.pkPatientID,
     P.FirstName,
     P.Surname,
     PS1.StatusCode AS FirstStatusCode,
     PS1.StartDate AS FirstStatusStartDate,
     PS1.EndDate AS FirstStatusEndDate,
     PS2.StatusCode AS SecondStatusCode,
     PS2.StartDate AS SecondStatusStartDate,
     PS2.EndDate AS SecondStatusEndDate
FROM
     Patient P
INNER JOIN PatientStatus PS1 ON
     PS1.fkPatientID = P.pkPatientID
INNER JOIN PatientStatus PS2 ON
     PS2.fkPatientID = P.pkPatientID AND
     PS2.StartDate > PS1.StartDate
LEFT OUTER JOIN PatientStatus PS3 ON
     PS3.fkPatientID = P.pkPatientID AND
     PS3.StartDate < PS1.StartDate
LEFT OUTER JOIN PatientStatus PS4 ON
     PS4.fkPatientID = P.pkPatientID AND
     PS4.StartDate > PS1.StartDate AND
     PS4.StartDate < PS2.StartDate
WHERE
     PS3.pkPatientStatusID IS NULL AND
     PS4.pkPatientStatusID IS NULL

It does seem a little odd to me that you would want the first two statuses instead of the last two, but I'll assume that you know what you want.

You can also use WHERE NOT EXISTS instead of the PS3 and PS4 joins if you get better performance with that.
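
The NOT EXISTS variant might look roughly like the sketch below, run here through Python's sqlite3 module rather than SQL Server. It returns one row per patient with the first and second status side by side, and like the join version it assumes StartDate is unique within a patient. The sample rows and the name first_two_side_by_side are made up for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Patient (pkPatientId INTEGER PRIMARY KEY, FirstName TEXT, Surname TEXT);
CREATE TABLE PatientStatus (
    pkPatientStatusId INTEGER PRIMARY KEY,
    fkPatientId INTEGER, StatusCode TEXT, StartDate TEXT, EndDate TEXT);
"""

# PS1 is the status with nothing earlier; PS2 is a later status with
# nothing between it and PS1, i.e. the second one.
QUERY = """
SELECT P.pkPatientId,
       PS1.StatusCode AS FirstStatusCode,
       PS2.StatusCode AS SecondStatusCode
FROM Patient P
INNER JOIN PatientStatus PS1 ON PS1.fkPatientId = P.pkPatientId
INNER JOIN PatientStatus PS2 ON PS2.fkPatientId = P.pkPatientId
                            AND PS2.StartDate > PS1.StartDate
WHERE NOT EXISTS (SELECT 1 FROM PatientStatus PS3
                  WHERE PS3.fkPatientId = P.pkPatientId
                    AND PS3.StartDate < PS1.StartDate)
  AND NOT EXISTS (SELECT 1 FROM PatientStatus PS4
                  WHERE PS4.fkPatientId = P.pkPatientId
                    AND PS4.StartDate > PS1.StartDate
                    AND PS4.StartDate < PS2.StartDate)
ORDER BY P.pkPatientId
"""


def first_two_side_by_side(patients, statuses):
    """Load the rows into an in-memory database and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO Patient VALUES (?, ?, ?)", patients)
        con.executemany("INSERT INTO PatientStatus VALUES (?, ?, ?, ?, ?)", statuses)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# Hypothetical sample data.
PATIENTS = [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Cy", "Fox")]
STATUSES = [
    (1, 1, "A", "2009-01-01", None),
    (2, 1, "B", "2009-02-01", None),
    (3, 1, "C", "2009-03-01", None),
    (4, 2, "A", "2009-01-15", None),  # single status: no PS2 row, so excluded
    (5, 3, "X", "2009-01-05", None),
    (6, 3, "Y", "2009-03-01", None),
]
```

Patients with a single status drop out on their own here, because the inner join to PS2 finds no later row.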

Tom H.