views:

84

answers:

5

I have a table "defects" in the following format:

id   status  stat_date   line   div  area
1    Open    09/21/09    F      A    cube
1    closed  01/01/10    F      A    cube
2    Open    10/23/09    B      C    Back
3    Open    11/08/09    S      B    Front
3    closed  12/12/09    S      B    Front   

My problem is that I want to write a query that just extracts the "Open" defects. If I write a query to simply extract all open defects, then I get the wrong result because there are some defects, that have 2 records associated with it. For example, with the query that I wrote I would get defect id#s 1 and 3 in my result even though they are closed. I hope I have explained my problem well. Thank you.

A: 

This should get what you want. I wouldn't have a record for open and closing a defect, rather just a single record to track a single defect. But that may not be something you can change easily.

SELECT id FROM defects 
WHERE status = 'OPEN' AND id NOT IN 
(SELECT id FROM defects WHERE status = 'closed')
rosscj2533
This does not work if defect was subsequently re-opened.
RedFilter
Thank you. This works very well. I am amazed at how many different ways there are to get the correct results. Thank you again. Defects are not allowed to be re-opened so at least I don't have to worry about that. Thanks again!
jen
+1  A: 
Select * 
from defects d
where status = 'Open'
and not exists (
   select 1 from defects d1
   where d1.status = 'closed'
   and d1.id = d.id
   and d1.stat_date > d.stat_date
)
Steve De Caux
This does not work if defect was subsequently re-opened.
RedFilter
OK, edited to check for Open status date >closed
Steve De Caux
+1  A: 

So you want to get the most recent row per id and of those, only select those that are open. This is a variation of the common greatest-n-per-group problem.

I would do it this way:

SELECT d1.*
FROM defects d1
LEFT OUTER JOIN defects d2
  ON (d1.id = d2.id AND d1.stat_date < d2.stat_date)
WHERE d2.id IS NULL
  AND d1.status = 'Open';
Bill Karwin
Would you say this typically performs better than the group by method? My instincts tell me no, as you must do a date comparison on every record, but have not tested this.
RedFilter
This works great. Thanks a million because I was working on this for a few days and I was getting really frustrated.
jen
@OrbMan: Which solution performs better depends on the RDBMS implementation and your specific data set, so one should test both queries on real data to be sure. I would hope the RDBMS can compare two dates efficiently, especially if they're indexed.
Bill Karwin
+2  A: 

Use:

SELECT t.*
  FROM DEFECTS t
  JOIN (SELECT d.id,
               MAX(d.stat_date) 'msd'
          FROM DEFECTS d
      GROUP BY d.id) x ON x.id = t.id
                      AND x.msd = t.stat_date
 WHERE t.status != 'closed'
  1. The join is getting the most recent date for each id value.
  2. Join back to the original table on based on the id and date in order to get only the most recent rows.
  3. Filter out those rows with the closed status to know the ones that are currently open
OMG Ponies
A: 

This query handles multiple opens/closes/opens, and only does one pass through the data (i.e. no self-joins):

SELECT * FROM
(SELECT DISTINCT
        id
       ,FIRST_VALUE(status)
        OVER (PARTITION BY id
              ORDER BY stat_date desc)
        as last_status
       ,FIRST_VALUE(stat_date)
        over (PARTITION BY id
              ORDER BY stat_date desc)
        AS last_stat_date
       ,line
       ,div
       ,area
 FROM defects)
WHERE last_status = 'Open';
Jeffrey Kemp