views:

127

answers:

5

This one really has me stumped. I have a documents table which hold info about the documents, and a procedures table, which is kind of like a revisions table for each document. What I need to do is write a select statement which gives me all of the documents where all of the procedures have the status "work_in_progress". Here's an example

procedures table:

document_id | status
1           | 'wip' 
1           | 'wip'
1           | 'wip'
1           | 'approved'
2           | 'wip'
2           | 'wip'
2           | 'wip'

Here, I would want my query to only return document id 2, because all of its statuses are work_in_progress. I DO NOT want document_id 1 since one of its statuses is 'approved'. I believe this is relational division I want, but I'm not sure where to start. This is MySQL 5.0 FYI.

+1  A: 

SELECT document_name FROM documents WHERE 'approved' NOT IN (SELECT status FROM procedures WHERE procedures.document_id = documents.document_id)

Jonatan Littke
This ALMOST has me there. The problem with this approach is that I don't want to always know all of the statuses. I could probably get by for now by just keeping track of all of the different status. I also have inactive, approval_pending, etc. How do I incorporate those into this query. THANKS!
Preston Marshall
So you only want the ones with only the same type of status? I.e documents that *only* have EITHER wip/approved/inactive/approval_pending etc? Or do you want to select them based on a variable?
Jonatan Littke
+2  A: 

I would do this as:

SELECT document_id, document_name, ...
FROM documents AS d
WHERE NOT EXISTS (SELECT document_id
                  FROM procedures
                  WHERE document_id = d.document_id
                      AND status != 'wip');

That is, "show me all documents where there aren't any rows in procedures with a status other than 'wip'".

So it's coming at it a little sideways from the way you described it. Instead of "all the rows are 'wip'", you verify that there aren't any rows that are not 'wip'.

Chad Birch
Posted some 'benchmarks' on working solutions http://stackoverflow.com/questions/2673053/sql-select-give-me-all-documents-where-all-of-the-documents-procedures-are-wor/2673480#2673480. Your's is on top :)
Unreason
+2  A: 

My preferred method would be this, as it uses a count to check the number that are wip is the same as the number in total

SELECT
    document_id
FROM
    documents
GROUP BY
    document_id
HAVING
    COUNT(*) =
(
    SELECT
        COUNT(*)
    FROM
        documents documents2
    WHERE
        documents2.status = 'wip'
     AND
        documents2.document_id = documents.document_id
)
Robin Day
Can't you also do something like `HAVING EVERY(status = 'wip')`?
Mark E
There's no `EVERY()` aggregate function. You could do `HAVING MIN(status) = MAX(status) = 'wip'`.
Bill Karwin
I like the MIN = MAX, very nice, although I usually like to avoid aggregates on strings. I also still prefer using the COUNT as I have though as it "reads" more like the question was asked.
Robin Day
@Mark, see http://stackoverflow.com/questions/2673053/sql-select-give-me-all-documents-where-all-of-the-documents-procedures-are-wor/2673480#2673480 - for having every you can either use also 'where value = all ( subquery )' or do mysql specific bit_and(field='value') which will aggregate the condition
Unreason
@Robin Day, I am not sure the count should be preferred in a general situation: it forces the database to actually do the counts on both sides - outside aggregate and the correlated subquery; in case of EXISTS/ALL subqueries it is possible to determine if a row is part of the result without examining all the rows of the subquery (on average; worst case scenario would look at all rows).
Unreason
A: 
SELECT document_id FROM documents WHERE document_id NOT IN 
(SELECT DISTINCT document_id FROM procedures WHERE status != 'wip')
Aurril
+1  A: 

Here's one more version of the same thing

SELECT document_id, document_name
FROM documents d 
WHERE 'wip' = ALL(SELECT status 
                  FROM procedures p 
                  WHERE p.document_id = d.document_id);

And another one

SELECT document_id, document_name
FROM documents d
     INNER JOIN procedures p ON d.document_id = p.document_id
GROUP BY document_id
HAVING bit_and(p.status='wip')

EDIT: I wondered how do the working solutions compare to each other in terms of performance, so I tested with ~300k records in procedures and ~15k records in documents with minimum number of columns and indexes covering only document_id on modest hardware

  1. NOT EXISTS ~1.2s
  2. 'wip' = ALL () ~1.2s
  3. HAVING COUNT(*) ~2.5s
  4. bit_and() ~2.5s

so it seems that bit_and() does not optimize so well as it could. Also a note, adding index on (document_id, status) half the times presented above.

Unreason