views:

554

answers:

5

I am trying to build an SQL Statement for the following search scenario:

I have trying to return all of the columns for an individual record for Table A based on the value of the status column in Table B. Each record in table A can have multiple rows in table B, making it a one to many relationship. The status column is nullable with a data type of integer.

Here are the possible values for status in table B:

  • NULL = Pending,
  • 1 = Approved,
  • 2 = Denied,
  • 6 = Forced Approval,
  • 7 = Forced Denial

The end user can search on the following scenarios:

  • Approved - All table B records must have a value of 1 or 6 for status.
  • Denied - One table B record must have a value of 2 or 5. Any other records can have 1,6, or null.
  • Pending - All table B records can have a value of 1,6 or null. One record must be null because it is not considered completed.

UPDATE
I consulted with one of our DBAs and he developed the following solution:

Approved:

SELECT a.* FROM TableA a INNER JOIN TableB ON b.id = a.id
WHERE
(b.status in (1,6) and b.status IS NOT NULL) AND
b.id NOT IN (SELECT id from TableB  WHERE status IS NULL)
AND b.id NOT IN (SELECT id from TableB WHERE status in (2,7))

Denied:

SELECT a.* FROM TableA a INNER JOIN TableB ON b.id = a.id
WHERE
(b.status in (2,7))

Pending:

SELECT a.* FROM TableA a INNER JOIN TableB ON b.id = a.id
WHERE
(b.status IN (1,6) OR b.status IS NULL)
AND b.id NOT IN (SELECT b.id FROM TableA a INNER JOIN TableB b ON b.id = a.id WHERE (b.status IN (1,6) AND b.status IS NOT NULL) AND b.id NOT IN (SELECT id from TableB WHERE status IS NULL))
AND b.id NOT IN (SELECT id FROM TableB WHERE status IN (2,7))

UPDATE 2:
@Micth Wheat - How would I refactor the following solution using the EXIST/NOT EXIST t-sql keyword?

A: 

Looks like you would want to use some exists statements.

For the ones where all values must be something then you would add a not exists against the other possible values.

--Approved

exists(select 1 from B where A.id = B.id and status in (1,6))
and not exists(select 1 from B where A.id = B.id and (status is null or status not in (1,6)))
dotjoe
Do the IN and NOT IN sub-queries work correctly when there is a NULL in the status column? I suspect not.
Jonathan Leffler
+2  A: 

As an example for 'Approved':

select 
    * 
from 
    A 
where
    (select count(*) from B where B.parent_id = A.id and B.status in (1,6)) > 0
and (select count(*) from B where B.parent_id = A.id and B.status not in (1,6)) = 0

Refactored to use exists and not exists:

select 
    * 
from 
    A 
where
    exists (select * from B where B.parent_id = A.id and B.status in (1,6)) 
and not exists (select * from B where B.parent_id = A.id and B.status not in (1,6))

If you have passed in a criteria, you can package it all up in one query like this, if it is more convenient:

select 
    * 
from 
    A 
where     
    (@Criteria = 'Approved'
and (select count(*) from B where B.parent_id = A.id and B.status in (1,6)) > 0
and (select count(*) from B where B.parent_id = A.id and B.status not in (1,6)) = 0
    )
or  (@Criteria = 'Denied'
and (select count(*) from B where B.parent_id = A.id and B.status in (2,7)) > 0
    )
or  (@Criteria = 'Pending'
and (select count(*) from B where B.parent_id = A.id and B.status not in (2,7)) = 0
and (select count(*) from B where B.parent_id = A.id and B.status is null) > 0
    )

Note, I changed the Denied example to be values of 2 and 7, rather than 2 and 5, based on your sample data.

Edit: You could also use exists and not exists, as Joe suggests.

Edit: The method using max(case ...), often also seen as sum(case ...) for counting values, does perform better in some cases (depends mostly on your volume of data whether the performance increase is noticeable - sometimes it can be a big difference). I personally find the subqueries more readable, so I start with them, and if better performance is needed, I would benchmark both methods, and if max(case ...) works better, I would switch.

Chris Teixeira
We implemented something similar with SELECT without using counts.
Michael Kniskern
Definitely use [NOT] EXISTS (SELECT 1 ...) instead of the COUNT(*)
Mitch Wheat
The approval subquery needs this line: and (select count(*) from RouteSteps B where B.ID_FK = A.id and B.status is null) = 0
Sam
A: 

Perhaps something like this

select CASE WHEN SUM(CASE WHEN ISNULL(b.status, 0) IN (1,6) THEN 1 ELSE 0 END) = COUNT(*) THEN 'Approved'
      WHEN SUM(CASE WHEN ISNULL(b.status, 0) IN (2,5) THEN 1 ELSE 0 END) > 0 THEN 'Denied'
      WHEN SUM(CASE WHEN ISNULL(b.status, 0) IN (1,0,6) THEN 1 ELSE 0 END) = COUNT(*)
       AND SUM(CASE WHEN b.status IS NULL THEN 1 else 0 END) > 0  THEN 'Pending'
      ELSE '???' END as Status, <a column list>
FROM A 
INNER JOIN b ON a.id = b.id
group by <a column list>
hova
+1  A: 

From what I read Chris Teixeira and hova use basically the same logic, but;
- Hova parses the tables only once
- Chris Teixeira parses the tables multiple times
=>Hova's technique is preferred (in my opinion)

However, Hova did it very slightly wrong...

The logic should be:

- If Any 2 or 7 records   => DENIED
- ElseIf Any NULL records => PENDING
- Else                    => ACCEPTED

This gives the following code...

SELECT
    [main].id,
    CASE WHEN MAX(CASE WHEN [status].value IN (2,7) THEN 1 ELSE 0 END) = 1 THEN 'DENIED'
         WHEN MAX(CASE WHEN [status].value IS NULL  THEN 1 ELSE 0 END) = 1 THEN 'PENDING'
         ELSE 'ACCEPTED' END
FROM
    [main]
INNER JOIN
    [status]
        ON [main].id = [status].main_id
GROUP BY
    [main].id

Also, the use of MAX rather than SUM (which Hova used) means that the query engine only has to find one match, not several. Also, it is easier for an optimiser to use appropriate indexes on the [status] table. (In the case, the index would be (main_id, value) in that order on the [status] table.

Dems.

EDIT:

Something similar could be as follows. I have no SQL Instance to test on here, so i can't tell you if it's faster, but I image it could be...

SELECT
    [main].id,
    MIN(CASE WHEN [status].value IN (2,7) THEN -1        -- Denied
             WHEN [status].value IS NULL  THEN  0        -- Pending
             ELSE                               1  END)  -- Accepted
FROM
    [main]
INNER JOIN
    [status]
        ON [main].id = [status].main_id
GROUP BY
    [main].id

EDIT:

Another option is just to join on a mapping table instead of using the CASE statements.

DECLARE @map TABLE (
   status_value   INT,
   status_result  INT
   )

INSERT INTO @map VALUES (1,    1)
INSERT INTO @map VALUES (2,   -1)
INSERT INTO @map VALUES (6,    1)
INSERT INTO @map VALUES (7,   -1)
INSERT INTO @map VALUES (NULL, 0)

SELECT
    [main].id,
    MIN([map].status_result)
FROM
    [main]
INNER JOIN
    [status]
        ON [main].id = [status].main_id
INNER JOIN
    @map AS [map]
        ON [status].value = [map].status_value
        OR ([status].value IS NULL AND [map].status_value IS NULL)
        -- # This has been faster than using ISNULLs in my experience...
GROUP BY
    [main].id
Dems
A: 

You also asked how to make use of EXISTS to solve the query. I still don't have anything to test this on, but I'd try something like the following...

Approved:

SELECT
    a.*
FROM
    TableA a
WHERE
    NOT EXISTS (SELECT * FROM TableB b WHERE b.id = a.id AND b.status IN (NULL,1,6))

Denied:

SELECT
    a.*
FROM
    TableA a
WHERE
    EXISTS (SELECT * FROM TableB b WHERE b.id = a.id AND b.status IN (1,6))

Pending:

SELECT
    a.*
FROM
    TableA a
WHERE
    EXISTS (SELECT * FROM TableB b WHERE b.id = a.id AND b.status IS NULL)
    AND NOT EXISTS (SELECT * FROM TableB b WHERE b.id = a.id AND b.status IN (1,6))
Dems