ansaurus

Question

How do I translate a simple boolean statement to SQL?

Answer 1

A:

Pardon any mistakes but I think something like this would work (in T-SQL):

SELECT col1, col2, col3... FROM PERSON_T AS P, DRUG_T AS DR, DISEASE_T AS DI WHERE disease_id = 52 AND P.person_id = DI.person_id AND P.person_id = DR.person_id AND drug_id NOT IN(234, 26) AND drug_id = 474 AND disease_start_date < drug_start_date UNION SELECT col1, col2, col3... FROM PERSON_T AS P, DRUG_T AS DR, DISEASE_T AS DI WHERE disease_id = 52 AND P.person_id = DI.person_id AND P.person_id = DR.person_id AND drug_id NOT IN(791, 395) AND drug_id = 371 AND disease_start_date < drug_start_date

Now it doesn't have to be done with a UNION but for readibility I thought this was the easiest given your conditions. Maybe this will lead you in the right direction.

ajdams 2010-07-09 14:40:34

This will not handle my second pattern I just added. My patterns might specify that a person has taken drug 234 and 474 before getting the disease, but not drug 26. This query results in zero results in this case.

Jay Askren 2010-07-09 16:17:00

Answer 2

A:

SELECT per.person_id, per.name, per.gender
FROM person_t per
INNER JOIN disease_t dis
USING (person_id)
INNER JOIN drug_t drug
USING (person_id)
WHERE dis.disease_id = 52 AND drug.drug_start_date < dis.disease_start_date AND ((drug.drug_id IN (234, 474) AND drug.drug_id NOT IN (26)) OR (drug.drug_id IN (371) AND drug.drug_id NOT IN (395, 791)));

This will do what you are asking for. The IN statements at the end are pretty self explanatory.

Peter Hanneman 2010-07-09 14:53:15

This will not handle my second pattern I just added. My patterns might specify that a person has taken drug 234 and 474 before getting the disease, but not drug 26. This query results in zero results in this case.

Jay Askren 2010-07-09 16:17:16

Just had to move the 234 to the other IN clause - just put them in backwards - it works now.The first IN is the drugs you want to see before symptoms, 2nd IN are the drugs you want to see after symptoms began. 3rd and 4th IN's are your next pattern - same input format.

Peter Hanneman 2010-07-09 18:28:23

Peter Hanneman 2010-07-09 18:36:29

Answer 3

+5 A:

To me, the straightforward (if ugly) solution is to use EXISTS and NOT EXISTS clauses:

SELECT *
FROM PERSON_T INNER JOIN DISEASE_T
     USING (PERSON_ID)
WHERE DISEASE_ID = 52
  AND EXISTS (SELECT 1 FROM DRUG_T
              WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID
                AND DRUG_ID = 474
                AND [time condition])
  AND NOT EXISTS (SELECT 1 FROM DRUG_T
              WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID
                AND DRUG_ID = 234
                AND [time condition])

...and so on. In the example, we're asking for people who have taken drug 474 but not 234. Obviously, you can group the clauses with ANDs and ORs according to what you need.

Aside: I find all caps difficult to read. I usually use uppercase for SQL keywords and lowercase for table and column names.

Justin K 2010-07-09 18:54:47

I didn't think this worked at first, but it appears to give the correct answer. Thanks.

Jay Askren 2010-07-26 20:29:19

Answer 4

A:

None of the answers given seem to work. Again here is the pattern I want to implement: ( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )

I believe the following query will work for (Drug 234 = true AND Drug 474 = true AND Drug 26 = false). Given that, it is pretty easy to add the second half of the query.

SELECT  p.person_id, p.gender FROM person_t as p 
    join drug_t as dr on dr.person_id = p.person_id 
    join disease_t as ds on ds.person_id=p.person_id 
    WHERE dr.drug_start_date < ds.disease_start_date AND disease_id = 52 AND dr.drug_id=234
INTERSECT
SELECT  p.person_id, p.gender FROM person_t as p 
    join drug_t as dr on dr.person_id = p.person_id 
    join disease_t as ds on ds.person_id=p.person_id 
    WHERE dr.drug_start_date < ds.disease_start_date AND disease_id = 52 AND dr.drug_id=474
INTERSECT (
SELECT p.person_id, p.gender
    FROM person_t as p 
    JOIN disease_t as ds on ds.person_id = p.person_id 
    LEFT JOIN drug_t as dr ON dr.person_id = p.person_id  AND dr.drug_id = 26
    WHERE disease_id = 52 AND dr.person_id is null 
UNION 
SELECT p.person_id, p.gender
    FROM person_t as p 
    JOIN disease_t as ds on ds.person_id = p.person_id 
    JOIN drug_t as dr ON dr.person_id = p.person_id  AND dr.drug_id = 26
    WHERE disease_id = 52 AND dr.drug_start_date > ds.disease_start_date)

This query works, but is quite ugly. I also suspect it will be extremely slow once I have a production database with 100 million people. Is there anything I can do to simplify/optimize this query?

Jay Askren 2010-07-23 17:56:03

Why has disease_id suddenly changed to 26 half way through? That wasn't in the question?

Martin Smith 2010-07-23 21:00:54

That was typo. It is fixed now. Thanks for catching that.

Jay Askren 2010-07-23 21:16:57

Answer 5

+1 A:

I have no idea how this will perform with large tables (I imagine it will be pretty lousy as date comparisons are typically pretty expensive), but here is a method that should work. It is relatively verbose, but is very easy to modify for different boolean cases.

Example 1:

SELECT dis.*
FROM disease_t dis
LEFT JOIN drug d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234
LEFT JOIN drug d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474
LEFT JOIN drug d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26
LEFT JOIN drug d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395
LEFT JOIN drug d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791
LEFT JOIN drug d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371
WHERE dis.disease_id = 52
AND (((d1.person_id IS NULL OR dis.startdate < d1.startdate) AND
      (d2.person_id IS NOT NULL AND d2.startdate < dis.startdate) AND
      (d3.person_id IS NULL OR dis.startdate < d3.startdate)) 
     OR
     ((d4.person_id IS NULL OR dis.startdate < d4.startdate) AND
      (d5.person_id IS NULL OR dis.startdate < d5.startdate) AND
      (d6.person_id IS NOT NULL AND d6.startdate < dis.startdate)))

Example 2:

SELECT dis.*
FROM disease_t dis
LEFT JOIN drug d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234
LEFT JOIN drug d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474
LEFT JOIN drug d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26
LEFT JOIN drug d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395
LEFT JOIN drug d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791
LEFT JOIN drug d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371
WHERE dis.disease_id = 52
AND (((d1.person_id IS NOT NULL AND d1.startdate < dis.startdate) AND
      (d2.person_id IS NOT NULL AND d2.startdate < dis.startdate) AND
      (d3.person_id IS NULL OR dis.startdate < d3.startdate)) 
     or
     ((d4.person_id IS NULL OR dis.startdate < d4.startdate) AND
      (d5.person_id IS NULL OR dis.startdate < d5.startdate) AND
      (d6.person_id IS NOT NULL AND d6.startdate < dis.startdate)))

sgriffinusa 2010-07-23 18:57:12

This appears to work. Thanks.

Jay Askren 2010-07-26 20:27:08

Answer 6

+2 A:

Here is a query that handles ( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) ), as you posted.

/*
-- AS DEFINED BY JOINS
-- All "person_id"'s match
-- Drug 1 is not Drug 2
-- Drug 1 is not Drug 3
-- Drug 2 is not Drug 3
-- All Drugs are optional as far as the SELECT statement is concerned (left join)
   -- Drug IDs will be defined in the WHERE clause
-- All Diseases for "person_id"

-- AS DEFINED IN WHERE STATEMENT
-- Disease IS 52
-- AND ONE OF THE FOLLOWING:
--   1) Disease started AFTER Drug 1
--      Disease started AFTER Drug 2
--      Drug 1 IS 234
--      Drug 2 IS 474
--      Drug 3 IS NOT 26 (AND NOT 234 or 474, as defined in JOINs)
--   2) Disease started AFTER Drug 3
--      Drug 1 IS NOT 395
--      Drug 2 IS NOT 791
--      Drug 3 IS 371
*/

SELECT p.person_id, p.gender FROM person_t as p
LEFT JOIN drug_t    AS dr1 ON (p.person_id = dr1.person_id)
LEFT JOIN drug_t    AS dr2 ON (p.person_id = dr2.person_id AND dr1.drug_id != dr2.drug_id)
LEFT JOIN drug_t    AS dr3 ON (p.person_id = dr3.person_id AND dr1.drug_id != dr3.drug_id AND dr2.drug_id != dr3.drug_id)
JOIN      disease_t AS ds  ON (p.person_id = ds.person_id)
WHERE ds.disease_id = 52
AND (   (    (dr1.drug_start_date < ds.disease_start_date AND dr2.drug_start_date < ds.disease_start_date)
        AND (dr1.drug_id = 234 AND dr2.drug_id = 474 AND dr3.drug_id != 26)
        )
    OR
        (    (dr3.drug_start_date < ds.disease_start_date)
        AND (dr1.drug_id != 395 AND dr2.drug_id != 791 AND dr3.drug_id = 371)
        )
    )

pferate 2010-07-23 19:37:53

Answer 7

A:

I don't have test data real handy to try this out on, but I think you could do something like:

SELECT *
FROM DISEASE_T D
INNER JOIN DRUG_T DR ON D.PERSON_ID = DR.PERSON_ID AND D.DRUG_ID=52
INNER JOIN PERSON_T P ON P.PERSON_ID = D.PERSON_ID
GROUP BY PERSON_ID
HAVING SUM(
    CASE WHEN DRUG_ID=234 AND DRUG_START_DATE<DISEASE_START_DATE THEN -1 
    WHEN DRUG_ID=474 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 
    WHEN DRUG_ID=26 AND DRUG_START_DATE<DISEASE_START_DATE THEN -1 
    ELSE 0 END) = 1
    OR
    SUM(
    CASE WHEN DRUG_ID=395 AND DRUG_START_DATE<DISEASE_START_DATE THEN -1 
    WHEN DRUG_ID=791 AND DRUG_START_DATE<DISEASE_START_DATE THEN -1 
    WHEN DRUG_ID=371 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 
    ELSE 0 END) = 1

The case I know will fail there is if you have multiple records for the same person and the same drug/disease in the drug/disease tables. If that's the case, you could also change the HAVING clause to look more like:

(SUM(CASE WHEN DRUG_ID=234 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) = 0
AND SUM(CASE WHEN DRUG_ID=474 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN DRUG_ID=26 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) = 0)
OR
(SUM(CASE WHEN DRUG_ID=395 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) = 0
AND SUM(CASE WHEN DRUG_ID=791 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) = 0
AND SUM(CASE WHEN DRUG_ID=371 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) > 0)

joelt 2010-07-23 20:07:51

A person can take the same drug more than once and there would be a row for each time they took the drug. I believe the same is true for diseases.

Jay Askren 2010-07-23 20:18:38

Answer 8

A:

I would probably approach this problem from some direction similar to this. It's pretty flexible.

DRUG_DISEASE_CORRELATION_QUERY
===============================
DRUG_DISEASE_CORRELATION_QUERY_ID
DISEASE_ID
DESCRIPTION

(1, 52, 'What this query does.')
(2, 52, 'Add some more results.')

DRUG_DISEASE_CORRELATION_QUERY_INCLUDE_DRUG
===========================================
DRUG_DISEASE_CORRELATION_QUERY_ID
DRUG_ID

(1, 234)
(1, 474)
(2, 371)

DRUG_DISEASE_CORRELATION_QUERY_EXCLUDE_DRUG
===========================================
DRUG_DISEASE_CORRELATION_QUERY_ID
DRUG_ID

(1, 26)
(2, 395)
(2, 791)



CREATE VIEW DRUG_DISEASE_CORRELATION
AS
SELECT 
    p.*,
    q.DRUG_DISEASE_CORRELATION_QUERY_ID
FROM 
    DRUG_DISEASE_CORRELATION_QUERY q
    INNER JOIN DISEASE_T ds on ds.DISEASE_ID = q.DISEASE_ID
    INNER JOIN PERSON_T p ON p.PERSON_ID = ds.PERSON_ID
  WHERE 
    AND EXISTS (SELECT * FROM DRUG_T dr WHERE dr.PERSON_ID = p.PERSON_ID AND dr.DRUG_ID IN
        (SELECT qid.DRUG_ID FROM DRUG_DISEASE_CORRELATION_QUERY_INCLUDE_DRUG qid WHERE 
        qid.DRUG_DISEASE_CORRELATION_QUERY_ID = q.DRUG_DISEASE_CORRELATION_QUERY_ID)
        AND DRUG_START_DATE < ds.DISEASE_START_DATE)
   AND NOT EXISTS (SELECT * FROM DRUG_T dr WHERE dr.PERSON_ID = p.PERSON_ID AND dr.DRUG_ID IN
        (SELECT qed.DRUG_ID FROM DRUG_DISEASE_CORRELATION_QUERY_EXCLUDE_DRUG qed WHERE 
        qed.DRUG_DISEASE_CORRELATION_QUERY_ID = q.DRUG_DISEASE_CORRELATION_QUERY_ID)
        AND DRUG_START_DATE < ds.DISEASE_START_DATE)
GO


SELECT * FROM DRUG_DISEASE_CORRELATION WHERE DRUG_DISEASE_CORRELATION_QUERY_ID = 1
UNION
SELECT * FROM DRUG_DISEASE_CORRELATION WHERE DRUG_DISEASE_CORRELATION_QUERY_ID = 2

Jeffrey L Whitledge 2010-07-23 20:22:45

Answer 9

A:

If I have it right, you want to:

Select out those Persons
Who have been infected with one (1) specific disease
Who have been treated with one or more specified drugs
And who have NOT been been treated with one or more specified other drugs

This could be simplified by converting your "drug requirements" into a temporary table of some form. This would allow for the use of any number of "good" and "bad" drugs to query against. What I have below could be implemented as a stored procedure, but if that's not an option a number of convoluted options are available.

Breaking down the steps:

First, here's how the desired patients are selected. We'll use this as a subquery later:

SELECT [PersonData]
 from DISEASE_T di
  inner join PERSON_T pe
   on pe.Person_Id = di.Person_Id
 where di.Disease_Id = [TargetDisease]
  and [TimeConstraints]

Second, for each set of "target" drugs that you have ANDed together, set up a temporary table like so (this is SQL Server syntax, Postgres should have something similar):

CREATE TABLE #DrugSet
 (
   Drug_Id  [KeyDataType]
  ,Include  int   not null
 )

populate it with one row for every drug you are considering:

Drug_Id = the drug you're checking
Include = 1 if the person is to have taken the drug, and 0 if they are not to have taken it

and calculate two values:

@GoodDrugs, the number of drugs you want the patient to have taken
@BadDrugs, the number of drugs you want the patient to have not taken

Now, stitch all the above together in the following query:

SELECT pe.[PersonData]  --  All the desired columns from PERSON_T and elsewhere
 from DRUG_T dr
  --  Filter to only include "persons of interest"
  inner join (select [PersonData]
               from DISEASE_T di
                inner join PERSON_T pe
                 on pe.Person_Id = di.Person_Id
               where di.Disease_Id = [TargetDisease]
                and [TimeConstraints]) pe
   on pe.Person_Id = dr.Person_ID
 --  Join with any of the drugs we are intersted in
 left outer join #DrugSet ta  
  on ta.Drug_Id = dr.Drug_Id
 group by pe.[PersonData]  --  Same as in the SELECT clause
 having sum(case ta.Include
              when 1 then 1  --  This patient has been given a drug that we're looking to match
              else 0         --  This patient has not been given this drug (catches NULLs, too)
            end) = @GoodDrugs
  and  sum(case ta.Include
              when 0 then 1  --  This patient has been given this drug that we're NOT looking to match
              else 0         --  This patient has not been given this drug (catches NULLs, too)
            end) = @BadDrugs

I've intentionally ignored the time criteria as you didn't go into details on them, but they should be fairly simple to add (though I hope that's not famous last words). Further optimizations may be possible, but a lot depends on the data and other possible criteria.

You would need to run this once for each "drug set" (that is, sets of TRUE or FALSE drugs ANDed together), concatenating the list with each pass. You could probably expand #DrugSet to factor in each drug set you are checking, but I'm reluctant to try and code that without some serious data to test it against.

*/

Philip Kelley 2010-07-23 20:34:15

I disregarded the date columns and asssumed that there where intuitive primary keys present. If this is not the case, some distinct and group by clauses will be called for.

Philip Kelley 2010-07-23 20:44:09

Answer 10

A:

I've tried to break down the problem, and follow through as logically as I could.

Firstly the three tables (Person_T, Drugs_T, Disease_T) can be thought of as shown in Fig 1.0:

A Person can have multiple drugs and multiple diseases. Each drug and disease has a start date & end date.

Therfore I would first de-normalize the three tables into one table (Table_dn) thus:

dnId | PersonId | DrugId | DiseaseId | DgSt | DgEn | DiSt | DiEn
----   --------   ------   ---------   ----   ----   ----   ----

This de-normalized table can be a temp table if needed, regardless Table_dn now contains all the entire Global dataset as shown in Fig 2.0 (denoted as G).

From my understanding of your description, I can see essentially a two layer filter.

Filter 1

This filter is simply a boolean set of drugs Combinations, as you have already stated in your question description. eg:

(drug a = 1 & drug b = 0 & etc) OR (.....

Filter 2

This filter is a little more complicated than the first, it is date range criteria. Fig 3.0 shows this date range in RED. The yellow represents record dates that span in multiple ways:

before RED period
After RED period
Between RED period
Ending before end of RED period
Starting after start of RED period

Now the YELLOW date periods could be the drugs period or disease period OR combination of both.

This filter should be applied to the set of results obtained from the first results.

Of course depending on your exact question those two Filters may need to go the other way round (e.g f2 first then f1).

SQL pseudo code:

Select sub.*
From    
      (select    * 
       from      Table_dn 
       where     [Filter 1]
      ) as sub

where [Filter 2]

alt text

Darknight 2010-07-23 22:48:17

ansaurus

tags:

views:

answers:

How do I translate a simple boolean statement to SQL?

related questions