I have the following database table with information about people, diseases, and drugs:
PERSON_T DISEASE_T DRUG_T
========= ========== ========
PERSON_ID DISEASE_ID DRUG_ID
GENDER PERSON_ID PERSON_ID
NAME DISEASE_START_DATE DRUG_START_DATE
DISEASE_END_DATE DRUG_END_DATE
From these tables, I run some statistics about which individuals have taken which drugs and had which diseases. From this I can figure out which patterns are interesting for me to delve further into. For instance, below is a simplified example of the boolean pattern I might find for disease 52:
( (Drug 234 = false AND Drug 474 = true AND Drug 26 = false) OR
(Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )
Edit: Here is another example:
( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR
(Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )
Now I want to convert this pattern into a sql query and find all the people who match this pattern.
For example, I want to find all of the people in PERSON_T who had the disease and
((who did not take drug 234 and 26 before exhibiting symptoms, but did take drug 474 before exhibiting symptoms) or
(who took drug 371 before exhibiting symptoms, but not drug 791 and 395 before exhibiting symptoms))
How would I go about translating this pattern back into the original query?
Here's my first attempt, but I get stuck on the first term:
SELECT * FROM PERSON_T, DRUG_T, DISEASE_T
WHERE DISEASE_ID = 52 AND
PERSON_T.PERSON_ID = DISEASE_T.PERSON_ID AND
PERSON_T.PERSON_ID = DRUG_T.PERSON_ID AND
(DRUG_T.DRUG_ID=234 AND (DRUG_T.DRUG_START_DATE>DISEASE_T.END_DATE || ???)
I need this to work in PostgreSql, but I assume that any given answer can be translated from a given database to PostgreSql.
Response to comments
- I fixed formatting of database tables. Thank you.
- I need to be able to take an arbitrary boolean statement and translate it to SQL. The boolean statements we are actually creating are much longer than the example I gave. Any new tables I create will be in a new database and need to have the same schema as the original tables. This way to the end user, they can run their same code on the new tables and it work the same as if it ran on the original tables. This is a requirement from the customer. I'm hoping I can create a view which is just a query to the original tables. If we can't get that to work, I may create a copy of the tables and filter the data as I'm copying it over to the new table. We are not using neural networks to do the analysis. We are using our own custom algorithms which scale much better than neural networks.
- The Disease_Start_Date is the date when the person gets the diseaes which is likely when the symptons start appearing. Disease_End_Date is when the person is recovered which is likely when symptoms disappear.
- Drug_start_date is when the person starts taking the drugs. Drug_end_date is when the person stops taking the drugs.
Edit I added my own answer. Can anyone come up with a simpler answer?