I'm storing contacts between different elements. I want to eliminate elements of certain type and store new contacts of elements which were interconnected by the eliminated element.
Problem background
Imagine this problem. You have a water molecule which is in contact with other molecules (if the contact is a hydrogen bond, there can be 4 other molecules around my water). Like in the following picture (A, B, C, D are some other atoms and dots mean the contact).
A B
| |
H H
. .
O
/ \
H H
. .
C D
I have the information about all the dots and I need to eliminate the water in the center and create records describing contacts of A-C, A-D, A-B, B-C, B-D, and C-D.
Database structure
Currently, I have the following structure in the database:
Table atoms
:
"id" integer PRIMARY KEY,
"amino" char(3) NOT NULL,
(HOH for water or other value)- other columns identifying the atom
Table contacts
:
"acceptor_id" integer NOT NULL,
(the atom near to my hydrogen, here C or D)"donor_id" integer NOT NULL,
(here A or B)"directness" char(1) NOT NULL,
(this should be D for direct and W for water-mediated)- other columns about the contact, such as the distance
EDIT: How would look the data in the case depicted earlier.
atoms
:
id|atom|amino
1 | O | HOH
2 | N | ARG <- atom A from image
3 | S | CYS <- B
4 | O | SER <- C
5 | N | ARG <- D
contacts
:
donor_id|acceptor_id|directness
1 4 D
1 5 D
2 1 D
3 1 D
From which I need to make
contacts
:
donor_id|acceptor_id|directness
3 4 W <- B-C
3 5 W <- B-D
2 4 W <- A-C
2 5 W <- A-D
2 3 X <- A-B (These last two rows are escaping me,
4 5 X <- C-D there could be also row D-C, but not
both C-D and D-C. A char 'X' could
be used to mark "no donor/acceptor")
Current solution (insufficient)
Now, I'm going through all the contacts which have donor.amino = "HOH"
. In this sample case, this would select contacts from C and D. For each of these selected contacts, I look up contacts having the same acceptor_id
as is the donor_id
in the currently selected contact. From this information, I create the new contact. At the end, I delete all contacts to or from HOH.
This way, I am obviously unable to create C-D and A-B contacts (the other 4 are OK).
If I try a similar approach - trying to find two contacts having the same donor_id, I end up with duplicate contacts (C-D and D-C).
Is there a simple way to retrieve all six contacts without duplicates?
I'm dreaming about some one page long SQL query which retrieves just these six wanted rows. :-)
However, any other ideas are welcome.
It is preferable to conserve information about who is donor (where possible), but not strictly necessary.
Big thanks to all of you who read this question to this point.