So your graph looks like this:
You can use Oracle's START WITH/CONNECT BY feature to do what you want. If we start at node GA, we can reach all nodes in the graph, as shown below.
-- One row per directed edge: PARENT -> CHILD
CREATE TABLE edges (PARENT VARCHAR(100), CHILD VARCHAR(100));
INSERT INTO edges VALUES ('AT', 'TG');
INSERT INTO edges VALUES ('CG', 'GT');
INSERT INTO edges VALUES ('GA', 'AT');
INSERT INTO edges VALUES ('GC', 'CA');
INSERT INTO edges VALUES ('GC', 'CG');
INSERT INTO edges VALUES ('GG', 'GC');
INSERT INTO edges VALUES ('GT', 'TG');
INSERT INTO edges VALUES ('TG', 'GA');
INSERT INTO edges VALUES ('TG', 'GC');
INSERT INTO edges VALUES ('TG', 'GG');
COMMIT;
-- Start at the edge that ends in GA, then keep following CHILD -> PARENT links
SELECT *
FROM edges
START WITH CHILD = 'GA'
CONNECT BY NOCYCLE PRIOR CHILD = PARENT;
Output:
    PARENT  CHILD
1   TG      GA
2   GA      AT
3   AT      TG
4   TG      GC
5   GC      CA
6   GC      CG
7   CG      GT
8   CG      GT
9   GC      CA
NOTE
Since your graph has cycles, it's important to use the NOCYCLE keyword in the CONNECT BY clause; otherwise the query fails with ORA-01436 (CONNECT BY loop in user data).
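If you want to see where NOCYCLE actually cuts the traversal, Oracle's hierarchical pseudocolumns can help. This is only a sketch against the edges table above; the aliases (hops, path, is_cycle) are names I made up for illustration.

```sql
-- LEVEL is the depth of the row below the START WITH row (1 = starting edge).
-- SYS_CONNECT_BY_PATH concatenates the CHILD values visited so far.
-- CONNECT_BY_ISCYCLE is 1 for a row whose next step would loop back to an
-- ancestor; it is only available when NOCYCLE is specified.
SELECT LEVEL                            AS hops,
       parent,
       child,
       SYS_CONNECT_BY_PATH(child, '->') AS path,
       CONNECT_BY_ISCYCLE               AS is_cycle
FROM edges
START WITH child = 'GA'
CONNECT BY NOCYCLE PRIOR child = parent;
```

Rows with is_cycle = 1 are the points where the traversal stopped instead of looping.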
EDITED ANSWER BASED ON LATEST EDITS BY OP
First of all, I assume that by "2 hops" you mean "at most 2 hops", because your current query uses level <= 2. If you want exactly 2 hops, it should be level = 2.
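To make the difference concrete, here is a sketch of both variants. I'm assuming your table is called nodes with columns node_from and node_to (inferred from your INSERT and output) and that you start from AT; adjust to match your real query.

```sql
-- At most 2 hops: the LEVEL condition in CONNECT BY stops the traversal
-- from descending past depth 2.
SELECT node_from, node_to, LEVEL AS hops
FROM nodes
START WITH node_from = 'AT'
CONNECT BY NOCYCLE PRIOR node_to = node_from AND LEVEL <= 2;

-- Exactly 2 hops: same traversal, but the WHERE clause keeps only the
-- rows that sit at depth 2.
SELECT node_from, node_to, LEVEL AS hops
FROM nodes
WHERE LEVEL = 2
START WITH node_from = 'AT'
CONNECT BY NOCYCLE PRIOR node_to = node_from AND LEVEL <= 2;
```

The WHERE clause only filters the rows that are returned, so the LEVEL <= 2 condition stays in CONNECT BY to keep the traversal itself bounded.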
In your updated graph (image2.JPG), there is no path from AT to GT that takes 2 hops, so the query is returning what I would expect. From AT to GT, we can go AT->TG->GC->CG->GT, but that's 4 hops, which is greater than 2, so that's why you aren't getting that result back.
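If you'd like to check the hop count yourself rather than trace it by hand, something along these lines should do it. Again, this is a sketch that assumes the nodes(node_from, node_to) table; min_hops is just an alias I picked.

```sql
-- Drop the LEVEL limit, walk everything reachable from AT, and report the
-- smallest depth at which an edge ending in GT shows up.
-- Returns NULL if GT is not reachable from AT at all.
SELECT MIN(LEVEL) AS min_hops
FROM nodes
WHERE node_to = 'GT'
START WITH node_from = 'AT'
CONNECT BY NOCYCLE PRIOR node_to = node_from;
```

Any value greater than 2 means a level <= 2 query will never return an edge ending in GT.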
If you expect to reach GT from AT in 2 hops, then you need to add an edge from TG to GT, like this:
INSERT INTO nodes VALUES('TG','GT');
Now when you run your query, you'll get this data back:
NODE_FROM NODE_TO
AT TG
TG GC
TG GG
TG GT
Remember that START WITH/CONNECT BY only works if there is a path between the nodes. In your graph (before I added the new edge above), there is no path AT->TG->GT, so that's why you're not getting the result back.
Now, if you added the edge TG->AT, then we would have the path GT->TG->AT. So in that case AT is 2 hops away from GT (i.e. we're going the reverse way now, starting from GT and ending at AT). But to find those paths, you would need to set START WITH node_from = 'GT'.
If your goal is to find all paths from a start node to any target node that is at most 2 hops away (level <= 2), then the above should work.
However, if you want to find all paths from some target node back to a source node (i.e. the reverse example I gave, GT->TG->AT), then that's not going to work here. You'd have to run the query for all nodes in the graph.
Think of START WITH/CONNECT BY as doing a depth-first search. It's going to go everywhere it can from a starting node. But it's not going to do any more than that.
Summary:
I think the query works fine, given the constraints above. I've explained why the GT-TG path is not returned, so I hope that makes sense.
Keep in mind, however, if you are trying to traverse reverse paths as well, you'll have to loop over every node and run the query, changing the START WITH node each time.
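One way to do that loop inside a single statement, instead of in client code, is to leave out START WITH entirely. Oracle then treats every row as a root, and CONNECT_BY_ROOT tells you where each path began. This is a sketch I haven't run against your data; it assumes the nodes(node_from, node_to) table, uses 'AT' as an example target, and the aliases source_node, target_node and hops are my own.

```sql
-- No START WITH: every edge is used as a starting point.
-- CONNECT_BY_ROOT returns the node_from of the row the path started from.
-- The WHERE clause keeps only the paths that end at the target node.
SELECT CONNECT_BY_ROOT node_from AS source_node,
       node_to                   AS target_node,
       LEVEL                     AS hops
FROM nodes
WHERE node_to = 'AT'
CONNECT BY NOCYCLE PRIOR node_to = node_from AND LEVEL <= 2;
```

Because this starts a traversal from every row, it can get expensive on a large graph; the LEVEL <= 2 condition is what keeps it bounded.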