views:

363

answers:

4

Say I have a list of edges each containing two nodes (to and from). What is the best way to find the edge two given nodes? Note that nodes in the edge may repeat.

Say I have edge in this format:

1 <-> 5

3 <-> 7

5 <-> 6

2<-> 6

Then query such as 1 5 will return true.

Then query such as 5 2 will return true because 5 connects 6 and 6 connects to 2.

Then query such as 1 7 will return false.

Then query such as 7 4 will return false since 4 doesnt exist, it means it is edge-less node.

+1  A: 

You are basically looking forward to test if a given pair of nodes have a path in between them or not. This is a general case of the shortest path problem. Note, however, it suffices if we can find a shortest path between the pair of nodes in question. Use whatever representation suits you (adjacency matrix, adjacency list, edge-sets, union-find ...) and go ahead with a BFS/Djikstra implementation for all pair of nodes. It is then only a matter of servicing queries. Or, you can run Djikstra/BFS on a lazy basis (and cache past computations on an incremental manner).

dirkgently
-1. Shortest paths aren't needed for this -- see Theran's solution. (I realise you may have responded to an earlier version of the question, but in any case this answer is no longer correct.)
j_random_hacker
@ j_random_hacker: I had mentioned union find as well if you will note my answer. I had only converted an ill-defined problem to a known one.
dirkgently
@ j_random_hacker: As for the shortest path -- I'd highlighted only one possible answer among many.
dirkgently
@dirkgently: You did mention it, but you mentioned it amongst a bunch of other, unnecessary approaches -- why? The problem (as it is now stated, though perhaps not before the latest edit) is already well-defined.
j_random_hacker
@ j_random_hacker: I don't think they are superfluous. They are alternative ways of approaching the problem.
dirkgently
@dirkgently: They are of course useful algorithms/data structures in their own right, but they are superfluous for *this* problem, because the disjoint union-find algo/data structure solves the problem more simply and efficiently than they do.
j_random_hacker
A: 

Check out JGraphT library, it specializes in graph algorithms and does what you need.

serg
+5  A: 

It sounds to me like you are just asking if a path exists between two vertices in an undirected graph, but not necessarily what that path might be. This is the same as asking if the two vertices are in the same connected component of the graph.

If you really do only need to know if the two vertices are in the same connected component, then there is a simple and efficient algorithm using a Disjoint-set data structure.

initialize the disjoint set structure (DSS)
for each edge:
  for each vertex in edge:
    if the vertex does not exist in the DSS:
      create a new subset in the DSS containing only the vertex
  merge the subsets of the two vertices

To determine if a path exists between two vertices after processing all the edges, just check if the two vertices are in the same subset. If they are, then some path exists between them.

With an efficient implementation of the DSS, this algorithm achieves just slightly worse than linear time, and even with a simple linked-list implementation of the DSS it's O(n*log(n)). As j_random_hacker mentions, Floyd-Warshall is O(n^3) time and O(n^2) storage no matter if you are only calculating transitive closure or not, and using Dijkstra's algorithm requires an O(n*log(n)) calculation for each query.

Theran
+1. This is the (only) correct, efficient answer for the revised version of the question.
j_random_hacker
In fact, the question doesn't imply knowing disjoint verteces sets. It is excessive. You only need a transitive closure of a graph.
dragonfly
A: 

You probably want some variation of Floyd algorithm for finding shortest paths between all graph verteces. As far as I can tell, you need only transitive closure of an undirected graph. Here is pseudocode:

for k = 1 to n
  for i = 1 to n
    for j = 1 to n
      W[i][j] = W[i][j] or (W[i][k] and W[k][j])

W before running this code should be a binary adjacency matrix for a graph (W[i][j] == 1 <-> there is an edge from i to j). After it will be a transitive closure. I.e. W[i][j] would be equal to 1 if and only if j is reachable from i.

dragonfly
-1. Shortest paths are not required -- see Theran's solution.
j_random_hacker
Please, care to read the whole post or at least the code snippet. If you will watch closely you'll find out that **this algorithm determines only whether one vertex is reachable from another**.
dragonfly
Floyd uses O(n^3) time and a lot of memory. It's clearly not the best algorithm for this problem.
Georg
Well, I agree that it may not be the best, but it is simplest.
dragonfly
@dragonfly: I see now that you are computing only transitive closure rather than shortest path, but I doubt that using "or" and "and" is much faster than using "min()" and "+" respectively. In any case the complexity is still O(n^3), whereas Theran's is near-linear (inverse Ackermann).
j_random_hacker
@dragonfly: I think your algorithm is the simplest, but the union-find algorithm is also very simple, i.e. not far behind it.
j_random_hacker
@j_random_hacker: Well, "or" and "and" are there not for speed but for clarity of the algorithm and values it operates on :)
dragonfly