views:

7863

answers:

13

I am trying to find out what is the best (time efficient) algorithm to accomplish the task described below.

I have a set of records. For this set of records I have connection data which indicates how pairs of records from this set connect to one another (basically a graph, the records being the vertices and the connection data the edges).

All of the records in the set have connection information (i.e. no orphan records are present; each record in the set connects to one or more other records in the set).

I want to choose any two (arbitrary) records from the set and be able to show all ways possible the chosen records connect (either directly or through other connections).

For example:

    If I have the following records:
        A, B, C, D, E

    and the following represents the connections: 
        (A,B),(A,C),(B,A),(B,D),(B,E),(B,F),(C,A),(C,E),
        (C,F),(D,B),(E,C),(E,F),(F,B),(F,C),(F,E)

        [where (A,B) means record A connects to record B]

If I chose B as my starting record and E as my ending record, I would want to find all paths through the record connections that would connect record B to record E.

   All paths connecting B to E:
      B->E
      B->F->E
      B->F->C->E
      B->A->C->E
      B->A->C->F->E

This is an example, in practice I may have sets containing hundreds of thousands of records.

+1  A: 

Have you looked at the Reingold algorithm (PDF) ?

Mark Cidade
+2  A: 

Interesting.. Strictly speaking, "all possible ways" could be infinite:

B->A->B->E
B->A->B->A->B->E
B->A->B->A->B->A->B->E

etc.

I take it you need to filter out loops like these.

Blorgbeard
+1  A: 

I have solved a similar problem to this recently, instead of all solutions I was only interested in the shortest.

I used a 'breadth first' iterative search which used a queue of status' each of which held a record containing a current point on the graph and the path taken to get there.

you start with a single record in the queue, which has the starting node and an empty path.

Each iteration through the code takes the item off the head of the list, and checks to see if it is a solution (the node arrived at is the one you want, if it is, we are done), otherwise, it constructs a new queue item with the nodes connecting to the current node, and amended paths that are based on the path of the previous node, with the new jump attached at the end.

Now, you could use something similar, but when you find a solution, instead of stopping, add that solution to your 'found list' and continue.

You need to keep track of a visited nodes list, so that you never backtrack on yourself otherwise you have an infinite loop.

if you want a bit more pseudocode post a comment or something, and I will elaborate.

I believe if you're only interested in the shortest path, then Dijkstra's Algorithm is "the solution" :).
vicatcu
A: 

Here's a thought off the top of my head:

  1. Find one connection. (Depth-first search is probably a good algorithm for this, since the path length doesn't matter.)
  2. Disable the last segment.
  3. Try to find another connection from the last node before the previously disabled connection.
  4. Goto 2 until there are no more connections.
Ryan Fox
A: 

@marxidad: I've read that all, while at the same time keeping my head from exploding. It sounds very interesting and seems like what I need, but I'll have to research it more and possibly find a code example of this at work. I'm not a mathematician so formal description of the algorithm is (sadly) over my head. Unfortunately, I'm in one of those time crunch situations as well, so I may need to go with a good-enough-for-now solution. I've definitely added this to my research list. Thanks!

@Blorgbeard: yes, cyclic connections would have to be filtered out.

@Peter Morris: what you describe is sort of what I'm trying to work out on paper (pseudocode and desk checking). After which will come timing tests. If you'd like to post pseudocode that be cool. When I'm done I'll probably post mine as well, for feedback.

I was hoping someone may know the best algorithmic solution and possibly point me to a pseudocode example (or language specific example as long as it doesn't hide the implementation; as I need to write my implementation in C).

Robert
A: 

I think you should describe your real problem behind this. I say this because you ask for somthing time efficient, yet the answer set to the problem seems to grow exponentially!

Therefore I wouln't expect a better algorithm than something exponential.

I'd do backtracking and going through the whole graph. In order to avoid cycles, save all visited nodes along the way. When you go back, unmark the node.

Using recursion:

static bool[] visited;//all false Stack currentway; initialize empty

function findnodes(int nextnode) { if (nextnode==destnode) { print currentway return; } visited[nextnode]=true; Push nextnode to the end of currentway. for each node n accesible from nextnode: findnodes(n); visited[nextnode]=false; pop from currenteay }

Or is that wrong?

edit: Oh, and I forgot: You should eliminate the recursive calls by utilizing that node stack

+6  A: 

Robert,

It appears that this can be accomplished with a breadth-first search of the graph. The breadth-first search will find all non-cyclical paths between two nodes. This algorithm should be very fast and scale to large graphs (The graph data structure is sparse so it only uses as much memory as it needs to).

I noticed that the graph you specified above has only one edge that is directional (B,E). Was this a typo or is it really a directed graph? This solution works regardless. Sorry I was unable to do it in C, I'm a bit weak in that area. I expect that you will be able to translate this Java code without too much trouble though.

Graph.java

import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.Map;
import java.util.Set;

public class Graph {
    private Map<String, LinkedHashSet<String>> map = new HashMap();

    public void addEdge(String node1, String node2) {
        LinkedHashSet<String> adjacent = map.get(node1);
        if(adjacent==null) {
            adjacent = new LinkedHashSet();
            map.put(node1, adjacent);
        }
        adjacent.add(node2);
    }

    public void addTwoWayVertex(String node1, String node2) {
        addEdge(node1, node2);
        addEdge(node2, node1);
    }

    public boolean isConnected(String node1, String node2) {
        Set adjacent = map.get(node1);
        if(adjacent==null) {
            return false;
        }
        return adjacent.contains(node2);
    }

    public LinkedList<String> adjacentNodes(String last) {
        LinkedHashSet<String> adjacent = map.get(last);
        if(adjacent==null) {
            return new LinkedList();
        }
        return new LinkedList<String>(adjacent);
    }
}

Search.java

import java.util.LinkedList;

public class Search {

    private static final String START = "B";
    private static final String END = "E";

    public static void main(String[] args) {
        // this graph is directional
        Graph graph = new Graph();
        graph.addEdge("A", "B");
        graph.addEdge("A", "C");
        graph.addEdge("B", "A");
        graph.addEdge("B", "D");
        graph.addEdge("B", "E"); // this is the only one-way connection
        graph.addEdge("B", "F");
        graph.addEdge("C", "A");
        graph.addEdge("C", "E");
        graph.addEdge("C", "F");
        graph.addEdge("D", "B");
        graph.addEdge("E", "C");
        graph.addEdge("E", "F");
        graph.addEdge("F", "B");
        graph.addEdge("F", "C");
        graph.addEdge("F", "E");
        LinkedList<String> visited = new LinkedList();
        visited.add(START);
        new Search().breadthFirst(graph, visited);
    }

    private void breadthFirst(Graph graph, LinkedList<String> visited) {
        LinkedList<String> nodes = graph.adjacentNodes(visited.getLast());
        // examine adjacent nodes
        for (String node : nodes) {
            if (visited.contains(node)) {
                continue;
            }
            if (node.equals(END)) {
                visited.add(node);
                printPath(visited);
                visited.removeLast();
                break;
            }
        }
        // in breadth-first, recursion needs to come after visiting adjacent nodes
        for (String node : nodes) {
            if (visited.contains(node) || node.equals(END)) {
                continue;
            }
            visited.addLast(node);
            breadthFirst(graph, visited);
            visited.removeLast();
        }
    }

    private void printPath(LinkedList<String> visited) {
        for (String node : visited) {
            System.out.print(node);
            System.out.print(" ");
        }
        System.out.println();
    }
}

Program Output

B E 
B A C E 
B A C F E 
B F E 
B F C E
Casey Watson
Please note that this is not a breadth-first traversal. With breadth first you first visit all nodes with distance 0 to the root, then those with distance 1, then 2, etc.
mweerden
Correct, this is a DFS. A BFS would need to use a queue, enqueuing level-(N+1) nodes to be processed *after* all level-N nodes. However, for the OP's purposes, either BFS or DFS will work, as no preferred sorting order of paths is specified.
Matt J
Casey, I've been looking for a solution to this problem for ages. I have recently implemented this DFS in C++ and it works a treat.
AndyUK
+1  A: 

Here is the pseudocode I came up with. This is not any particular pseudocode dialect, but should be simple enough to follow.

Anyone want to pick this apart.

  • [p] is a list of vertices representing the current path.

  • [r] is a list of paths where meet the criteria

  • [s] is the source vertex

  • [d] is the destination vertex

  • [c] is the current vertex (argument to the PathFind routine)

Assume there is an efficient way to look up the adjacent vertices (line 6).

     1 PathList [p]
     2 ListOfPathLists [x]
     3 Vertex [s], [d]

     4 PathFind ( Vertex [c] )
     5     Add [c] to tail end of list [p]
     6     For each Vertex [v] adjacent to [c]
     7         If [v] is equal to [d] then
     8             Save list [p] in [x]
     9         Else If [v] is not in list [p]
    10             PathFind([v])
    11     Next For
    12     Remove tail from [p]
    13 Return
Robert
A: 

@Christian: Seems I was on the same train of thought. My real problem is exactly as I described, only with much larger sets. I agree this seems to grow exponentially with the size of the set. I figure I can try to make it more efficient by removing connections that are single-way from each vertex and do not lead to the destination vertex. These connections would be removed the first time they are seen. For instance in my previous example, record D could be removed since it leads to no other record. That way record D is only seen once, regardless of how many times record B happens to be part of a valid path (from start to end).

@Casey Watson: I'll take a closer look at that when my eyes are less tired. Thanks!

Robert
@Robert, this isn't a forum. Use comments to talk to someone, these posts are suppose to be "Answers".
Simucal
A: 

Another improvement would be to memoize the connections that lead to the end. For example, if I know F->E and C->F then I know C->E without having to visit every vertex adjacent to F to find the F->E connection.

Robert
A: 

As far as I can tell the solutions given by Ryan Fox (58343, Christian (58444), and yourself (58461) are about as good as it get. I do not believe that breadth-first traversal helps in this case, as you will not get all paths. For example, with edges (A,B), (A,C), (B,C), (B,D) and (C,D) you will get paths ABD and ACD, but not ABCD.

mweerden
mweerden,The breadth-first traversal that I submitted will find ALL paths while avoiding any cycles.For the graph that you have specified, the implementation correctly finds all three paths.
Casey Watson
I didn't completely read your code and assumed you used a breadth-first traversal (because you said so). However, on closer inspection after your comment, I noticed that it is in fact not. It is actually a memoryless depth-first traversal like those of Ryan, Christian and Robert.
mweerden
+5  A: 

The National Institute of Standards and Technology (NIST) online Dictionary of Algorithms and Data Structures lists this problem as "all simple paths" and recommends a depth-first search. CLRS supplies the relevant algorithms.

A clever technique using Petri Nets is found here

Michael Dorfman
A: 

The connections are called cycles, and this answer has a lot of informations for you.

elhoim