tags:

views:

89

answers:

1

Before the Floyd–Warshall/Dijkstra replies flood comes in please let me explain the situation as i'm sure either algorithm can be tuned for this case, and it has to be as this is not a toy example program (mind you, in java so have to keep it manageable memory-wise)

What i have is a web graph generated from node 0 to node n, node 3 cannot link to node 5, because node 5 didnt exist when node 3 was choosing it's out links. Every "node" is represented as in_neighbours[nodeID] and out_neighbours[nodeID] say nodeId=3, so we're talking about node 3. Note also that in_/out_ are both sorted, (in_ is naturally sorted as 5 will have chosen its out links all at once, only then 6 will choose out_links so 3's in_'s can never contain {6, 5, 7}) and ofc both can contain duplicates. (in/out are ArrayList arrays of size n, where out_ is always of size d or m, which along with n is specified at startup by the user)

No weights. What i must do is find the averageDistance()

public double getAvgDistance() {
    int sum = 0;        

    for (int i=1; i<n; i++) {
        for (int j=0; j < i; j++) {
            sum += dist(i, j);             // there are duplicates, make sure i skip 
        }
    }

    return (double)sum / (double)(  ((n*(n-1)) / 2)  );
}

What I have so far is the best case. Note i want only to find the distance between j & i, not all distances at the same time (not enough memory, it will be tested at m=20 d=1 000 000)

private int dist(int i, int j) {
    int dist = 0;

    for (int link : in_neighbours[j]) {
        System.out.print("\nIs "+j+" linked to by "+i);
        if (out_neighbours[i].contains(link)) {
            System.out.print(" - yes!");
            dist = 1;
        }
    }

    return dist;
}

So im asking if the "fresher" (ofc at this point the graph is completed) node i is linking to any of its older buddies directly if so, distance is 1 hop.

Is it just me or the 'shortest' path will always be the first found path if nodes are traversed backwards?

How do i check if its not 1, the "else" after the base case? My math is fairly weak please be gentle :) Any hints how to make use of the fact that the links are sorted?

It's not homework or something that im trying to cheat around from, it's not about the code itself, this has to be a useful tool, the "learning" comes by itself along the way.

here's how a graph looks nodeID, out links, in links for m=7 n=13, (note the 0 cycles is just how the graph is initialized):

0 | 0 0 0 0 0 0 0  | 0 0 0 0 0 0 0 1 1 1 1 1 1 1 2 2 2 2 2 3 4 5 6 6 7 8 9 
1 | 0 0 0 0 0 0 0  | 2 2 3 4 5 5 8 12 
2 | 0 0 0 0 0 1 1  | 3 3 3 3 3 4 4 4 6 7 8 10 
3 | 0 1 2 2 2 2 2  | 4 4 5 5 6 6 7 11 
4 | 0 1 2 2 2 3 3  | 5 5 6 8 9 10 
5 | 0 1 1 3 3 4 4  | 6 7 8 9 9 11 12 
6 | 0 0 2 3 3 4 5  | 7 7 7 8 9 9 12 
7 | 0 2 3 5 6 6 6  | 8 9 10 11 11 12 
8 | 0 1 2 4 5 6 7  | 10 10 10 11 12 
9 | 0 4 5 5 6 6 7  | 10 11 11 
10 | 2 4 7 8 8 8 9  | 12 12 
11 | 3 5 7 7 8 9 9  | 
12 | 1 5 6 7 8 10 10  | 

Sorry for the agonising long read. EDIT: Wrong code in the methods, this is what i think is correct now.

Revision of dist nr2, just try and find if theres a path at all:

private int dist(int i, int j) {
    int dist = 0, c = 0, count = 0;
    boolean linkExists = false;

    for (int link : in_neighbours[j]) {

        //System.out.print("\nIs "+j+" linked to by "+i);
        if (out_neighbours[i].contains(link)) {

            //System.out.print(" - yes!");
            dist = 1; // there is a direct link

        } else {

            while ( c < j ) {
                // if there's a path from 0 up to j, check if 'i' links to a node which eventually links to 'j'
                if (out_neighbours[i].contains(c) && 
                        (out_neighbours[c].contains(link) || in_neighbours[c].contains(link) )) {

                    count++; // yes. and this is one node we had to step through to get closer
                    linkExists = true;
                } else {
                    linkExists = false; // unreachable, the path was interrupted somewhere on the way
                    break;
                }
                c++; 
            }

            if (linkExists) {
                dist = count-1; // as 2 nodes are linked with 1 edge
            } else {
                dist = 0; // no path was found
            }
        }


    }

    return dist;
}
A: 

Since all edges have the same weight in your model, you can use a BFS search to find the shortest path from S to T.

This is an iterative process, starting with set #0, containing only the source node ({S}). At each step i, you create set #i by finding all nodes achievable from set (i-1) in one step.

The iteration terminates in two cases:

1) When you detect that set #k contains T. In this case you return k-1.

2) When the set is empty, meaning that the two nodes are unreachable.

The memory consumption is about twice the number of nodes, since at each step i you are working with two sets (i-1 and i), bounded by the total number of nodes.

--EDIT--

Here is a possible implementation (I made some tests on it):

private Integer getDist(int i, int j) {
    Set<Integer> currentSet = new HashSet<Integer>();
    currentSet.add(i);
    int dist = 0;
    while (true) {
        Set<Integer> nextSet = new HashSet<Integer>();
        for (Integer currNode : currentSet)
            nextSet.addAll(out[currNode]);

        if (nextSet.isEmpty())
            return null; //i.e. infinite
        if (nextSet.contains(j))
            return dist;
        dist++;
        currentSet = nextSet; 
    }
}

The implementation assumes that in and out are defined as List<Integer>[], and nodes are identified by numbers starting from 0. The minimal distance is counted as the number of intermediate nodes in the path, and not as the number of edges.

The duplicates you have in the lists do not break anything here, but they are irrelevant for the algorithm.

Eyal Schneider
Useful indeed, I was thinking for something like this, so far im at this though it is still incorrect, instead of creating this second set can't i just stop as soon as I find the relevant link of interest (maybe that's what you meant) adding code to my question above.
Dima
@Dima: I edited my response, and added my code to make it clearer. Yes, the loop stops as soon as the target node is found.
Eyal Schneider
I think i see what you're doing there. But most of the times it goes into an infinite loop and if it doesnt it returns a negative value for the averagemy out[] (out_neighbors[] i assume) is indeed an array of ArrayLists so out[currNode] is an ArrayList with the outgoing links of node currNode.
Dima
ok forget about the negative values, but most often loops for ever even on trivial small graphs.
Dima
@Dima: I don't see how this can happen, unless your structures contain loops, which are illegal. Can you please provide the smallest input that causes an infinite loop?
Eyal Schneider
Very true,theres no cycles when direction is taken into account but when its not like here, there are... my bad. In the end I got help on Dijkstra, but definitely went for the 2 collections like yours i sometimes fail to put it into code properly on my own.
Dima