views:

67

answers:

3

The graph is unweighed, an element of the array of HashSets neighbours[] is a node neighbours[1] is node 1 (they start from 0 mind you) with its unique neighbouring nodes say 2 3 4 5. (so neighbours[5] will contain 1). And I have the following method I did with great deal of help as I dont get the algo much beyond theory. The number it returns should be the average distance between 2 nodes in the graph.

Imagine I have the following graph (node: in_links | out_links; neighbours[] does not contain the 0 loops at node 0, and no duplicates as I said.)

0: 0 0 0 | 0 0 0 1 1 1 2 3 5 6 7 7 8 8 9 9 11 
1: 0 0 0 | 2 2 3 4 4 5 6 8 
2: 0 1 1 | 3 
3: 0 1 2 | 4 9 
4: 1 1 3 | 5 12 
5: 0 1 4 | 6 7 10 
6: 0 1 5 | 10 11 12 
7: 0 0 5 | 
8: 0 0 1 | 10 
9: 0 0 3 | 12 
10: 5 6 8 | 11 
11: 0 6 10 | 
12: 4 6 9 | 

And for this trivial graph the distance that's returned is 5.781686749230769E8 ?!?! the code:

    public double getAvgDistance() {
    double total = 0;
    int[] dist = new int[n];
    ArrayList<Integer> Q = new ArrayList<Integer>();
    int tmp, index = 0, w = 0;

    for (int u=0; u<n; u++) {
        System.out.print("Avg Dist at "+u+"\r");
        // Initialise Q and dist for this iteration
        for (int v=u+1; v<n; v++) {
            Q.add(v);

            if (neighbours[u].contains(v)) {
                dist[v] = 1;
            } else {
                dist[v] = Integer.MAX_VALUE;
            }
        }

        while (!Q.isEmpty()) {

            tmp = dist[0];
            for (int e=1; e<Q.size(); e++) {
                if (dist[e] < tmp) {
                    w = Q.get(e);
                    tmp = dist[w]; // smallest dist is for this element w so far
                    index = e;
                }
            }
            Q.remove(index);

            for (int z : neighbours[w]) {
                if ( Q.contains(z)
                        && (dist[w]+1 < dist[z]) ) {

                    dist[z] = dist[w]+1;
                }
            }

        } // while end

        for (int v = u+1; v < n; v++ ) {
            total += dist[v];
        }

    } // for 0-n end

    return total /= (double)(n*(n-1)/2);
}

I don't have much experience with casting or printing real numbers so I hope its something to do with those! All comments welcome

A: 

I'm not sure if I totally understand your question, but it sounds like you MAY be having trouble with the value you are printing not being exactly what you expect. I suspect the problem may be when you are printing the value of the double. Whenever you convert a double directly to a String, I know you can get unexpected results.

This post suggests using a BigDecimal instead of a double to maintain precision: http://stackoverflow.com/questions/322749/retain-precision-with-doubles-in-java

So perhaps try doing something like the following and see if you have better results.

BigDecimal.valueOf(<your double value here>).toPlainString();
JasonStoltz
Ah excellent, this makes it at least easier to read, result for a graph of same parameters (d=3 n=13) is 412977625.2307692.Eventually I may have to write these results to file, is it better to write them as plainStrings like this thing? (later they will be read in from file and added together and an average calculated)
Recz
+1  A: 

If I understand your question correctly, nodes 7, 11 and 12 have no out links and therefore no valid paths to the other nodes.

Does your algorithm force a path by inserting a link with a cost of Integer.MAX_VALUE in these cases? If so, that would explain why you have such a very high average cost.

I also wondered whether it would be better to evaluate both forward and reverse paths. In a directed graph the cost of path AB is not necessarily the same as the cost of path BA. With your current algorithm, the cost of every path ending at node 12 is calculated, but no paths starting at node 12 are evaluated.

richj
A: 

I'm only guessing, but I see distances occasionally being set to Integer.MAX_VALUE. If those numbers are actually entering the result and then being divided down by one or two factors of 10, that would explain very nicely why the average is much, much larger than expected and roughly in the same ballpark as MAX_VALUE.

It's OK to have this large value in your graph when it is used to determine the shortest path among alternatives, but once you get to the point where you are determining actual distances, that number has to go!

Either you have a path length in the ballpark of MAX_VALUE, that says there is no path. That path length therefore doesn't go into your average. Or your path length is a small integer in the same magnitude as your in-graph distances, then it's valid and you can include it in your calculations.

The lesson to be taken home from this: Just because a number came out of a computer program that doesn't mean it's trustworthy or correct!

Carl Smotricz
yes the implementation is quite wrong, when it gets fixed with help irl i'll post the question but yes those go to the total, because im not properly updating distances (in the end i have either a distance of 1 or max_value).
Recz
Sounds like the kind of situation where I would proceed by fixing all the known bugs and then looking at what's left that's not working right.
Carl Smotricz