views:

39

answers:

1

Hi,

I am a graduate student in computer science at Indiana University, Bloomington. For one of my research projects, i am working on calculating pageranks for a directed graph which is very sparse and has a high percentage of deadlinks.

By deadlinks I mean nodes that have outdegree zero. Sometimes, in a graph with a lot of deadlinks, spider traps may occur. Anyways, the problem I am interested in is finding out pageranks in this scenario.

And I am using JUNG (Java Universal Graph Network) for calculating the pageranks.

When I use the normal procedure,

Graph<String, String> jungGraph = new DirectedSparseGraph<String, String>();
PageRank<String, String> pagerank = new PageRank<String,String>(jungGraph, 0.2);
pagerank.setMaxIterations(20);
pagerank.setTolerance(0.000001);
pagerank.evaluate();

I get more or less the same pagerank values for all the nodes, when i clearly know that shouldn't be the case. As some nodes in the graph have a large number of outgoing nodes and are strongly interconnected.

What is the suggested approach in this case. I know there is this class PageRankWithPriors. Should I first extract the network with no deadlinks, calculate pageranks for them, and then propagate their rank to the deadlinks until they converge.? In the latter case, all the nodes in the reduced network (outdegree != 0 ) will have their priors set, whereas the deadlinks wont.

Am I missing anything here? Please pass on your suggestions.

Thanks,

Deepak.

+1  A: 

I don't think PageRankWithPriors is what you want.

Which version of PageRank are you using? The class edu.uci.ics.jung.algorithms.importance.PageRank or edu.uci.ics.jung.algorithms.scoring.PageRank? The former has been deprecated in favor of the latter in Jung 2.0 Beta.

They seem to treat outdegree 0 nodes differently, which may be your issue. The former's spec says:

probability of going from node u to node v is equal to (1-alpha)[1/outdegree(u)] + alpha(1/|V|)

If u has no out-edges in the original graph then 0 is used instead of 1/outdegree(v).

That seems wrong, as it leads to a loss of probability (the total probability of leaving u by some method should equal 1, and it doesn't). The latter does it differently:

If a vertex has no outgoing edges, then the probability of taking a random jump from that vertex is (by default) effectively 1

That should conserve probability which is what you want.

Keith Randall
@Keith - I am using edu.uci.ics.jung.algorithms.scoring.PageRank . And the results are still the same. Most of the nodes have the same rank value. Does the edu.uci.ics.jung.algorithms.scoring.PageRank calculation also consider self-edges.? The graph that i am working with has some nodes that connect to themselves. Should I remove the self-edges and then proceed with the calculation?
Deepak Konidena
Also, could you pass me link to the spec?
Deepak Konidena
Google for "jung pagerank", it is the first result.
Keith Randall
I don't know about self-edges or why you are getting unexpected results. You should look at the code, it is open source. (By the way, I would expect most nodes to have the same rank value. Your claim of "most nodes have same rank value" doesn't convince me there is anything wrong.)
Keith Randall