views:

316

answers:

1
+3  Q: 

Pagerank vs SVD

Pagerank works on the nodegraph of a series of pages and the directed edges formed by their respective inward and outward links. Thus the rank of a particular page is broadly a locally-induced effect in the nodegraph.

SVD, on the other hand, works on a whole matrix of values, and has no directionality - a link between site A and site B would only register as a 1 on the correct matrix element. It is a global system, so ranking is a global effect.

Given the extreme sparseness of web-derived matrices, I would expect SVD to be a bad performer here, as it requires a complete dataset, and has significant memory requirements.

Is that true? Does Pagerank outdo SVD largely because it is a nodegraph-based algorithm? How can Pagerank infer semantic relevance from a page beyond the number of times a word is mentioned? Or would that be a second step, performed after Pagerank has ranked the pages?

+2  A: 

There are two issues here: which measure is easy to compute, and which yields the information that we're looking for? I don't know the answer to either question, but I can perhaps give a partial answer.

Firstly, the relevance. Both quantities are centrality measures, to use a term from network theory. PageRank computes (a variant of) eigenvector centrality, while the SVD apparently leads to the Hyperlink-Induced Topics Search (HITS) algorithm. I got this from this handout from Peter Dodds (University of Vermont). They measure different things, but it's not clear to me which one is the most relevant for measuring the importance of a webpage.

Secondly, the computational cost. Mathematically speaking, the PageRank is the dominant eigenvector of the (modified) adjacency matrix - as explained on the Wikipedia page - while HITS gives the dominant singular vector of the adjacency matrix. Both are defined by the global network of webpages and links between them, and both can be computed by only considering the node graph locally. So at first sight, I think that the computational cost is roughly equal.

In conclusion, I don't know why PageRank is better than SVD; it's not even clear to me that it is better than SVD.

Jitse Niesen
Many thanks Jitse, that makes things clearer. How could you decompose whole-graph SVD into a local graph analysis though?
Phil H