[Math] When will PageRank fail

algorithms, matrices

I'm testing out a few different kinds of metrics for a reranking application. I was curious and wanted to try running the PageRank algorithm to see if it could help at all. I used a similarity metric to generate "links" between the items in the list that needed reranking.

PageRank works by simulating a random web surfer. We create a matrix that represents the links between pages and then calculate the probability of surfing to each page. This is done by iteratively multiplying the 1xN vector of page-visit probabilities by the NxN transition matrix. I followed the simple implementation outlined here.
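For reference, here is a minimal power-iteration sketch of that idea (my own code, not the linked implementation, and it assumes the transition matrix is row-stochastic, i.e. nonnegative entries with each row summing to 1):

```python
import numpy as np

def pagerank(transition, tol=1e-9, max_iter=1000):
    """Iterate pi <- pi @ transition until the rank vector stops changing."""
    n = transition.shape[0]
    pi = np.full(n, 1.0 / n)           # start from the uniform distribution
    for _ in range(max_iter):
        new_pi = pi @ transition       # one step of the random surfer
        if np.abs(new_pi - pi).sum() < tol:
            return new_pi
        pi = new_pi
    return pi
```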

My first problem was that my similarity metric was symmetric; a completely symmetric link matrix resulted in equal ranks for every single page. I tried using a less symmetric metric and produced the following matrix:

  7 -26 -32 
-14   8 -14 
-20 -14   8 

The transition matrix then was:

-0.09 0.49 0.60
 0.66 -0.33 0.66 
 0.73 0.52 -0.24 

And after running the algorithm all three pages were ranked 0.39, which is useless.
My question is: how can I tell whether a matrix will end up producing equal PageRanks? If I knew the characteristics of the required matrix, I could search for a suitable metric to fill the matrix and then experiment properly with the algorithm.

Best Answer

As pointed out by Byron Schmuland in the above comments, a probability matrix should be a square matrix with (a) nonnegative entries and (b) each row sum equal to 1. Such properties have "nothing to do with the web", and negative numbers are not just "not normal", but completely nonsensical.
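For what it's worth, one hypothetical way to get such a matrix from a similarity-based setup like the asker's is to use a nonnegative similarity and divide each row by its sum; the helper below is my own illustration, not part of the original question or answer:

```python
import numpy as np

def to_transition_matrix(similarity):
    """Row-normalize a nonnegative similarity matrix into a probability matrix."""
    sim = np.asarray(similarity, dtype=float)
    if (sim < 0).any():
        raise ValueError("similarities must be nonnegative")
    n = sim.shape[0]
    rows = sim.sum(axis=1, keepdims=True)
    # items with no positive similarity to anything fall back to the uniform row
    safe = np.where(rows > 0, rows, 1.0)
    return np.where(rows > 0, sim / safe, 1.0 / n)
```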

Given a probability matrix $A$, the page ranks $\pi=(\pi_1,\pi_2,...,\pi_N)$ are the equilibrium probabilities such that $\pi A = \pi$. When all entries of $A$ are positive, the Perron-Frobenius theorem guarantees that these equilibrium probabilities are unique. If $A$ is also symmetric, its column sums are equal to row sums, and hence all equal to 1. Therefore, $\pi=(1/N)(1,1,...,1)$ is the unique probability vector that satisfies $\pi A = \pi$, i.e. all page ranks must be equal to each other.
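To see this numerically, here is a small check with a made-up symmetric stochastic matrix (the entries are mine, not taken from the question):

```python
import numpy as np

# A symmetric matrix whose rows (and hence columns) each sum to 1.
A = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

pi = np.full(3, 1.0 / 3)           # the uniform probability vector
print(np.allclose(pi @ A, pi))     # True: uniform ranks satisfy pi A = pi
```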

Edit: An afternote. In Google's original ranking algorithm, the probability matrix $A$ is made entrywise positive by assuming that a person will, with probability $1-d$, visit a page chosen uniformly at random (hence he visits each page $i$ with probability $1/N > 0$), and, with probability $d$, visit a page according to the link structure of the whole internet. This is what the "90/10 rule" in the pdf file you mentioned refers to (with $d=0.9$), but in the first paper on the PageRank algorithm published by Sergey Brin and Lawrence Page, $d$ is actually $0.85$ and it is called a "damping factor" in the paper. I don't think Brin & Page were aware of the Perron-Frobenius theorem. They perhaps decided to add this damping factor after some trial and error.
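In code, that construction is just a convex combination of the link-based matrix and the uniform matrix; a sketch under the convention used above, with $d$ weighting the link structure:

```python
import numpy as np

def google_matrix(A, d=0.85):
    """Mix the link-based transition matrix A with the uniform distribution.

    With probability d the surfer follows A; with probability 1 - d they
    jump to a page chosen uniformly at random, so every entry of the
    resulting matrix is strictly positive.
    """
    n = A.shape[0]
    return d * A + (1.0 - d) / n * np.ones((n, n))
```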
