[Math] Understanding PageRank as an eigenvalue problem

eigenvalues-eigenvectorsnumerical methodspage-rank

In the book "A first course in numerical methods" by U. Arscher and C. Greif, chapter 8 on "Eingenvalues and singular values", example 8.1, we have:

Given a network linkage graph with $n$ nodes (webpages), the importance of a webpage is given by the number and the importance of pages that link to it.

Mathematically, suppose the importance (or rank) of page $i$ is given by $x_i$.

To determine the value of $x_i$, we first record all the pages that link to it. Suppose the locations, or indices, of these pages are given by the set $\{ B_i \}$.

If a webpage, whose index is $j \in B_i$, points to $N_j$ pages including page $i$, and its own rank is $x_j$, then we say that it contributes a share of $\frac{x_j}{N_j}$ to the rank $x_i$ of page $i$. Thus, in one formula, we set

$$x_i = \sum_{j \in B_i} \frac{1}{N_j} x_j$$

$i=1…n$

Looking carefully at this expression, we can see that in fact it is nothing but an eigenvalue problem!

We seek a vector $x$ such that $$x = Ax$$ where the nonzero values $a_{ij}$ of the matrix $A$ are the elements $\frac{1}{N_j}$ associated with page $i$.

In other words, we are seeking to compute an eigenvector of $A$ that corresponds to an eigenvalue equal to $1$.

Since the number of links in and out of a given webpage is smaller by many orders of magnitude than the overall number of webpages, the matrix $A$ is extremely sparse.

From what I understood, the vector $x$ basically contains as its entries the ranks of all webpages/nodes. But there are a few things I don't understand:

  1. Why do the authors say that it's an eigenvalue problem? How did they find $A$ from the expression above? I usually find the eigenvalues from the characteristic polynomial…

  2. $A$ seems to be a matrix that does not change the vector with which it's multiplied. How is it called this matrix? Are there any special properties of it?

  3. Why the nonzero values $a_{ij}$ of the matrix $A$ are the elements $\frac{1}{N_j}$ associated with page $i$? Wasn't $\frac{1}{N_j}$ associated with page $j$?

  4. "In other words, we are seeking to compute an eigenvector of $A$ that corresponds to an eigenvalue equal to $1$". Why this?

  5. I don't understand the relationship between $A$ being sparse and the fact that the number of in and out links from a webpage is considerably smaller than the number of webpages. This not-understanding could be due to the fact that I didn't really understood how we obtained $A$ and how it really looks.

I know these are a lot of questions, but they are really related to each other…

Best Answer

As $N_j$ is defined by the "... points to $N_j$ pages including page $i$", note that $N_j$ actually depends not only on $j$, but also on $i$. So if we call $\frac1{N_j}$ (related to the calculation of $x_i$) $a_{i,j}$, then the display equation precisely says $$x_i=\sum_j a_{i,j}x_j, $$ i.e., $$ x=Ax.$$

Recall that $x$ is called eigenvector of $A$ with eigenvalue $\lambda$ if $Ax=\lambda x$. Thus the above equation precisely says that $x$ is an eigenvector of $A$ for eigenvalue $1$.

Related Question