[Math] SVD : How is the normalized statistical leverage score a probability distribution

matrices, probability-distributions, svd

I came across a paper related to the Singular Value Decomposition (SVD) in which they calculate a normalized statistical leverage score for each column of an $m \times n$ matrix. It is defined as follows:
First they compute the SVD of a data matrix $X$: $X = U \Sigma V^T$. Then they choose the top $k < n$ columns of the right singular matrix $V$. Next, for each column $j = 1, \dots, n$ of $X$ they calculate the "normalized statistical leverage score" $\pi_j$ as
$$\pi_j = \frac{1}{k} \sum_{i=1}^{k} v_{ji}^2.$$
So for every $j$th column they take the sum of the squares of the $j$th entries of the first $k$ right singular vectors, and then divide by $k$. They claim that
$$\pi_j \ge 0 \quad \text{and} \quad \sum_{j=1}^{n} \pi_j = 1 \quad \text{for all } j = 1, \dots, n,$$
and hence the $\pi_j$ form a probability distribution over the $n$ columns. I can understand the first property, since each $\pi_j$ is a sum of squares and therefore cannot be negative. But how does the second equality hold? How can it be proved?

Best Answer

Please note that $V$ is an orthogonal matrix, so each of its columns has unit Euclidean norm: the $i^{th}$ column $V^{i}$ of $V$ satisfies $\sum_{j=1}^{n} (V^{i}_{j})^{2} = 1$. Summing the leverage scores over all $n$ columns and swapping the order of summation then gives
$$\sum_{j=1}^{n} \pi_j = \frac{1}{k} \sum_{j=1}^{n} \sum_{i=1}^{k} v_{ji}^2 = \frac{1}{k} \sum_{i=1}^{k} \sum_{j=1}^{n} v_{ji}^2 = \frac{1}{k} \sum_{i=1}^{k} 1 = \frac{k}{k} = 1.$$
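A quick numerical check of this identity, sketched in NumPy with an arbitrary random matrix and an illustrative choice of $k$ (the names `X`, `k`, and the dimensions are my own, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 8, 5, 3                      # illustrative sizes, k < n
X = rng.standard_normal((m, n))        # arbitrary data matrix

# SVD: X = U @ diag(s) @ Vt; the rows of Vt are the right singular vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T                               # columns of V are the right singular vectors

# pi_j = (1/k) * sum_{i=1}^{k} v_{ji}^2, i.e. squared row norms of V[:, :k] over k.
pi = np.sum(V[:, :k] ** 2, axis=1) / k

print(np.all(pi >= 0))                 # nonnegative: sums of squares
print(np.isclose(pi.sum(), 1.0))       # sums to 1, by column orthonormality of V
```

Because each of the $k$ retained columns of $V$ contributes exactly $1$ to the double sum, the scores sum to $1$ regardless of which matrix `X` is used.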