Solved – the meaning of the eigenvectors of a mutual information matrix

Tags: eigenvalues, entropy, mutual information, pca

When looking at the eigenvectors of the covariance matrix, we get the directions of maximum variance (the first eigenvector is the direction in which the data varies the most, etc.); this is called principal component analysis (PCA).
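As a minimal sketch of that statement, the R snippet below compares the eigenvectors of a sample covariance matrix with the rotation returned by `prcomp`. The toy data and all variable names are assumptions made for illustration only.

```r
set.seed(1)

# Hypothetical toy data: 200 points whose two columns are strongly correlated
n <- 200
z <- rnorm(n)
x <- cbind(z + 0.3 * rnorm(n), 2 * z + 0.3 * rnorm(n))

e <- eigen(cov(x))   # eigenvalues are returned in decreasing order
p <- prcomp(x)       # centers the columns by default

# Eigenvectors are only defined up to sign, so compare absolute values
max(abs(abs(e$vectors) - abs(p$rotation)))

# The variance of the data projected on the first eigenvector
# equals the largest eigenvalue
proj1 <- as.vector(scale(x, scale = FALSE) %*% e$vectors[, 1])
var(proj1) - e$values[1]
```

Both printed values should be zero up to floating-point error.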

I was wondering what it would mean to look at the eigenvectors/eigenvalues of the mutual information matrix. Would they point in the direction of maximum entropy?

Best Answer

While it is not a direct answer (it is about pointwise mutual information), have a look at this paper, which relates word2vec to a singular value decomposition of the PMI matrix:

O. Levy, Y. Goldberg, Neural Word Embedding as Implicit Matrix Factorization

We analyze skip-gram with negative-sampling (SGNS), a word embedding method introduced by Mikolov et al., and show that it is implicitly factorizing a word-context matrix, whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant. We find that another embedding method, NCE, is implicitly factorizing a similar matrix, where each cell is the (shifted) log conditional probability of a word given its context. We show that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks. When dense low-dimensional vectors are preferred, exact factorization with SVD can achieve solutions that are at least as good as SGNS’s solutions for word similarity tasks. On analogy questions SGNS remains superior to SVD. We conjecture that this stems from the weighted nature of SGNS’s factorization.
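To make the idea of factorizing a shifted positive PMI matrix with SVD concrete, here is a rough sketch in R. The co-occurrence counts, the shift `log(k)` with `k = 2`, the dimension `d`, and the choice to scale the left singular vectors by the square root of the singular values are all assumptions for illustration, not the paper's exact setup.

```r
# Hypothetical word-context co-occurrence counts (rows: words, columns: contexts)
counts <- matrix(c(10, 2, 0,
                    3, 8, 1,
                    0, 1, 9),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("w1", "w2", "w3"), c("c1", "c2", "c3")))

p_wc <- counts / sum(counts)   # joint probabilities
p_w  <- rowSums(p_wc)          # word marginals
p_c  <- colSums(p_wc)          # context marginals

# Pointwise mutual information; cells with zero counts become -Inf
pmi <- log(p_wc / outer(p_w, p_c))

# Shifted positive PMI: subtract a global constant, then clip at zero
k <- 2                         # assumed shift parameter
sppmi <- pmax(pmi - log(k), 0)

# Truncated SVD gives dense low-dimensional word vectors
s <- svd(sppmi)
d <- 2                         # assumed embedding dimension
emb <- s$u[, 1:d] %*% diag(sqrt(s$d[1:d]))
rownames(emb) <- rownames(counts)
emb
```

Each row of `emb` is a `d`-dimensional vector for one word; clipping at zero is what keeps the matrix finite and sparse when many counts are zero.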

Related Solutions

Solved – Eigenvectors of a covariance matrix with only positive elements

With your new information, that all the components of the positive-definite matrix are positive, it becomes easy. While it follows directly from the Perron-Frobenius theorem (which is valid for square matrices with non-negative elements, symmetric or not), in the symmetric case it is much easier.

Let the positive-definite matrix be $S$. The eigenvector corresponding to the largest eigenvalue is the vector $x$ attaining the maximum in the following problem: $$ \lambda_{\max} = \max_{\{x \colon \| x\|=1\}} x^T S x $$ (that is, the "argmax"), where $\lambda_{\max}$ is the largest eigenvalue.

Suppose, to get a contradiction, that $x_1$ is negative while the other components of $x$ are non-negative. We can write $$ x^T S x = x_1 s_{11} x_1+2x_1 \sum_{j=2}^m s_{1j} x_j + \sum_{i=2}^m \sum_{j=2}^m x_i s_{ij} x_j $$ where $s_{ij}$ are the entries of $S$. The first term is positive and the third is non-negative, while the second term is negative (as long as some $x_j$ with $j \ge 2$ is nonzero), so we get a strictly larger value by switching the sign of $x_1$, which respects the restriction on the norm. That gives the contradiction you need. A similar argument can be written for any other pattern of negative/positive signs.
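The argument can be checked numerically. The sketch below builds a hypothetical symmetric positive-definite matrix with strictly positive entries (the construction via `crossprod` and all names are assumptions for illustration), then verifies that the leading eigenvector has components of a single sign and that flipping one component lowers the quadratic form.

```r
set.seed(42)
m <- 5

# Hypothetical example: a has strictly positive entries, so s = t(a) %*% a is
# symmetric with strictly positive entries and (almost surely) positive-definite
a <- matrix(runif(m * m, 0.1, 1), m, m)
s <- crossprod(a)

e <- eigen(s, symmetric = TRUE)
v <- e$vectors[, 1]
v <- v * sign(sum(v))   # eigenvectors are defined up to sign; pick the positive one

all(v > 0)              # every component has the same sign

# Flip the sign of the first component: the norm is unchanged,
# but the quadratic form drops below the largest eigenvalue
x <- v
x[1] <- -x[1]
q_top     <- drop(t(v) %*% s %*% v)
q_flipped <- drop(t(x) %*% s %*% x)
c(top = q_top, flipped = q_flipped)
```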

Solved – How to get the principal components of one matrix along the principal directions of another matrix

You get the coefficients (the rotation matrix) from PCA. The observation matrix is multiplied by these coefficients to obtain the components. So, multiply the new observation matrix by the same rotation instead. Don't forget to center it, using the column means of the original data.

Here's the code.

Run PCA and see how the score matrix is obtained from the original data and the rotation. Note that I'm NOT centering here, and you probably should.

> x=matrix(c(1,2,3,2,4,5.5),3,2)
> x
     [,1] [,2]
[1,]    1  2.0
[2,]    2  4.0
[3,]    3  5.5
> r=prcomp(x,retx=1,center=FALSE)
> r$rotation
            PC1        PC2
[1,] -0.4666132  0.8844615
[2,] -0.8844615 -0.4666132
> r$x
           PC1         PC2
[1,] -2.235536 -0.04876479
[2,] -4.471072 -0.09752958
[3,] -6.264378  0.08701220
> x %*% r$rotation
           PC1         PC2
[1,] -2.235536 -0.04876479
[2,] -4.471072 -0.09752958
[3,] -6.264378  0.08701220

Now, apply the same rotation to the different data (again, see that I am NOT centering).

> y=matrix(c(1,2,3,2,4,6.5),3,2)
> y
     [,1] [,2]
[1,]    1  2.0
[2,]    2  4.0
[3,]    3  6.5
> y %*% r$rotation
           PC1         PC2
[1,] -2.235536 -0.04876479
[2,] -4.471072 -0.09752958
[3,] -7.148839 -0.37960095

Note the similarity of the new scores.

By the way, this is used a lot in forecasting with PCA. We obtain the rotation on historical data, then apply it to new data.
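For the centered case, a minimal sketch (reusing the same toy matrices; the variable names `scores_manual` and `scores_predict` are made up here) is to subtract the column means of the original data from the new data before rotating. The `predict` method for `prcomp` objects performs the same centering and rotation.

```r
x <- matrix(c(1, 2, 3, 2, 4, 5.5), 3, 2)   # original data
y <- matrix(c(1, 2, 3, 2, 4, 6.5), 3, 2)   # new data

r <- prcomp(x, center = TRUE)

# Center the new data with the ORIGINAL column means, then rotate
scores_manual <- sweep(y, 2, r$center) %*% r$rotation

# predict() applies the stored centering and rotation to newdata
scores_predict <- predict(r, newdata = y)

all.equal(unname(scores_manual), unname(scores_predict))
```

Using the means stored in `r$center`, rather than the means of the new data, keeps the new scores in the same coordinate system as the historical ones.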
