Solved – the meaning of the eigenvectors of a mutual information matrix

eigenvalues, entropy, mutual information, pca

When looking at the eigenvectors of the covariance matrix, we get the directions of maximum variance (the first eigenvector is the direction in which the data varies the most, etc.); this is called principal component analysis (PCA).
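To make that concrete, here is a minimal NumPy sketch of the PCA picture. The synthetic 2-D data (stretched along the diagonal) and the helper name `pca_directions` are assumptions made purely for illustration.

```python
import numpy as np

def pca_directions(X):
    """Eigenvalues (descending) and matching eigenvectors of the covariance of X."""
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # features are columns
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so the largest comes first
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
# Illustrative data: both coordinates share the same signal t, plus small noise,
# so most of the variance lies along the (1, 1) diagonal.
t = rng.normal(size=500)
X = np.column_stack([t + 0.1 * rng.normal(size=500),
                     t + 0.1 * rng.normal(size=500)])

eigvals, eigvecs = pca_directions(X)
first_pc = eigvecs[:, 0]  # direction of maximum variance
print(eigvals, first_pc)
```

The variance of the data projected onto `first_pc` equals the largest eigenvalue, which is what "direction of maximum variance" means here.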

I was wondering what it would mean to look at the eigenvectors and eigenvalues of the mutual information matrix. Would the eigenvectors point in the directions of maximum entropy?

Best Answer

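While this is not a direct answer (it concerns pointwise mutual information rather than mutual information), take a look at the paper relating word2vec to a singular value decomposition of the PMI matrix (Levy and Goldberg, "Neural Word Embedding as Implicit Matrix Factorization", 2014). Its abstract reads: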

We analyze skip-gram with negative-sampling (SGNS), a word embedding method introduced by Mikolov et al., and show that it is implicitly factorizing a word-context matrix, whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant. We find that another embedding method, NCE, is implicitly factorizing a similar matrix, where each cell is the (shifted) log conditional probability of a word given its context. We show that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks. When dense low-dimensional vectors are preferred, exact factorization with SVD can achieve solutions that are at least as good as SGNS’s solutions for word similarity tasks. On analogy questions SGNS remains superior to SVD. We conjecture that this stems from the weighted nature of SGNS’s factorization.
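To illustrate the kind of factorization the abstract describes, here is a rough sketch that builds a shifted positive PMI matrix from word-context counts and factorizes it with SVD. The toy counts, the helper names `shifted_ppmi` and `cosine`, the choice `k=1.0` (no shift), the dimension `d = 2`, and the square-root weighting of the singular values are all assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def shifted_ppmi(counts, k=1.0):
    """max(PMI(w, c) - log k, 0) computed from a word-context count matrix."""
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()            # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)   # word marginals
    p_c = p_wc.sum(axis=0, keepdims=True)   # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))    # -inf where a pair was never observed
    shifted = np.where(counts > 0, pmi - np.log(k), 0.0)
    return np.maximum(shifted, 0.0)         # keep only the positive part

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical counts: rows are words, columns are contexts.
# Words 0 and 1 mostly share contexts 0 and 2; words 2 and 3 share contexts 1 and 3.
counts = np.array([[10, 0, 3, 0],
                   [ 8, 1, 2, 0],
                   [ 0, 9, 0, 4],
                   [ 1, 7, 0, 5]])

M = shifted_ppmi(counts, k=1.0)
U, S, Vt = np.linalg.svd(M, full_matrices=False)

d = 2                                        # illustrative embedding dimension
word_vecs = U[:, :d] * np.sqrt(S[:d])        # one common way to split the singular values

print(cosine(word_vecs[0], word_vecs[1]))    # words with similar contexts
print(cosine(word_vecs[0], word_vecs[2]))    # words with different contexts
```

In this toy example, words that occur in similar contexts end up with nearly parallel vectors, while words with disjoint contexts end up nearly orthogonal. That is the sense in which the leading singular vectors of a PMI matrix capture shared co-occurrence structure rather than a "direction of maximum entropy".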