Solved – Understanding Singular Value Decomposition in the context of LSI

latent-semantic-indexing, natural-language, r, svd

My question is generally on Singular Value Decomposition (SVD), and particularly on Latent Semantic Indexing (LSI).

Say I have a word-by-document matrix $A_{word \times document}$ that contains the frequencies of 5 words across 7 documents.

# term-document matrix: rows are words, columns are documents
A = matrix(data=c(2,0,8,6,0,3,1,
                  1,6,0,1,7,0,1,
                  5,0,7,4,0,5,6,
                  7,0,8,5,0,8,5,
                  0,10,0,0,7,0,0), ncol=7, byrow=TRUE)
rownames(A) <- c('doctor','car','nurse','hospital','wheel')

I get the matrix factorization for $A$ by using SVD: $A = U \cdot D \cdot V^T $.

s = svd(A)
D = diag(s$d) # singular value matrix
S = diag(s$d^0.5 ) # diag matrix with square roots of singular values.
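
For reference, the three factors reconstruct $A$ exactly (up to floating-point error), which is easy to verify:

# A = U %*% D %*% t(V), so rebuilding A from the factors should give it back.
max(abs(A - s$u %*% D %*% t(s$v)))  # should be on the order of 1e-14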

In [1] and [2], it is stated that:

$WordSim = U \cdot S$ gives the word similarity matrix, where the rows of $WordSim$ represent different words.

WordSim = s$u %*% S

$DocSim = S \cdot V^T$ gives the document similarity matrix, where the columns of $DocSim$ represent different documents.

DocSim = S %*% t(s$v)
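
As a quick sanity check on these definitions (an algebraic identity that follows from the construction above, not a statement from the linked sources): because $S \cdot S = D$, the two matrices multiply back to the original term-document matrix, $WordSim \cdot DocSim = U \cdot D \cdot V^T = A$.

# WordSim %*% DocSim = U %*% S %*% S %*% t(V) = U %*% D %*% t(V) = A
max(abs(A - WordSim %*% DocSim))  # should be ~0, up to floating-point error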

Questions:

  1. Algebraically, why are $WordSim$ and $DocSim$ word/document similarity matrices? Is there an intuitive explanation?
  2. Based on the R example given, can we make any intuitive word count / similarity observations by just looking at $WordSim$ and $DocSim$ (without using cosine similarity or correlation coefficient between rows / columns)?


Best Answer

Matrix factorization using SVD decomposes the input matrix into three parts:

  • The left singular vectors $U$. The first column of this matrix specifies the axis along which the rows of the input matrix vary the most. In your case, the first column tells you which words vary together the most.
  • The singular values $D$. These are scalings, and they are relative to each other. If the first value of $D$ is twice as big as the second, it means that the first singular vector (in $U$ and $V^T$) explains twice as much variation as the second singular vector.
  • The right singular vectors $V^T$. The first row of this matrix specifies the axis along which the columns of the input matrix vary the most. In your case, the first row tells you which documents vary together the most (see the sketch after this list).
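
A minimal sketch of how to inspect these three parts for your example data, using the s = svd(A) object from the question (keep in mind that the sign of a singular vector is arbitrary, so only the relative pattern of the values is meaningful):

# Relative size of each singular value: the scalings, expressed relative to the first.
round(s$d / s$d[1], 2)

# First left singular vector: how strongly each of the five words loads on the dominant axis.
round(setNames(s$u[, 1], rownames(A)), 2)

# First right singular vector: how strongly each of the seven documents loads on that axis.
round(s$v[, 1], 2)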

When words or documents vary together, it indicates that they are similar. For example, if the word doctor occurs more often in a document, the words nurse and hospital also tend to occur more often. This is shown by the first scaled left singular vector, i.e. the first column of $WordSim$. You can validate this result by looking at the input data: notice that when nurse occurs, hospital also occurs, and when nurse does not occur, hospital does not occur either.
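
To see this grouping numerically, here is a small sketch using the $WordSim$ matrix computed in the question (again, the overall sign of a column is arbitrary, so compare relative patterns only):

# Loadings of the five words on the first two latent dimensions. Words that
# co-occur across documents (doctor, nurse, hospital) should receive loadings
# of similar sign and magnitude on the first dimension, while car and wheel
# should sit closer to zero there and stand out on the second dimension instead.
round(setNames(WordSim[, 1], rownames(A)), 2)
round(setNames(WordSim[, 2], rownames(A)), 2)

This also gives a rough answer to question 2: words whose rows in $WordSim$ look alike (and documents whose columns in $DocSim$ look alike) are the ones LSI treats as related, even before computing any cosine similarity or correlation.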
