Solved – How to interpret the results of LSA

Tags: classification, interpretation, latent-semantic-analysis, matlab

I implemented LSA (Latent Semantic Analysis) in MATLAB.
I have a $D\times N$ term-document matrix, where $D$ is the number of words and $N$ is the number of documents.
I computed a low-rank approximation using SVD and got $$X_k = U_k \cdot S_k \cdot V_k'$$
with $D = 1000$, $N = 600$, and $k = 4$.

Now I want to classify the documents into 4 classes,
and I know I have to use the column vectors of $V_k'$.

Each column of it has 4 values.
I think each value indicates how strongly the document is related to a topic (in the latent space). Am I right?

But when I look at a column, it contains both positive and negative values.
How should I interpret them?

Best Answer

To interpret LSA output, keep in mind that similarity in the latent space is usually measured with the cosine: the similarity between two vectors is the cosine of the angle between them, so an angle of zero gives the maximum similarity of 1.
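As a rough illustration of the cosine measure, here is a minimal Python sketch (standard library only, not the asker's MATLAB code). The three 4-dimensional vectors are made-up stand-ins for columns of $V_k'$, and the function name `cosine_similarity` is only illustrative. It suggests that mixed signs within a vector do not prevent a high similarity, because only the direction of the vector matters.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical document vectors in a 4-dimensional latent space
# (made up for illustration, not taken from real LSA output).
doc_a = [0.5, -0.2, 0.1, -0.4]
doc_b = [1.0, -0.4, 0.2, -0.8]   # same direction as doc_a, twice the length
doc_c = [-0.5, 0.2, -0.1, 0.4]   # exactly opposite direction to doc_a

sim_same = cosine_similarity(doc_a, doc_b)      # close to 1: same direction
sim_opposite = cosine_similarity(doc_a, doc_c)  # close to -1: opposite direction
```

Under these assumptions, `doc_a` and `doc_b` come out as maximally similar even though both contain negative entries, while `doc_c` points the opposite way and gets a similarity near -1.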

However, if you want to know whether the negative values that appear for each topic are a sign of similarity, you need to rethink the problem. Fabian Zehner explains this here, using the example of a tennis game; I suggest reading it. As a quick tip, think of similarity in terms of the four quadrants of a document-word plane.
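One further point may help when reading the signs: the singular vectors of an SVD are determined only up to sign, so the sign of a single coordinate carries no meaning on its own. The sketch below, in plain Python with small hand-built factors (hypothetical values, not real LSA output), flips the sign of one latent dimension in both $U_k$ and $V_k$ and checks that the reconstructed matrix $X_k$ does not change. The helper names `matmul` and `transpose` are illustrative only.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

# Hypothetical rank-2 factors with orthonormal columns:
# U is 3 words x 2 topics, S is 2 x 2 diagonal, V is 4 documents x 2 topics.
U = [[0.6, 0.0], [0.8, 0.0], [0.0, 1.0]]
S = [[3.0, 0.0], [0.0, 2.0]]
V = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]

X = matmul(matmul(U, S), transpose(V))

# Flip the sign of the second latent dimension in both U and V.
U_flipped = [[row[0], -row[1]] for row in U]
V_flipped = [[row[0], -row[1]] for row in V]
X_flipped = matmul(matmul(U_flipped, S), transpose(V_flipped))

# Largest entry-wise difference between the two reconstructions.
max_diff = max(abs(x - y)
               for row_x, row_y in zip(X, X_flipped)
               for x, y in zip(row_x, row_y))
```

Both factorizations describe the same matrix, yet every document's second coordinate has the opposite sign in `V_flipped`. Under this reading, it is the relative position of the document vectors that carries information (for example, their cosine similarity), not whether an individual value is positive or negative.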