Latent Semantic Analysis – Computing Document Similarity Using Latent Semantic Analysis

clustering, data-mining, latent-semantic-analysis

I have a question regarding Latent Semantic Analysis (LSA). After performing an SVD of the term-document matrix and choosing some number of dimensions, I get a set of new document vectors.

Now, how can I calculate the similarity between two documents? The new document vectors contain negative values, and the results produced by cosine similarity make no sense.

Best Answer

It is normal for the new document vectors to contain negative values. The new dimensions correspond to latent concepts (though usually not directly interpretable) in the lower-dimensional space, and a negative value means that the corresponding document is negatively associated with that concept.
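As a rough illustration of why negative coordinates are expected, here is a minimal NumPy sketch. The toy matrix `A` and the choice `k = 2` are made up for this example and are not taken from the question. It shows that a non-negative term-document matrix still yields reduced document vectors with negative entries, and that flipping the sign of a whole latent dimension gives an equally valid decomposition.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows = terms, columns = documents).
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [1, 0, 0, 1],
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep k latent dimensions; each row of `docs` is one document vector.
k = 2
docs = (np.diag(s[:k]) @ Vt[:k]).T

# Even though A has no negative entries, the reduced vectors do.
has_negative = bool((docs < 0).any())

# Flipping the sign of one latent dimension (in both U and Vt)
# reconstructs exactly the same matrix, so the signs carry no
# meaning on their own.
U_flipped, Vt_flipped = U.copy(), Vt.copy()
U_flipped[:, 1] *= -1
Vt_flipped[1, :] *= -1
same_reconstruction = bool(
    np.allclose(U @ np.diag(s) @ Vt, U_flipped @ np.diag(s) @ Vt_flipped)
)

print(docs)
print(has_negative, same_reconstruction)
```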

What do you mean by "results produced by cosine similarity make no sense"? Cosine similarity should work fine on these vectors. You can also try Pearson correlation (the centered version of cosine similarity).
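For reference, both measures can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the toy matrix `A`, the choice `k = 3`, and the helper names `cosine_similarity` and `pearson_similarity` are invented here, and scaling the document vectors by the singular values is one common convention, not the only one.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors; lies in [-1, 1]."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def pearson_similarity(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    return cosine_similarity(x - x.mean(), y - y.mean())

# Hypothetical toy term-document matrix (rows = terms, columns = documents).
# Documents 0 and 1 share terms; documents 0 and 2 share none.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [1, 0, 0, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# k = 3 here because with only two components the centered vectors
# are always collinear, so Pearson would be +1 or -1 for every pair.
k = 3
docs = (np.diag(s[:k]) @ Vt[:k]).T   # one row per document

cos_01 = cosine_similarity(docs[0], docs[1])
cos_02 = cosine_similarity(docs[0], docs[2])
pearson_01 = pearson_similarity(docs[0], docs[1])

print(cos_01, cos_02, pearson_01)
```

Negative coordinates are not a problem for either measure: the result always lies between -1 and 1, and in this toy example the two documents that share terms score higher under cosine similarity than the two that share none.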