Solved – Can we apply the k-means algorithm with the cosine similarity measure?

clustering, cosine similarity, high-dimensional, k-means

According to Agrawal (Text Mining book, 2016, 2nd edition), k-means is one of the most widely used algorithms for clustering text documents represented in the vector space model (VSM).

I would like to know whether it is possible to use the cosine measure with k-means to measure similarity and cluster documents represented as high-dimensional bag-of-words (BOW) vectors.

So, is it possible to use cosine similarity, and what is the formula for this metric when it is applied to a high-dimensional representation so that it gives an effective measure?

Best Answer

On data normalized to unit length, cosine similarity and squared Euclidean distance are equivalent: they rank pairs of points identically.
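For the formula part of the question, here is the standard identity behind that equivalence, written for two document vectors $x$ and $y$ (the symbol names are mine, not from the question):

```latex
\cos(x, y) = \frac{x^\top y}{\lVert x \rVert \, \lVert y \rVert}
\qquad\text{and, when } \lVert x \rVert = \lVert y \rVert = 1,\qquad
\lVert x - y \rVert^2 = \lVert x \rVert^2 + \lVert y \rVert^2 - 2\, x^\top y = 2 - 2\cos(x, y).
```

So for unit-length vectors a larger cosine similarity always means a smaller squared Euclidean distance, and vice versa.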

So yes, you can use k-means with cosine (see "spherical k-means"). Or you can just scale each document vector to unit length and use regular k-means.
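As a rough illustration, here is a minimal, dependency-free sketch of the spherical k-means idea: normalize every vector, assign each point to the centroid with the highest dot product, and re-normalize the centroids after each update. The function names (`normalize`, `dot`, `spherical_kmeans`), the toy data, and the naive "first k points" initialization are my own assumptions for the example, not a reference implementation.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; for unit-length vectors this equals the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(docs, k, n_iter=20):
    """Cluster vectors by cosine similarity (sketch of spherical k-means)."""
    points = [normalize(d) for d in docs]
    # Assumption: naive initialization with the first k points.
    # Practical code would use random restarts or a k-means++ style seeding.
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assignment step: highest dot product = highest cosine similarity.
        new_labels = [
            max(range(k), key=lambda j: dot(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: mean of the members, projected back onto the unit sphere.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels, centroids


# Toy term-count vectors: two "topics", each with a short and a long document.
docs = [
    [1, 0, 0],     # short document, topic A
    [0, 2, 1],     # short document, topic B
    [10, 1, 0],    # long document, topic A
    [0, 20, 12],   # long document, topic B
]
labels, centroids = spherical_kmeans(docs, k=2)
print(labels)  # [0, 1, 0, 1]
```

The long and short documents on the same topic end up together because only the direction of each vector matters after normalization, not its length. Note that regular k-means on normalized data does not re-normalize the centroids, so it can give slightly different assignments than this variant.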
