Solved – How to explain the connection between SVD and clustering

clustering, svd

Is there an intuitive explanation of how the SVD relates to co-clustering when the SVD is performed on a cross-covariance matrix?

(i.e. the SVD is performed on the matrix $E[X Y^{\top}]$ where $X \in \mathbb{R}^n$ and $Y \in \mathbb{R}^m$.)

How do the projections $U$ and $V$ in the decomposition $U \Sigma V^{\top}$ relate to clustering? If I take some instance of $X$ and project it through $U$, what do I get?

(Note that I am talking about the thin SVD, so $\Sigma$ has dimension lower than $n$ and $m$, and $U^{\top}x$ projects an instance $x$ of $X$ into that lower-dimensional space.)
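For concreteness, here is roughly the setup I have in mind as a NumPy sketch; the sizes, sample count, and random data are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k, N = 10, 8, 3, 500  # dims of X and Y, thin-SVD rank, sample count

# Fake paired samples of X and Y (columns are observations).
X = rng.standard_normal((n, N))
Y = rng.standard_normal((m, N))

# Empirical cross-covariance E[X Y^T], estimated from the samples.
C = (X @ Y.T) / N  # shape (n, m)

# Thin SVD: keep only the top-k singular triplets.
U, s, Vt = np.linalg.svd(C, full_matrices=False)
U, s, Vt = U[:, :k], s[:k], Vt[:k, :]

# Projecting one instance x of X into the k-dimensional left subspace.
x = X[:, 0]
z = U.T @ x  # coordinates of x along the top-k left singular directions
print(z.shape)  # (3,)
```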

Best Answer

Perhaps this will help, taken from the Wikipedia article on PCA (PCA is closely related to the SVD: the principal directions are the left singular vectors of the centered data matrix):

"Relation between PCA and K-means clustering

It has been shown recently (2001,2004) that the relaxed solution of K-means clustering, specified by the cluster indicators, is given by the PCA principal components, and the PCA subspace spanned by the principal directions is identical to the cluster centroid subspace specified by the between-class scatter matrix. Thus PCA automatically projects to the subspace where the global solution of K-means clustering lies, and thus facilitates K-means clustering to find near-optimal solutions."
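As a rough numerical illustration of that claim (the synthetic data, cluster count, and parameters below are my own choices, not part of the cited result), you can check that K-means centroids on well-separated clusters lie close to the span of the top principal directions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=600, centers=3, n_features=10, random_state=0)
Xc = X - X.mean(axis=0)  # center the data, as PCA does

# Top k-1 = 2 principal directions (for k = 3 clusters).
pca = PCA(n_components=2).fit(Xc)
P = pca.components_  # (2, 10), rows are orthonormal principal directions

centroids = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Xc).cluster_centers_

# Relative residual of each centroid after projecting onto the PCA subspace:
# small values mean the centroid subspace nearly coincides with the span
# of the principal directions, as the quoted result suggests.
proj = centroids @ P.T @ P
residual = np.linalg.norm(centroids - proj, axis=1) / np.linalg.norm(centroids, axis=1)
print(residual)  # near zero when the clusters are well separated
```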

The way I see it, dimensionality reduction is distinct from clustering, but it can help with clustering high-dimensional data by first reducing it to a "useful" low-dimensional representation.
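In code, that reduce-then-cluster workflow might look like the following sketch, using scikit-learn's TruncatedSVD (a thin SVD) followed by K-means; the dataset and parameter values are illustrative assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline

X, _ = make_blobs(n_samples=600, centers=3, n_features=50, random_state=0)

# Reduce to a low-dimensional representation, then cluster in that space.
pipeline = make_pipeline(TruncatedSVD(n_components=2),
                         KMeans(n_clusters=3, n_init=10, random_state=0))
labels = pipeline.fit_predict(X)
print(labels[:10])
```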