K-Means Clustering – How to Derive a K-Means Objective Function in Matrix Form?

clusteringk-means

If $X = \{ x_1,\cdots x_n\}$ is a set of feature vectors, then the k-means algorithm tries to minimize the objective function $O = \sum_{i=1}^{k}\sum_{x \in G_i}||x -\mu_i ||^2$ in order to cluster $n$ feature vectors into $k$ clusters, namely $G_1 \cdots G_k$. Here $\mu_i$ is the centroid of cluster $G_i$.

Now I represent the set of feature vectors as a feature matrix $F_{m \times n}$, where $m$ is the feature dimension and $n$ is the number of objects and set of centroids as a Centroid matrix $C_{m \times k}$.

How can we represent the above k-means objective function in terms of these matrix notations (feature and cluster centroid matrix)?

Best Answer

Given an $m$ by $n$ matrix $X$, the algorithm seeks to group its $n$ columns, thought of as $m$-vectors, into a specified number of groups, $k$. This can be represented by an $n$ by $k$ matrix $A$ having entries in $\{0,1\}$ and one column for each of the $k$ groups. Column $j$ indicates which vectors in $X$ belong to group $j$; that is, $a_{ij} = 1$ if and only if column $i$ of $X$ is assigned to group $j$.

Let $1_k$ be the column vector of $k$ 1's and $1_n$ the column vector of $n$ 1's. $A$ is constrained to satisfy $A\ 1_k = 1_n^{'}$, reflecting the assignment of each column of $X$ to exactly one group.

The $m$ by $k$ matrix whose columns are the group centroids can be constructed as

$$C = X\ A\ \textrm{diagonal}(1_n^{'}\ A)^{-1}.$$

The distances between the columns of $X$ and their associated centroids $C\ A^{'}$ are

$$D = X - C\ A^{'},$$

also an $m$ by $n$ matrix, whence the objective function can be expressed as the number

$$tr(D^{'}\ D)$$

(which is the sum of squares of the entries of $D$).

For instance, consider forming two clusters of the points $(1,0), (-1,0), (0,2), (0,3), (0,4)$ in the plane ($k=2$, $m=2$, $n=5$). Then we can let

$$X = \left( \begin{array}{ccccc} 1 & -1 & 0 & 0 & 0 \\ 0 & 0 & 2 & 3 & 4 \end{array} \right).$$

To assign the first two points to the first cluster and the last three points to the second cluster, set

$$A = \left( \begin{array}{cc} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{array} \right).$$

The centroids of these clusters are $\left((1,0)+(-1,0)\right)/2 = (0,0)$ and $\left((0,2)+(0,3)+(0,4)\right)/3 = (0,3)$, respectively, whence

$$C = X\ A\ \textrm{diagonal}(2,3)^{-1} = \left( \begin{array}{cc} 0 & 0 \\ 0 & 3 \end{array} \right).$$

Thus

$$C\ A^{'} = \left( \begin{array}{ccccc} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 3 & 3 & 3 \end{array} \right)$$

(the columns give the centroids associated with the columns of $X$) and

$$D = X - C\ A^{'} = \left( \begin{array}{ccccc} 1 & -1 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 & 1 \end{array} \right)$$

and, finally, the value of the objective function equals the sum of squares of its entries, $4$.