MATLAB: How to visualize high-dimensional clusters from the “kmeans” function

Tags: high-dimensional, Statistics and Machine Learning Toolbox, visualization

I applied the "kmeans" function to a dataset of 24 variables with the number of clusters being set to 3. How can I visualize the three clusters and their centroids?

Best Answer

Because the data is 24-dimensional, the clusters are difficult to visualize directly. A common approach is to first project or transform the data to a lower dimension (typically 2 or 3) and then visualize the reduced data. As an example, suppose the "kmeans" function is applied to a 300-by-24 data matrix "data" with the number of clusters set to 3:

data = randn(300, 24);
[idx, C] = kmeans(data, 3);

Then here are some visualization options:

   Option 1: Plot two or three dimensions of interest. For instance, to plot the 4th dimension against the 9th dimension of your data:
scatter(data(:,4), data(:,9), [], idx);   % plot three clusters with different colors
hold on;
plot(C(:, 4), C(:, 9), 'kx');   % plot centroids
hold off;   % release the axes so later plots do not overlay this one
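Since Option 1 also allows three dimensions, a 3-D variant might look like the sketch below. The choice of dimensions 4, 9, and 15 is arbitrary and only for illustration, the fixed random seed is an assumption added for repeatability, and the data is regenerated so the snippet runs on its own.

```matlab
rng(0);                          % fixed seed for repeatability (assumption, not in the original)
data = randn(300, 24);           % same placeholder data as in the example above
[idx, C] = kmeans(data, 3);

dims = [4 9 15];                 % arbitrary dimensions chosen for illustration
figure;
scatter3(data(:, dims(1)), data(:, dims(2)), data(:, dims(3)), [], idx, 'filled');   % points colored by cluster
hold on;
plot3(C(:, dims(1)), C(:, dims(2)), C(:, dims(3)), 'kx', 'MarkerSize', 12, 'LineWidth', 2);   % centroids
hold off;
xlabel('Variable 4'); ylabel('Variable 9'); zlabel('Variable 15');
grid on;
```

Rotating the axes interactively can help judge whether the clusters separate along the chosen dimensions.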
   Option 2: First reduce the dimensionality of your data using principal component analysis (PCA), and then plot the data in the principal-component space:
[standard_data, mu, sigma] = zscore(data);     % standardize each variable to zero mean and unit variance
[coeff, score] = pca(standard_data);     % perform PCA
new_C = ((C - mu) ./ sigma) * coeff;     % standardize the centroids the same way, then project them (implicit expansion needs R2016b or later)
figure;     % open a new figure so this plot does not overlay an earlier one
scatter(score(:, 1), score(:, 2), [], idx)     % plot the first two principal components of the data, colored by cluster
hold on
plot(new_C(:, 1), new_C(:, 2), 'kx')     % plot the centroids in the same principal-component space
hold off
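A 2-D PCA plot shows only part of the variation in 24-dimensional data, so it can be worth checking how much variance the first two components capture. The sketch below uses the fifth output of "pca" for this. The fixed seed and the regenerated data are assumptions that make the snippet self-contained.

```matlab
rng(0);                                  % fixed seed for repeatability (assumption, not in the original)
data = randn(300, 24);                   % same placeholder data as in the example above
standard_data = zscore(data);            % standardize each variable
[~, ~, ~, ~, explained] = pca(standard_data);   % explained: percentage of total variance per component
fprintf('First two components explain %.1f%% of the variance.\n', sum(explained(1:2)));
```

If this percentage is small, clusters that overlap in the 2-D plot may still be well separated in the full space.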

   Option 3: Use the "silhouette" function to assess the quality of the clustering:

silhouette(data, idx);
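To summarize the silhouette plot as a single number, the silhouette values can be captured and averaged. This is a minimal sketch. The fixed seed and the regenerated data are assumptions that make it self-contained.

```matlab
rng(0);                          % fixed seed for repeatability (assumption, not in the original)
data = randn(300, 24);           % same placeholder data as in the example above
idx = kmeans(data, 3);
s = silhouette(data, idx);       % one silhouette value per observation, between -1 and 1
fprintf('Mean silhouette value: %.3f\n', mean(s));
```

Values close to 1 indicate observations that sit well inside their cluster, while values near 0 or below suggest overlapping or poorly assigned clusters.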
