Solved – k-means clustering – Characterize clusters

clustering, k-means

I have a data set giving the number of visits to ~20 web pages for a total of ~3000 users. To identify "similar" users according to the number of visits to each web page, I ran a k-means clustering.

I now know which user belongs to which of the k = 3 (k is irrelevant here) clusters. But how can I characterize the clusters? Is there a way to come to a conclusion similar to "User X belongs to the cluster of users who like web pages about News and Politics"?

Best Answer

You used a single metric to classify the users into clusters? I'll assume you have additional, descriptive information about these users. One heuristic is to summarize the central tendency (e.g., mean or median) of each descriptor within each cluster, based on the cluster assignments. So, if you have k = 3 and x = 20 (both k and x are irrelevant, x being the number of descriptors or features), the output is a 20 (rows) by 3 (columns) summary matrix. Next, to see how the clusters differ on each descriptor, build an index: divide the cluster value by the global value across all users for that descriptor and multiply by 100. The index reads like an IQ score, where 100 is "normal" and values of 120 or more, or 80 or less, flag descriptors on which the cluster diverges from the norm. These cutoffs act as "quick and dirty" significance tests for between-group (cluster) differences.
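As a rough illustration of the index idea, here is a minimal sketch in plain Python. It assumes the cluster labels are already known and uses the per-page mean as the central tendency; the function name `cluster_index`, the page names, and the toy numbers are all made up for the example and are not from the question's data.

```python
from collections import defaultdict


def cluster_index(visits, labels):
    """Return {cluster: [index per descriptor]}, index = 100 * cluster mean / global mean."""
    n_features = len(visits[0])
    # Global mean of each descriptor across all users.
    global_mean = [
        sum(row[j] for row in visits) / len(visits) for j in range(n_features)
    ]

    # Group the user rows by their cluster label.
    by_cluster = defaultdict(list)
    for row, label in zip(visits, labels):
        by_cluster[label].append(row)

    index = {}
    for label, rows in by_cluster.items():
        cluster_mean = [
            sum(r[j] for r in rows) / len(rows) for j in range(n_features)
        ]
        # A descriptor with a global mean of zero cannot be indexed; report None.
        index[label] = [
            100 * cm / gm if gm > 0 else None
            for cm, gm in zip(cluster_mean, global_mean)
        ]
    return index


# Hypothetical toy data: 4 users, 2 pages, 2 clusters.
pages = ["news", "sports"]
visits = [[10, 0], [8, 2], [1, 9], [1, 9]]
labels = [0, 0, 1, 1]

index = cluster_index(visits, labels)
for label, scores in sorted(index.items()):
    # Apply the 120+ rule of thumb to name the pages a cluster over-indexes on.
    over = [p for p, s in zip(pages, scores) if s is not None and s >= 120]
    print(f"cluster {label}: over-indexed on {over}")
```

In this toy case cluster 0 over-indexes on "news" and cluster 1 on "sports", which is the kind of label the question asks for. The 120/80 cutoffs remain a rule of thumb rather than a formal test.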
