Is there any situation in which the k-means and fuzzy clustering algorithms could be combined?
MATLAB: Is it possible to combine the k-means and fuzzy clustering algorithms?
Related Solutions
Not really, no. Clusters, especially as determined by k-means, do not have "features". The k-means documentation states:
This iterative partitioning minimizes the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances. Rows of X correspond to points, columns correspond to variables.
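The quoted objective can be made concrete with a minimal sketch of Lloyd's iteration, the standard alternating form of k-means. This is plain Python, not MATLAB's own kmeans implementation. It assumes squared Euclidean distance (MATLAB's default 'Distance' setting) and uses naive random seeding. The helper names sq_dist, assign, and kmeans are illustrative only.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    Returns (centroids, labels, total), where total is the sum over all clusters
    of the squared point-to-centroid distances that the iteration reduces.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive seeding: k randomly chosen input rows
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the coordinate-wise mean of its members.
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(centroids[j])  # leave an empty cluster's centroid in place
        if new == centroids:
            break  # no centroid moved, so the partition is stable
        centroids = new
    labels = assign(points, centroids)  # final assignment against the final centroids
    total = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, total
```

On the toy rows [(0, 0), (0, 1), (10, 10), (10, 11)] with k=2, this settles on the centroids (0.0, 0.5) and (10.0, 10.5), in some order. Each cluster is described only by its centroid, not by any "feature".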
If you were working with two variables (two columns), then a geometric interpretation would be that a given point P is associated with a centroid C if the circle of radius |PC| around P does not strictly contain any other centroid.
Likewise, with three variables (three columns) the same geometric interpretation holds with "sphere" in place of "circle".
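The circle (or sphere) test above can be written down directly and checked against the ordinary nearest-centroid rule. It works unchanged in any number of dimensions. This is a minimal illustrative sketch that assumes Euclidean distance and Python 3.8+ (for math.dist). The names belongs_to and nearest are hypothetical helpers.

```python
import math


def belongs_to(p, c, centroids):
    """Circle/sphere test: P is associated with centroid C when the ball of
    radius |PC| around P does not strictly contain any other centroid."""
    r = math.dist(p, c)
    return all(math.dist(p, other) >= r for other in centroids if other != c)


def nearest(p, centroids):
    """The usual nearest-centroid rule, for comparison."""
    return min(centroids, key=lambda c: math.dist(p, c))


c2d = [(0, 0), (4, 0)]
print(belongs_to((1, 0), (0, 0), c2d), belongs_to((1, 0), (4, 0), c2d))  # True False
# (2, 0) is equidistant from both centroids, so it passes the test for each.
print(belongs_to((2, 0), (0, 0), c2d), belongs_to((2, 0), (4, 0), c2d))  # True True
```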
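You can see, then, that the cluster associated with C is not characterized by any particular value, or even value range, of the variables. The boundary between two clusters is the set of points equidistant from their two centroids, that is, where equal-radius hyperspheres around the two centroids intersect. With the default squared Euclidean distance, this boundary is a piece of the hyperplane that perpendicularly bisects the segment between the centroids.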
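The boundary description can be checked numerically, assuming plain Euclidean distance. Points on the perpendicular bisector of two centroids are equidistant from both, so they sit exactly on the border between the two clusters. The function on_boundary and the sample centroids are made up for illustration.

```python
import math


def on_boundary(p, c1, c2, tol=1e-9):
    """True if p is (numerically) equidistant from centroids c1 and c2."""
    return abs(math.dist(p, c1) - math.dist(p, c2)) <= tol


c1, c2 = (0, 0), (4, 2)
# (2, 1) is the midpoint of c1 and c2, and (-2, 4) is perpendicular to
# c2 - c1 = (4, 2), so every point (2 - 2t, 1 + 4t) lies on the bisector.
for t in (-1.0, 0.0, 0.5, 2.0):
    print(on_boundary((2 - 2 * t, 1 + 4 * t), c1, c2))  # True each time
print(on_boundary((0, 0), c1, c2))  # False: (0, 0) is c1 itself
```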
If you were using an SVM and had only two clusters, you could find a hyperplane that separates them, but a hyperplane is not what one would typically think of as a "feature".
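For two clusters, the hyperplane implied by the k-means centroids themselves can be written out explicitly. Expanding |x - c1|² < |x - c2|² gives w·x > b, with w = c1 - c2 and b = (|c1|² - |c2|²)/2.

This sketch is not an SVM. An SVM would choose the maximum-margin hyperplane from the data points, whereas this is simply the bisector of the two centroids. The names bisector_hyperplane and side are illustrative. Either way, the result is just a normal vector and an offset, not a feature of either cluster.

```python
def bisector_hyperplane(c1, c2):
    """Return (w, b) such that w . x > b exactly when x is closer to c1 than to c2."""
    w = tuple(u - v for u, v in zip(c1, c2))
    b = (sum(u * u for u in c1) - sum(v * v for v in c2)) / 2
    return w, b


def side(x, w, b):
    """+1 on c1's side of the hyperplane, -1 on c2's side, 0 on the hyperplane."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - b
    return (s > 0) - (s < 0)


w, b = bisector_hyperplane((0, 0), (4, 2))
print(w, b)  # (-4, -2) -10.0
print(side((0, 0), w, b), side((4, 2), w, b), side((2, 1), w, b))  # 1 -1 0
```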
The silhouette value for a point measures how close the point is to its own cluster compared with how close it is to other clusters. For a non-singleton cluster, a value of 0 means that the point lies on the border between its own cluster and a neighboring cluster. That interpretation no longer makes sense for a singleton cluster. On the other hand, if a cluster contains only a single point, you could consider the cluster to be defined by that point. So even though you cannot compute the distance from that point to other points in the same cluster, it makes sense that the point belongs in that cluster more than in any other.
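The definition behind these remarks is s(i) = (b - a) / max(a, b). Here a is the mean distance from point i to the other members of its own cluster, and b is the smallest mean distance from i to the members of any other cluster.

For a singleton cluster a is undefined, so some convention has to supply s(i). Rousseeuw's original definition sets it to 0, while the argument above favors treating the point as well placed. The sketch below is plain Python, assumes Euclidean distance, and requires at least two clusters. It leaves the choice to a hypothetical singleton_value parameter rather than claiming what any particular library does.

```python
import math


def silhouettes(points, labels, singleton_value=0.0):
    """Silhouette value s(i) = (b - a) / max(a, b) for every point.

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of another cluster.
    singleton_value: hypothetical knob giving s(i) for a point that is alone
    in its cluster, where a is undefined. Requires at least two clusters.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    out = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            out.append(singleton_value)
            continue
        # p itself is in `own` but contributes distance 0, hence len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        out.append((b - a) / denom if denom > 0 else 0.0)  # 0 if all points coincide
    return out


points = [(0, 0), (0, 1), (10, 10), (10, 11), (100, 100)]
labels = [0, 0, 1, 1, 2]
print(silhouettes(points, labels))                       # singleton scored 0
print(silhouettes(points, labels, singleton_value=1.0))  # singleton scored 1
```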
Another justification is that silhouette plots are often used to assess the quality of a clustering. Typically you want the silhouette scores in each cluster to be near the average silhouette score of all points. A silhouette score of 0 will be below the average for any reasonable clustering. One could therefore conclude that the clustering was bad even when a singleton cluster is completely reasonable (for example, one point very far away from everything else).
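The "compare each cluster with the overall average" heuristic can be sketched in a few lines. The scores are made-up illustrative values, not output from any real data set, and the singleton cluster is scored 0. Flagging clusters whose mean falls strictly below the overall mean is one simple reading of the heuristic. The name clusters_below_average is hypothetical.

```python
from statistics import mean


def clusters_below_average(scores, labels):
    """Labels of clusters whose mean silhouette score is below the overall mean."""
    overall = mean(scores)
    by_cluster = {}
    for s, lab in zip(scores, labels):
        by_cluster.setdefault(lab, []).append(s)
    return sorted(lab for lab, vals in by_cluster.items() if mean(vals) < overall)


# Illustrative scores: two tight clusters and one far-away singleton scored 0.
labels = [0, 0, 1, 1, 2]
scores = [0.9, 0.9, 0.8, 0.8, 0.0]
print(clusters_below_average(scores, labels))  # [2]
```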