Solved – Clustering with shape prior

clusteringk-means

EDIT STARTS

After seeing the comments and answers, I believe I started in a wrong direction. I have a set of rectangles, which I want to cluster as shown below. enter image description here. The approach I took was to consider central points of each rectangle as a data point in $R^2$ and cluster them using Euclidean distance (K-means, K-mediods approach or any other method). I am not trying to discover "shapes" in the data, as I know the best shape would be rectangles. This is the reason I was trying to know if there is any way to include "shape priors" in the clustering methods. There might be algorithmic formulations of this problem, I would be glad if someone can point me to.

I hope it is clear now what I want to do, I will be glad if someone can point me to right directions.

EDIT ENDS

Suppose, we have a set of points in $R^d$ which we want to cluster. We do have some idea about the shape of the clusters, i.e., we know the clusters should be spherical, ellipsoidal or rectangular. Is there a way to include this "shape prior" in the clustering process? (it would be more useful for me if this has been done in some well known algorithms like k-means)

I did find some papers like this one and this one which talks about imposing some constraint into clustering, but the constraints are primarily in the form of "pairwise constraints" i.e. specifying whether two instances should be in same or different clusters.

Best Answer

There is the frequent claim that k-means "prefers" spherical clusters. Mathematically, it produces Voronoi cells, but there exists a close relationship between Voronoi cells, nearest neighbors and euclidean spheres.

In R, when you look at the Mclust function, you can specify which model to use. If you allow variances, it will model the data using spheres of different size, if you add in covariance they can also be rotated.

I'm not aware of an approach preferring rectangular clusters in any meaningful way. You could however use an R*-tree index, and use the index pages as clusters.

The question you didn't answer is: what are you trying to do?

If you have prerequirements such as rectangles, you are most likely not trying to discover structure, but instead you want to squeeze your data into a predefined model. At which point you are in the domain of constraint optimization and data modelling, not structure discovery (clustering).

When specifying such constraints, you should also talk about how to handle model overlaps and these things.