Solved – Clustering very small datasets

clustering

I am looking for methods to cluster very small datasets.
Almost all methods I have seen talk about how well they work on very large datasets.

By small I am talking 5 elements, 20 elements, maybe 50 elements.
Particularly focused on 20 elements.

Are there some standard methods I am not seeing?

20 elements is just about small enough that brute-forcing it would be viable.
It also seems certain that some method based on mixed integer programming could be used.


To give specifics about my particular problem:

I have what I will call "models", and each set of models has about 20 elements. The models are what I want to cluster.
I have about 3000 sets of models to cluster, each with about 20 elements.
Each model is made up of two things: An ID (which links to other useful information), and a probability function.
That probability function takes in some data and tells me how likely, according to this model, that data is.

When using the collection of models, I assess the data with each of the models, and then choose the model that gives the highest probability as the one that best fits this particular piece of data.
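The selection step above can be sketched as follows. This is a minimal illustration, not the actual system: the model IDs and the Gaussian probability functions are hypothetical stand-ins for whatever the real models compute.

```python
import math

def gaussian_pdf(mu, sigma):
    """Build a toy probability function; a stand-in for a real model's pdf."""
    def pdf(x):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return pdf

# Each model is (ID, probability function), as described in the question.
models = [
    ("model_a", gaussian_pdf(0.0, 1.0)),
    ("model_b", gaussian_pdf(5.0, 2.0)),
]

def best_model(x):
    # Evaluate the data point under every model and keep the most probable one.
    return max(models, key=lambda m: m[1](x))[0]
```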

I initially start with a lot of models which are more or less random in their quality, but which are improved by a separate system to get better and better at modeling particular types of data (the type of data which they currently model best).
Often two (or more) models become good at modelling the same data,
so I want to use clustering to throw out the duplicates.

So I evaluate all the models over a dataset, and then use the results to determine my distance function between the models.

I am currently investigating measures including correlation between the sets of probabilities output for the same point,
and also the "cost to replace": how much the total probability of all the data points for which this model is best would go down if one of the other models were used instead. If, when model $i$ is best, I could instead use model $j$ and not lose much probability, then $i$ and $j$ must be generally pretty similar (I have to make this measure symmetric by adding its transpose).
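A hedged sketch of that "cost to replace" dissimilarity, assuming the evaluation results are stored in a hypothetical array `P` where `P[i, k]` is the probability model `i` assigns to data point `k` (the random `P` here is just a placeholder):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((4, 100))  # placeholder: 4 models evaluated on 100 points

best = P.argmax(axis=0)  # index of the best model for each data point
n = P.shape[0]
cost = np.zeros((n, n))
for i in range(n):
    pts = best == i  # points where model i is currently the best
    for j in range(n):
        # total probability lost if model j answered for model i's points
        cost[i, j] = (P[i, pts] - P[j, pts]).sum()

# Symmetrize by adding the transpose, as described above.
dist = cost + cost.T
```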

I do not have (or rather do not want to use) any a priori information about the likely number of clusters. But given that the maximum number of clusters is one per element, with K-* type clustering it really doesn't take long to evaluate all values of K.
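At this scale, the brute-force option mentioned earlier is genuinely practical: for each K you can try every subset of K medoids and keep the cheapest. A sketch, assuming a precomputed symmetric distance matrix `D` (the matrix passed in below is hypothetical):

```python
from itertools import combinations
import numpy as np

def brute_force_kmedoids(D, k):
    """Exhaustive k-medoids: feasible for ~20 elements."""
    n = D.shape[0]
    best_cost, best_medoids = float("inf"), None
    for medoids in combinations(range(n), k):
        # assign every point to its nearest medoid and total the distances
        cost = D[:, list(medoids)].min(axis=1).sum()
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    return best_medoids, best_cost
```

Sweeping K from 1 to n and comparing the costs (or a penalized score such as the silhouette) then covers all cluster counts without an a priori choice.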

I've been playing around a lot with affinity propagation and k-medoids.
Just starting to play with hierarchical clustering now.

Best Answer

For tiny data sets, hierarchical clustering is the method of choice.

The dendrogram visualization lets you verify visually how well the data clusters, whether there are outliers, how the clusters nest, and how many clusters there are.
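Since the questioner already has a pairwise distance matrix between models, this is straightforward with SciPy. A sketch, where `D` is a hypothetical symmetric distance matrix standing in for the real model distances:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.spatial.distance import squareform

# Toy stand-in for the real model-distance matrix: L1 distances
# between 20 random points (symmetric, zero diagonal).
rng = np.random.default_rng(0)
X = rng.random((20, 2))
D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=-1)

# linkage() expects the condensed (upper-triangular) form.
Z = linkage(squareform(D), method="average")

# Cut the tree into at most 4 flat clusters (any K can be tried cheaply).
labels = fcluster(Z, t=4, criterion="maxclust")

# dendrogram(Z)  # render with matplotlib to inspect nesting and outliers
```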
