Solved – The opposite of Clustering, or "Anti"-Clustering

Tags: classification, clustering

In clustering we want to find optimal clusters – groups of data points that are close together (as measured by a defined distance measure). Is there such a thing as "anti"-clustering? That is, given a set of data points, we want to find "anti"-clusters – subsets whose data points are as far apart as possible (again measured by a defined distance measure, for example the Euclidean distance).

In a concrete application, I have a set of people and want to divide them into n groups that are as diverse as possible with respect to a few predefined criteria (diverse meaning: within each group, as few people as possible share the same value on each criterion).

When I first thought about this it seemed pretty intuitive, but I couldn't find any method to create these subsets – maybe I'm too focused on regular clustering methods. Sorry if this question is too trivial, but any help would be appreciated!

Best Answer

Based on the poster's response in one of the answers, I will rephrase the problem statement because it can be solved using an R package that I wrote:

The task is to partition a set of elements into K groups in such a way that the distance between clusters is minimized and the distance within clusters is maximized. This is mathematically the opposite of clustering and has indeed been called anticlustering; there are some rarely cited papers on the approach (Späth, 1986; Valev, 1998). Generally, anticlustering leads to clusters that are very similar to each other.

If you are using R, you can use my package anticlust to tackle the anticlustering problem. For example, use the following code to create three similar sets of plants in the classical iris data set:

library(anticlust)
data(iris)

## Maximize the k-means criterion
anticlusters <- anticlustering(
  iris[, -5],
  K = 3,
  objective = "variance"
)

## Compare feature means by anticluster
by(iris[, -5], anticlusters, function(x) round(colMeans(x), 2))
#> anticlusters: 1
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#>         5.84         3.06         3.76         1.20 
#> --------------------------------------------------------------------------------------- 
#> anticlusters: 2
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#>         5.84         3.06         3.76         1.20 
#> --------------------------------------------------------------------------------------- 
#> anticlusters: 3
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#>         5.84         3.06         3.76         1.20 

My package can maximize two clustering objectives: (a) the classical k-means objective, leading to similar feature means across groups; (b) the cluster editing objective, which is the sum of the pairwise distances within clusters. In the package, you set the parameter objective to either "variance" (k-means) or "distance" (cluster editing).
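To illustrate the second objective, the call looks exactly like the k-means example above with objective switched to "distance" (same iris data; this is a sketch of the usage described above, not additional package functionality):

```r
library(anticlust)
data(iris)

## Maximize the cluster editing criterion, i.e., the sum of
## pairwise distances within each anticluster
anticlusters_ce <- anticlustering(
  iris[, -5],
  K = 3,
  objective = "distance"
)

## Check the group sizes; by default the groups are equal-sized
table(anticlusters_ce)
```

You can then inspect the feature means per group with the same `by()` call as above to verify that the groups are again similar to each other.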

The Github page and the package docs contain some more information on the methods and algorithms used in the package.


Späth, H. (1986). Anticlustering: Maximizing the variance criterion. Control and Cybernetics, 15(2), 213–218.

Valev, V. (1998). Set partition principles revisited. In Joint IAPR international workshops on statistical techniques in pattern recognition (SPR) and structural and syntactic pattern recognition (SSPR) (pp. 875–881).