What's the name of this clustering algorithm?

Tags: clustering, machine-learning

I have been learning about fuzzy clustering algorithms recently, and I came across the following objective function:

$$\begin{array}{l}
\min \;\; J = \sum\limits_{i = 1}^{N} \sum\limits_{k = 1}^{K} \sum\limits_{j = 1}^{N} u_{ik}\, d_{ij}\, u_{jk} + T \sum\limits_{i = 1}^{N} \sum\limits_{k = 1}^{K} u_{ik}^{2} \\
\text{s.t. } \;\; \sum\limits_{k = 1}^{K} u_{ik} = 1, \;\; u_{ik} \ge 0 \\[6pt]
N: \text{ number of samples} \\
K: \text{ number of clusters} \\
u_{ik}: \text{ the fuzzy membership, i.e. the degree to which object } i \text{ belongs to cluster } k \\
d_{ij}: \text{ the distance between objects } i \text{ and } j \\
T: \text{ a constant parameter}
\end{array}$$
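
To make the notation concrete, here is a small NumPy sketch that evaluates $J$ for a given membership matrix (the function and variable names are mine, just for illustration):

```python
import numpy as np

def objective(U, D, T):
    """Evaluate J = sum_ikj u_ik d_ij u_jk + T * sum_ik u_ik^2.

    U: (N, K) fuzzy memberships, rows sum to 1, entries >= 0
    D: (N, N) symmetric pairwise-distance matrix
    T: the constant parameter
    """
    pairwise = np.trace(U.T @ D @ U)   # sum_k u_k^T D u_k = the triple sum
    penalty = T * np.sum(U ** 2)
    return pairwise + penalty

# Tiny usage example with random data and random memberships.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
U = rng.dirichlet(np.ones(2), size=6)
print(objective(U, D, T=1.0))
```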

I am interested in this algorithm, but I have no idea what its name is. The only thing I know about it is that it is a fuzzy clustering algorithm. It looks like a variant of the fuzzy c-means clustering algorithm combined with K-medoids?

What I do know is that the first term in the formula measures the total dissimilarity within each cluster. However, I cannot figure out why…

I searched the Internet, but I cannot find anything that looks like this formula. I would appreciate it if you could help me or give me a hint.

EDIT

I found a related paper, Incremental Fuzzy Clustering With Multiple Medoids for Large Data. The first two terms of its objective function are very similar to those in my question:

[Image: the objective function and some details from the above paper]

Is it appropriate to think that the $v_{cj}$ in that paper is similar in meaning to the $u_{jk}$ in my question?

In other words, does the fuzzy membership $u_{jk}$ in my question also reflect how representative object $j$ is of cluster $k$, i.e. the higher the value, the better this object represents the cluster?

Best Answer

This is just soft-assignment or probabilistic $k$-means.

I can't find anything on the web to help you more right now, but this kind of objective is typically minimized with some kind of Expectation-Maximization-style method.
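
For what it's worth, here is a rough fixed-point scheme I derived from the stationarity conditions of your objective (assuming a symmetric $D$); I drop the $u_{ik} \ge 0$ constraint in the update and re-impose it by clipping, so treat it as a sketch of mine rather than an established algorithm:

```python
import numpy as np

def fit_memberships(D, K, T, iters=200, seed=0):
    """Heuristic fixed-point iteration for the objective in the question.

    Setting dJ/du_ik = 0 with a Lagrange multiplier for sum_k u_ik = 1
    (and symmetric D) gives u_ik = 1/K + (mean_l D_il - D_ik) / T, where
    D_ik = sum_j d_ij u_jk. Nonnegativity is restored by clipping; this
    is my own sketch, not a published method.
    """
    rng = np.random.default_rng(seed)
    N = D.shape[0]
    U = rng.dirichlet(np.ones(K), size=N)        # random start on the simplex
    for _ in range(iters):
        Dik = D @ U                              # D_ik = sum_j d_ij * u_jk
        U = 1.0 / K + (Dik.mean(axis=1, keepdims=True) - Dik) / T
        U = np.clip(U, 0.0, None)                # re-impose u_ik >= 0
        U /= U.sum(axis=1, keepdims=True)        # re-impose sum_k u_ik = 1
    return U
```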

EDIT to answer the comments

The empirical idea is the following: assume first that your membership degrees are hard assignments, i.e. that they take values in $\{0,1\}$. In this case, you count the within-class distances only for those points that actually belong to a given class. The first term in your objective function may be written as $$ \sum_{i = 1}^N \sum_{k = 1}^K \sum_{j = 1}^N u_{ik}d_{ij}u_{jk} = \sum_{k = 1}^K \sum_{i = 1}^N \sum_{j = 1}^N u_{ik}d_{ij}u_{jk} = \sum_{k = 1}^K \sum_{i \in \mathcal{C}_k} \sum_{j \in \mathcal{C}_k} d_{ij}, $$ where $\mathcal{C}_k$ denotes the set of indices of the elements (hard) assigned to cluster $k$. Continuing, you can define the within-class distance of a cluster $k$ as $D_k := \frac{1}{2}\sum_{i \in \mathcal{C}_k}\sum_{j \in \mathcal{C}_k} d_{ij}$ (note the $\frac{1}{2}$: every pairwise distance is counted twice, but this is not really an issue).
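
If you want to check this identity numerically, here is a throwaway NumPy sketch (random data; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 8, 3
X = rng.normal(size=(N, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances

labels = rng.integers(0, K, size=N)   # hard assignments
U = np.eye(K)[labels]                 # one-hot memberships, u_ik in {0, 1}

triple_sum = np.trace(U.T @ D @ U)    # sum_k sum_i sum_j u_ik d_ij u_jk
per_cluster = sum(D[np.ix_(labels == k, labels == k)].sum() for k in range(K))
assert np.isclose(triple_sum, per_cluster)
```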

Hence, in the case of hard assignments the penalty term is constant ($T\sum_{i,k} u_{ik}^2 = TN$, since each row of $u$ has exactly one entry equal to $1$), so by minimizing your objective function you are actually minimizing the total within-class distance: $$ J = 2\sum_{k}D_k + TN. $$

Assume now that we are dealing with soft assignments. A soft assignment is pretty much an encoding of how much uncertainty we have in our assignments: we are not sure whether a certain input point $i$ should belong to class $k$ or class $l$. To this end, I will write in terms of probabilities since, to me, membership degrees are nothing other than probabilities (this was a massive subject of discussion in my former group, which was heavy on fuzzy logic!).

Now let $p_{ik}$ denote the probability that point $i$ belongs to cluster $k$. I will write the first term of your objective function a little differently: $$ \sum_{k = 1}^K \sum_{i = 1}^N p_{ik} D_{ik}, $$ where $D_{ik} = \sum_{j = 1}^N p_{jk}d_{ij}$ is the contribution of point $i$ to cluster $k$.
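
In matrix form this regrouping is one line of NumPy (again a sketch of mine, just to verify the algebra):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 8, 3
X = rng.normal(size=(N, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
P = rng.dirichlet(np.ones(K), size=N)        # soft assignments, rows sum to 1

Dik = D @ P                                  # D_ik = sum_j p_jk * d_ij
first_term = np.sum(P * Dik)                 # sum_k sum_i p_ik * D_ik
assert np.isclose(first_term, np.trace(P.T @ D @ P))  # equals the triple sum
```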

In this case, if $p_{ik}$ is small (i.e. you don't think that point $i$ should be part of cluster $k$), then you don't let this point contribute much to the global within-class distances: you give a lower weight to the distances between point $i$ and all the other points $j$ when looking at cluster $k$. Similarly, if some point $j$ shouldn't be part of cluster $k$ (i.e. $p_{jk}$ is small), then it shouldn't contribute much to the within-class distances either.

For the sake of understanding, consider two points $1$ and $2$ that you strongly believe to belong to cluster $1$, with probability / belief $0.9$, and only with belief $0.1$ to cluster $2$. Then $d_{12}$ is counted with weight $0.9 \times 0.9 = 0.81$ (a strong weight) in $D_1$, and only with weight $0.1 \times 0.1 = 0.01$ in $D_2$.
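
The weights are just products of memberships:

```python
import numpy as np

u1 = np.array([0.9, 0.1])   # memberships of point 1 in clusters 1 and 2
u2 = np.array([0.9, 0.1])   # memberships of point 2
print(u1 * u2)              # [0.81 0.01]: weights of d_12 in D_1 and D_2
```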

Usually, you want to minimize the within-class distances while maximizing the between-class distances.
