How do I calculate the C-index (an internal cluster validity index)? Please explain it with a small example. (I need the underlying calculation, i.e., how the within-cluster pairs of points, the minimum sum and the maximum sum are used.)
Here is what I tried myself.
Let the data be x = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}. Using kmeans with k=3, I got the clusters {1,2,3,4}, {5,6,7,8,9}, {10,11,12,13,14,15}.
When I compute the distance matrix of x, I get:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
2 0
3 0 0
4 0 0 0
5 1 1 1 1
6 1 1 1 1 0
7 1 1 1 1 0 0
8 1 1 1 1 0 0 0
9 1 1 1 1 0 0 0 0
10 1 1 1 1 2 2 2 2 2
11 1 1 1 1 2 2 2 2 2 0
12 1 1 1 1 2 2 2 2 2 0 0
13 1 1 1 1 2 2 2 2 2 0 0 0
14 1 1 1 1 2 2 2 2 2 0 0 0 0
15 1 1 1 1 2 2 2 2 2 0 0 0 0 0
and the distance matrix of the cluster labels is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
2 0
3 0 0
4 0 0 0
5 1 1 1 1
6 1 1 1 1 0
7 1 1 1 1 0 0
8 1 1 1 1 0 0 0
9 1 1 1 1 0 0 0 0
10 1 1 1 1 2 2 2 2 2
11 1 1 1 1 2 2 2 2 2 0
12 1 1 1 1 2 2 2 2 2 0 0
13 1 1 1 1 2 2 2 2 2 0 0 0
14 1 1 1 1 2 2 2 2 2 0 0 0 0
15 1 1 1 1 2 2 2 2 2 0 0 0 0 0
<code>
> sum(dist(x))
[1] 560
> sum(dist(y$cluster))
[1] 104
> max(sum(dist(y$cluster)))
[1] 104
> min(sum(dist(y$cluster)))
[1] 104
> cindex<-(560-140)/(140-140)
> cindex
[1] Inf
</code>
This is what I got.
But when I use a package function for the C-index in RStudio, it shows this:
<code>
> z<-intCriteria(x,y$cluster,c("c_index"))
Error in intCriteria(x, y$cluster, c("c_index")) :
argument 'traj' must be a matrix
> x<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
> x<-matrix(x,ncol=1)
> y<-kmeans(x,3)
> z<-intCriteria(x,y$cluster,c("c_index"))
> z
$c_index
[1] 0.04489796
</code>
Is my process above right or wrong? If it is wrong, please show me how to compute the C-index.
Best Answer
Below is an excerpt from my document on my SPSS macro function computing C-index internal clustering criterion [see my web-page, "Clustering criterions" collection]:
Let me now show how you can compute the C-index of a cluster solution. You will need the distance matrix and the result of the clustering. I'll use the Iris data, 5 cases from each class, clustered by K-means (CLU = cluster code):
Calculation of C-Index.
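For the toy data in the question, the calculation can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS macro mentioned above, and the function name c_index is made up here. It assumes the usual definition C = (S_w - S_min) / (S_max - S_min), where S_w is the sum of the N_w distances between points sharing a cluster, and S_min and S_max are the sums of the N_w smallest and N_w largest distances among all pairs of points.

```python
from itertools import combinations

def c_index(points, labels):
    """Return (S_w, S_min, S_max, C) for 1-D points and their cluster labels.

    Illustrative helper, assuming C = (S_w - S_min) / (S_max - S_min).
    """
    pairs = list(combinations(range(len(points)), 2))
    # Distances between ALL pairs of data points, sorted ascending.
    all_d = sorted(abs(points[i] - points[j]) for i, j in pairs)
    # Distances between pairs of points that fall in the same cluster.
    within = [abs(points[i] - points[j])
              for i, j in pairs if labels[i] == labels[j]]
    n_w = len(within)
    if n_w == 0:
        raise ValueError("no within-cluster pairs")
    s_w = sum(within)            # sum of within-cluster distances
    s_min = sum(all_d[:n_w])     # sum of the n_w smallest distances overall
    s_max = sum(all_d[-n_w:])    # sum of the n_w largest distances overall
    if s_max == s_min:
        raise ValueError("S_max equals S_min; the index is undefined")
    return s_w, s_min, s_max, (s_w - s_min) / (s_max - s_min)

x = list(range(1, 16))
labels_456 = [1] * 4 + [2] * 5 + [3] * 6   # partition listed in the question
labels_555 = [1] * 5 + [2] * 5 + [3] * 5   # equal split, for comparison

for name, labels in (("4/5/6", labels_456), ("5/5/5", labels_555)):
    s_w, s_min, s_max, c = c_index(x, labels)
    print(name, s_w, s_min, s_max, round(c, 8))
# 4/5/6 65 52 301 0.05220884
# 5/5/5 60 49 294 0.04489796
```

Two things follow from this. First, dist(y$cluster) in the question measures differences between the cluster code numbers, which are arbitrary labels, so it does not enter the index at all; the labels are only used to decide which pairs of points count as within-cluster. Second, the equal 5/5/5 split gives 11/245 ≈ 0.04489796, the value printed by intCriteria, which suggests that the second kmeans run (it starts from random centers) returned that partition rather than the 4/5/6 one listed at the top.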