Solved – Distance between two clusters

I have these distances:

```r
install.packages("HSAUR")
library(HSAUR)

# First 6 rows and 5 columns of the pottery data, scaled without centering
b  <- round(scale(pottery[1:6, 1:5], center = FALSE), 2)
b1 <- round(dist(b, method = "euclidean"), 2)
b1
#      1    2    3    4    5
# 2 0.34
# 3 0.24 0.14
# 4 0.36 0.10 0.16
# 5 0.33 0.15 0.19 0.24
# 6 0.45 0.43 0.40 0.47 0.45
```

Consider the clusters $A=\{1,4\}$ and $B=\{2,3\}$.

  1. What is the correct approach for computing the distance between them with complete linkage?

  2. How do I merge the cases into clusters, and how can I compute the average-linkage distance between these two clusters?

For complete linkage (maximum distance), I first have to merge the cases into clusters and then compute their distances with the formula:

$d(A,B) = \max |x-y|$

I am following this example:

http://tx.shu.edu.tw/~PurpleWoo/Literature/!DataAnalysis/Methods%20of%20Multivariate%20Analysis.pdf

It starts on page 460 (Example 14.3.3(a)). The procedure there leads to a non-symmetric table of distances from the fourth step onward. Following the first steps, I concluded that the complete-linkage distance between these two clusters could be 0.34, but I am not sure. I also could not finish the process on my own to see what the dendrogram would look like, and I would be completely lost doing the same for average linkage.
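To check hand computations through the whole merge sequence, here is a minimal sketch in R. It assumes the rounded distances printed above are the ones to cluster (they are typed in directly, so HSAUR is not needed) and uses `stats::hclust`, whose `method` argument accepts `"complete"` and `"average"`. The object names `D`, `hc_complete` and `hc_average` are illustrative only.

```r
# Rebuild the printed 6 x 6 distance matrix from its lower triangle.
# Values are entered column by column, the same order R uses for dist objects.
D <- matrix(0, nrow = 6, ncol = 6)
D[lower.tri(D)] <- c(0.34, 0.24, 0.36, 0.33, 0.45,  # column 1: d(2,1) ... d(6,1)
                     0.14, 0.10, 0.15, 0.43,        # column 2: d(3,2) ... d(6,2)
                     0.16, 0.19, 0.40,              # column 3: d(4,3) ... d(6,3)
                     0.24, 0.47,                    # column 4: d(5,4), d(6,4)
                     0.45)                          # column 5: d(6,5)
D <- D + t(D)  # fill the upper triangle to make it symmetric

# Agglomerative clustering of the same distances under two linkage rules
hc_complete <- hclust(as.dist(D), method = "complete")
hc_average  <- hclust(as.dist(D), method = "average")

# merge: what joins at each step (negative entry = a single case,
#        positive entry = the cluster formed at that earlier step)
# height: the linkage distance at which each merge happens
hc_complete$merge
hc_complete$height  # by hand: 0.10, 0.16, 0.24, 0.36, 0.47
hc_average$height
```

Calling `plot(hc_complete)` or `plot(hc_average)` then draws the corresponding dendrogram, with each merge drawn at its `height`.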

Best Answer

You have your distance definitions mixed up.

$$d_{\max}(x,y) := \max_d |x_d-y_d|$$

is the maximum norm for vectors.
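To make the distinction concrete, a small sketch in R comparing the maximum-norm distance with the Euclidean distance for two made-up points `x` and `y` (hypothetical values, not taken from the pottery data). Both are distances between two points; `stats::dist` offers them as `method = "maximum"` and `method = "euclidean"`.

```r
# Two hypothetical points in R^3
x <- c(1, 2, 3)
y <- c(4, 0, 3)

# Maximum norm of the difference: largest absolute coordinate difference
d_max <- max(abs(x - y))        # 3
# Euclidean distance: square root of the summed squared differences
d_euc <- sqrt(sum((x - y)^2))   # sqrt(13), about 3.61

# The same two point distances computed by stats::dist
d_max_dist <- as.numeric(dist(rbind(x, y), method = "maximum"))
d_euc_dist <- as.numeric(dist(rbind(x, y), method = "euclidean"))
```

Neither number says anything about clusters yet; a cluster distance such as complete linkage is built on top of a point distance like one of these.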

What you need for complete linkage is a different kind of distance, one that is defined on clusters in terms of point distances:

$$ D_{\text{complete-linkage}}(A,B) := \max_{a\in A}\max_{ b\in B} d(a,b) $$

where $d(a,b)$ can be any distance between points, such as the maximum-norm distance above or the Euclidean distance.
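Applied to the question's numbers, a minimal R sketch: take the $|A| \times |B|$ block of pairwise distances between $A=\{1,4\}$ and $B=\{2,3\}$ from the printed Euclidean matrix (typed in from the question's output). Complete linkage is the maximum of that block. For comparison, average linkage uses the mean of the same block, $\frac{1}{|A||B|}\sum_{a\in A}\sum_{b\in B} d(a,b)$. The clusters are just index sets here, so nothing has to be merged before the distance is computed.

```r
# Rebuild the printed 6 x 6 Euclidean distance matrix (lower triangle, column by column)
D <- matrix(0, nrow = 6, ncol = 6)
D[lower.tri(D)] <- c(0.34, 0.24, 0.36, 0.33, 0.45,
                     0.14, 0.10, 0.15, 0.43,
                     0.16, 0.19, 0.40,
                     0.24, 0.47,
                     0.45)
D <- D + t(D)

A <- c(1, 4)
B <- c(2, 3)

# All pairwise distances d(a, b), a in A (rows), b in B (columns):
#   d(1,2) = 0.34, d(1,3) = 0.24, d(4,2) = 0.10, d(4,3) = 0.16
block <- D[A, B]

complete_link <- max(block)   # 0.34
average_link  <- mean(block)  # (0.34 + 0.24 + 0.10 + 0.16) / 4 = 0.21
```

So, for these rounded Euclidean distances, the 0.34 reached in the question is indeed the complete-linkage distance between $A$ and $B$, while the average-linkage distance is 0.21.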

Related Question