I have these distances:
    install.packages("HSAUR"); library(HSAUR)
    b = round(scale(pottery[1:6, 1:5], center = FALSE), 2)
    b1 = round(dist(b), 2)
    > round(dist(b, method = "euclidean"), 2)
         1    2    3    4    5
    2 0.34
    3 0.24 0.14
    4 0.36 0.10 0.16
    5 0.33 0.15 0.19 0.24
    6 0.45 0.43 0.40 0.47 0.45
Considering the clusters $A=\{1,4\}$ and $B=\{2,3\}$:
- What is a correct approach for doing this with complete linkage distances?
- How do I merge the cases into a cluster and how can I compute the average distances between these two clusters?
For the complete-linkage (maximum) distance, I first have to merge the cases into clusters and compute their distances using the formula:
$d(A,B) = \max |x-y|$
So I have this example to follow:
http://tx.shu.edu.tw/~PurpleWoo/Literature/!DataAnalysis/Methods%20of%20Multivariate%20Analysis.pdf
It starts on page 460, with example 14.3.3(a). The process it follows leads to a nonsymmetric table of distances from the fourth step onward. Following the first steps, I conclude that the complete linkage distance for these two clusters could be 0.34, but I am not sure. I also wouldn't be able to finish the process on my own to see what the dendrogram would look like, and I would be completely lost doing the same for average linkage.
Best Answer
You have your distance definitions mixed up.
$$d_{\max}(x,y) := \max_d |x_d-y_d|$$
is the maximum norm for vectors.
What you need for complete linkage is a different kind of distance, one that is defined on clusters in terms of point distances:
$$ D_{\text{complete-linkage}}(A,B) := \max_{a\in A}\max_{ b\in B} d(a,b) $$
where $d(a,b)$ can be any point distance, including the maximum norm above, but also the Euclidean distance.
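As a rough sketch of how this definition applies to the clusters in the question, the R code below hard-codes the rounded Euclidean distances from the question's output (so it runs without the `HSAUR` package) and takes the maximum over all pairs with one point in $A$ and one in $B$. The average-linkage value is computed the same way with the mean instead of the maximum, assuming the usual unweighted definition. The variable names (`d`, `between`, `complete_AB`, and so on) are illustrative only.

```r
# Pairwise Euclidean distances copied from the question (cases 1..6).
# lower.tri() fills column by column, matching the printed dist output.
d <- matrix(0, 6, 6, dimnames = list(1:6, 1:6))
d[lower.tri(d)] <- c(0.34, 0.24, 0.36, 0.33, 0.45,
                     0.14, 0.10, 0.15, 0.43,
                     0.16, 0.19, 0.40,
                     0.24, 0.47,
                     0.45)
d <- d + t(d)  # mirror the lower triangle to get a symmetric matrix

A <- c(1, 4)
B <- c(2, 3)

# All point distances d(a, b) with a in A and b in B (a 2 x 2 block).
between <- d[A, B]

complete_AB <- max(between)   # complete linkage: largest pair distance, 0.34
average_AB  <- mean(between)  # average linkage: mean pair distance, 0.21
single_AB   <- min(between)   # single linkage, for comparison: 0.10

# Full hierarchical clusterings on the same distances. Note that hclust
# chooses its own merge order, so A and B need not appear as clusters in it.
hc_complete <- hclust(as.dist(d), method = "complete")
hc_average  <- hclust(as.dist(d), method = "average")
```

So the value 0.34 suggested in the question agrees with the complete-linkage definition for these two clusters. Calling `plot(hc_complete)` or `plot(hc_average)` should draw the corresponding dendrograms.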