Solved – Applying Ward’s method for calculating linkage

dendrogramhierarchical clusteringward

For an assignment, I have used iPython to create the dendrogram below, using Ward's method and Euclidean distance, from the following data:
$$a=(0,0)$$
$$b=(1,2)$$
$$c=(3,4)$$
$$d=(4,1)$$
$$e=(2,2)$$
enter image description here
where dist({a},{b,e}) = 2.88, and dist({a,b,e},{c,d})=4.27.

How are these values derived? I've tried using the recursive method given here, but am not getting the same results.

Any help muchos appreciated.

I know that dist(a,b)=$\sqrt{5}$, dist(a,e)=$\sqrt{8}$, so using the formula on the wikipedia page, I have dist({a},{b,e})=$\frac{2}{3}\sqrt{5} + \frac{2}{3}\sqrt{8} – \frac{1}{3} = 3.04$.
Where am I going wrong?

Best Answer

Having banged my head on the wall for the last 2 hours on this, I feel your pain.

The result is the square root of the increase in within-cluster sum of squares (vs. cluster means), multiplied by $\sqrt{2}$ for some reason.

$$ \sqrt{ 2 \left( \sum_{i \in (C_1 \cup C_2)} \lVert \vec x_i - \bar x_{C_1 \cup C_2} \rVert^2 - \sum_{i \in (C_1)} \lVert \vec x_i - \bar x_{C_1} \rVert^2 - \sum_{i \in (C_2)} \lVert \vec x_i - \bar x_{C_2} \rVert^2 \right) } $$

For $C_1 = \{ a, b, e \}, C_2 = \{ c,d \}$ the result is $4.27$ as you require.

As for how exactly they are derived, depends on the specifics of the algorithm in use.