The distance between the two clusters is the maximum between the two clusters. Obviously, $8$ and $-5$ are the furthest in your scenario. For instance, $D(8,0)$ = |8-0| = only 8.
Any set in which you can define a 'distance' function which satisfies a few properties (distances are positive, symmetric, and additive). Is called a Metric space. $\mathbb{R}^k$ is a metric space with the distance function typically defined to be $d(\mathbf{x},\mathbf{y}) = |\mathbf{x}-\mathbf{y}|$, the norm of the difference (although we can use whatever distance function we want as long as it satisfies the 3 properties, more on that later).
The norm is defined to be $|\mathbf{x}| = \sqrt{\sum_{i=1}^n x_i^2}$. That right there looks strangely familiar you might think. So if you have some observed values $\mathbf{x}=x_1,\ldots,x_n$ and if we find the distance between your observed values and their mean, $\mu$ we have $d(\mathbf{x},\mu) = |\mathbf{x}-\mu| = \sqrt{\sum_{i=1}^n (x_i-\mu)^2}$ which is almost like the standard deviation (missing a $1/n$ or $1/(n-1)$. However, we can easily redefine our distance function to be something like $d(\mathbf{x},\mathbf{y}) = +\sqrt{1/n}|\mathbf{x}-\mathbf{y}|$ and it will still have the three properties required to make $\mathbb{R}^k$ a metric space.
You might be more familiar with distances in a 2-dimensional space like $\mathbb{R}^2$. In this space we can use the same distance function as above, but since instead of $k$ components we have only 2 the formula simplifies to $d((x_1,y_1), (x_2,y_2)) = \sqrt{(x_1-x_2)^2+(y_1-y_2)^2}$
K-means is all about the analysis-of-variance paradigm. ANOVA - both uni- and multivariate - is based on the fact that the sum of squared deviations about the grand centroid is comprised of such scatter about the group centroids and the scatter of those centroids about the grand one: SStotal=SSwithin+SSbetween. So, if SSwithin is minimized then SSbetween is maximized.
SS of deviations of some points about their centroid (arithmetic mean) is known to be directly related to the overall squared euclidean distance between the points: the sum of squared deviations from centroid is equal to the sum of pairwise squared Euclidean distances divided by the number of points. (This is the direct extension of the trigonometric property of centroid. And this relation is exploited also in the double centering of distance matrix.)
Thus, saying "SSbetween for centroids (as points) is maximized" is alias to say "the (weighted) set of squared distances between the centroids is maximized".
Note: in SSbetween each centroid is weighted by the number of points Ni in that cluster i. That is, each centroid is counted Ni times. For example, with two centroids in the data, 1 and 2, SSbetween = N1*D1^2+N2*D2^2 where D1 and D2 are the deviations of the centroids from the grand mean. That where word "weighted" in the former paragraph stems from.
Example
Data (N=6: N1=3, N2=2, N3=1)
V1 V2 Group
2.06 7.73 1
.67 5.27 1
6.62 9.36 1
3.16 5.23 2
7.66 1.27 2
5.59 9.83 3
SSdeviations
V1 V2 Overall
SSt 37.82993333 + 51.24408333 = 89.07401666
SSw 29.50106667 + 16.31966667 = 45.82073333
SSb 8.328866667 + 34.92441667 = 43.25328333
SSt is directly related to the squared Euclidean distances between the data points:
Matrix of squared Euclidean distances
.00000000 7.98370000 23.45050000 7.46000000 73.09160000 16.87090000
7.98370000 .00000000 52.13060000 6.20170000 64.86010000 45.00000000
23.45050000 52.13060000 .00000000 29.02850000 66.52970000 1.28180000
7.46000000 6.20170000 29.02850000 .00000000 35.93160000 27.06490000
73.09160000 64.86010000 66.52970000 35.93160000 .00000000 77.55850000
16.87090000 45.00000000 1.28180000 27.06490000 77.55850000 .00000000
Its sum/2, the sum of the distances = 534.4441000
534.4441000 / N = 89.07401666 = SSt
The same reasoning holds for SSb.
Matrix of squared Euclidean distances between the 3 group centroids (see https://stats.stackexchange.com/q/148847/3277)
.00000000 22.92738889 11.76592222
22.92738889 .00000000 43.32880000
11.76592222 43.32880000 .00000000
3 centroids are 3 points, but SSb is based on N points (propagated centroids):
N1 points representing centroid1, N2 points representing centroid2 and N3 representing centroid3.
Therefore the sum of the distances must be weighted accordingly:
N1*N2*22.92738889 + N1*N3*11.76592222 + N2*N3*43.32880000 = 259.51969998
259.51969998 / N = 43.25328333 = SSb
Moral in words: maximizing SSb is equivalent to maximizing
the weighted sum of pairwise squared distances between the centroids.
(And maximizing SSb corresponds to minimizing SSw, since SSt is constant.)
Best Answer
$D$ is a linkage criteria, it is known as complete linkage. Goto Wikipedia:
https://en.wikipedia.org/wiki/Hierarchical_clustering#Metric
and you should see this:
Does the formula look similar to yours?
The distance between the two clusters is the maximum between the two clusters. Obviously, $8$ and $-5$ are the furthest in your scenario. For instance, $D(8,0)$ = |8-0| = only 8.