There exist different ways of handling noise. If I recall correctly, this is discussed in the DBCV paper (density-based cluster validation). The ELKI clustering toolkit has options for how to handle noise clusters during evaluation.
I am not convinced by these measures. I believe that with trivial postprocessing you can "optimize" your clustering for most metrics (e.g., assigning noise points to their nearest cluster will improve the silhouette) without any theoretical support or practical usefulness.
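To make that concrete, here is a minimal sketch of such trivial postprocessing: run DBSCAN, then snap every noise point (label -1) onto its nearest cluster centroid and recompute the silhouette. The dataset, parameters, and variable names are all illustrative assumptions, not anything from a particular paper.

```python
# Sketch (illustrative data and parameters): "gaming" the silhouette by
# reassigning DBSCAN noise points to their nearest cluster centroid.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [5, -5]],
                  cluster_std=0.6, random_state=0)
rng = np.random.default_rng(0)
X = np.vstack([X, rng.uniform(-8, 8, size=(30, 2))])  # add uniform noise

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Silhouette on the non-noise points only.
mask = labels != -1
before = silhouette_score(X[mask], labels[mask])

# Trivial postprocessing: assign each noise point to the nearest centroid.
ids = sorted(set(labels) - {-1})
centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
fixed = labels.copy()
for i in np.where(~mask)[0]:
    fixed[i] = ids[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]

after = silhouette_score(X, fixed)
print(before, after)
```

Whatever the second number turns out to be, it tells you nothing about whether the clustering became more useful; the noise points were noise for a reason.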
In my opinion, clustering needs to be treated as an exploratory technique: it does not matter whether it improves some useless statistical score. The only thing that matters is whether it allows a human to better understand the data.
For example, the Calinski-Harabasz variance ratio criterion (VRC) is fairly standard.
Calinski, T., and J. Harabasz. “A dendrite method for cluster analysis.” Communications in Statistics. Vol. 3, No. 1, 1974, pp. 1–27.
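As a minimal sketch, scikit-learn ships this criterion as `calinski_harabasz_score`; the blob data and the range of k values below are just illustrative assumptions.

```python
# Sketch: computing the Calinski-Harabasz variance ratio criterion (VRC)
# for k-means clusterings at several k (illustrative data and k values).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=1)

scores = {}
for k in (2, 3, 4, 5, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)
    print(k, scores[k])
```

A common use is to pick the k with the highest VRC, but as noted below, all of these are heuristics.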
But there are many, many more, such as the C index, DBCV, etc.
I believe some of these indices even have a dozen variants.
The Dunn index is essentially a separation/compactness ratio, while Davies-Bouldin is compactness/separation. So I guess you are suggesting just one of the many variants of these two.
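To illustrate the two ratios side by side, here is a sketch: Davies-Bouldin comes with scikit-learn, while the Dunn index below is a hand-rolled implementation of one common variant (minimum inter-cluster distance over maximum cluster diameter); as noted above, many other variants exist.

```python
# Sketch: Dunn index (separation/compactness, one of many variants)
# next to Davies-Bouldin (compactness/separation) on illustrative data.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

def dunn_index(X, labels):
    clusters = [X[labels == k] for k in np.unique(labels)]
    # compactness: largest within-cluster diameter
    diam = max(cdist(c, c).max() for c in clusters)
    # separation: smallest distance between points of different clusters
    sep = min(cdist(a, b).min()
              for i, a in enumerate(clusters)
              for b in clusters[i + 1:])
    return sep / diam

X, _ = make_blobs(n_samples=300, centers=3, random_state=2)
labels = KMeans(n_clusters=3, n_init=10, random_state=2).fit_predict(X)

di = dunn_index(X, labels)
db = davies_bouldin_score(X, labels)
print(di, db)  # higher Dunn is better; lower Davies-Bouldin is better
```

Note the opposite orientations: you maximize Dunn but minimize Davies-Bouldin.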
Note that if you have many clusters, it is better to consider only the nearby neighbor clusters, and not the average distance to all others! If you have one very badly split cluster that is nevertheless extremely well separated from the majority of the data, the naive within/between ratio will fail to flag it. That is why separation is usually defined based on the nearest other cluster(s) only, instead of the entire data set.
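A tiny numeric sketch of this failure mode, using made-up centroid coordinates: three clusters sit close together (badly split), plus one far-away cluster. Averaging over all other centroids inflates the apparent separation of the close ones, while the nearest-neighbor definition exposes them.

```python
# Sketch: nearest-neighbor separation vs. average separation over all
# other centroids (illustrative coordinates: three close clusters plus
# one distant outlier cluster).
import numpy as np

centroids = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])

d = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
np.fill_diagonal(d, np.nan)              # ignore self-distances
nearest = np.nanmin(d, axis=1)           # distance to nearest other centroid
average = np.nanmean(d, axis=1)          # average distance to all others

print(nearest)  # the first three clusters are clearly close together
print(average)  # the distant cluster inflates every averaged value
```

The nearest-neighbor numbers correctly show that the first three clusters are barely one unit apart, while the averaged numbers make them look well separated.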
It just shows once more that you cannot rely on Wikipedia alone (and too many people and even books just copy from Wikipedia only...)
But beware that all these are just heuristics. You can find counterexamples for each, I suppose.
External validity indices are used when you propose a new clustering technique and want to validate it or compare it to existing techniques. In these cases, you take a number of datasets for which the ground truth is known and check whether your clustering technique produces solutions that are similar to it.
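As a minimal sketch of such external validation, here is a comparison of a k-means clustering against known ground-truth labels using the adjusted Rand index; the Iris dataset and parameters are illustrative, and normalized mutual information would work the same way.

```python
# Sketch: external validation against known ground truth with the
# adjusted Rand index (ARI); dataset and parameters are illustrative.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

X, y_true = load_iris(return_X_y=True)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

ari = adjusted_rand_score(y_true, labels)
print(ari)  # 1.0 would mean perfect agreement with the ground truth
```

ARI is chance-adjusted, so a random labeling scores near 0 regardless of the number of clusters, which makes it a reasonable default for this kind of comparison.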