MATLAB: What happens if I use the CLUSTERDATA function with the ‘ward’ method and the ‘cosine’ measure in Statistics Toolbox 8.1 (R2012b)?

Statistics and Machine Learning Toolbox

The MATLAB documentation for the CLUSTERDATA function says that the 'ward' method is defined for Euclidean distances only. However, if I run CLUSTERDATA with the 'ward' and 'cosine' options, I get a warning, and the results are better than the ones I get with the Euclidean distance.

Best Answer

In the 'ward' linkage, the distance between two clusters is defined as a weighted version of the 'Euclidean' distance between the centroids of these two clusters. Our documentation page shows the formula used for 'ward' linkage (<http://www.mathworks.com/help/stats/linkage.html>).
It can be shown that, when 'Euclidean' distance is used, the 'ward' linkage method forms a new cluster by merging the two clusters whose merger leads to the smallest possible increase in the total sum of squared observation-to-centroid distances.
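The sum-of-squares property can be checked numerically. The sketch below is plain Python rather than MATLAB, and it assumes the 'ward' formula on the linkage page has the form d(r,s) = sqrt(2*n_r*n_s/(n_r+n_s)) * ||c_r - c_s||, where c_r and c_s are the cluster centroids and n_r and n_s are the cluster sizes. The helper names and the sample points are made up for illustration.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((p[k] - c[k]) ** 2 for k in range(len(c))) for p in points)

def ward_distance(r, s):
    """Assumed 'ward' formula: weighted Euclidean distance between centroids."""
    n_r, n_s = len(r), len(s)
    c_r, c_s = centroid(r), centroid(s)
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(c_r, c_s)))
    return math.sqrt(2.0 * n_r * n_s / (n_r + n_s)) * euclid

# Hypothetical sample clusters.
cluster_r = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
cluster_s = [[4.0, 4.0], [5.0, 3.0]]

d = ward_distance(cluster_r, cluster_s)
# Increase in the total within-cluster sum of squares caused by the merge.
sse_increase = sse(cluster_r + cluster_s) - sse(cluster_r) - sse(cluster_s)

print("d^2 / 2      =", d ** 2 / 2.0)
print("SSE increase =", sse_increase)
```

Under that assumption the two printed values agree up to rounding, so picking the pair of clusters with the smallest 'ward' distance is the same as picking the merge with the smallest increase in the sum of squares.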
When the distance is set to 'cosine' with the 'ward' linkage option, the 'cosine' distance replaces the 'Euclidean' distance in the formula shown in our documentation. However, the resulting linkage does not have a straightforward interpretation, because the sum-of-squares property is derived for 'Euclidean' distance.
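One way to see why the interpretation is lost is to take the description above literally and plug a cosine distance between centroids into the same weighted formula. The sketch below is plain Python and is only an illustration of that literal reading. It is not a claim about how the linkage function computes its output internally, and the helper names and sample points are hypothetical. Cosine distance is taken as one minus the cosine of the angle between the two vectors.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((p[k] - c[k]) ** 2 for k in range(len(c))) for p in points)

def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def ward_style_cosine(r, s):
    """Assumed 'ward' weighting applied to a cosine distance between centroids."""
    n_r, n_s = len(r), len(s)
    weight = math.sqrt(2.0 * n_r * n_s / (n_r + n_s))
    return weight * cosine_distance(centroid(r), centroid(s))

def sse_increase(r, s):
    """Increase in within-cluster sum of squares if r and s are merged."""
    return sse(r + s) - sse(r) - sse(s)

# Hypothetical clusters; scaled_s is cluster_s stretched away from the origin.
cluster_r = [[1.0, 0.0], [2.0, 1.0]]
cluster_s = [[0.0, 1.0], [1.0, 3.0]]
scaled_s = [[10.0 * x for x in p] for p in cluster_s]

cos_before = ward_style_cosine(cluster_r, cluster_s)
cos_after = ward_style_cosine(cluster_r, scaled_s)
sse_before = sse_increase(cluster_r, cluster_s)
sse_after = sse_increase(cluster_r, scaled_s)

print("cosine-based value:", cos_before, "->", cos_after)
print("SSE increase:      ", sse_before, "->", sse_after)
```

Stretching one cluster away from the origin changes the increase in the sum of squares substantially, while the cosine-based value stays the same up to rounding because cosine distance depends only on direction. The cosine-based quantity therefore no longer measures the growth in within-cluster variance.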