Dendrograms using clustergram vs. the traditional approach in MATLAB

Comparing the dendrogram produced by the clustergram object with the one from the "manual" approach (pdist -> linkage -> dendrogram), I found that they differ, but I cannot find an explanation for the difference.

The clustergram documentation says that the default distance is 'Euclidean' and the default linkage method is 'Average', which are the same parameters I used for the pdist and linkage functions.

I thought it might be related to the standardization performed by clustergram, so I standardized my data with z-scores, first over the whole matrix and then over each column separately, but the resulting dendrograms still differ from the one produced by clustergram.

Can somebody explain the reason for the difference, and how I can use the "manual" approach to get the same dendrogram?

Many thanks in advance.

Edit: I compared the results obtained by the "manual" approach with the SPSS results and they are the same.
I also tested all of the different data processing options in clustergram, but its dendrogram is still different.

Best Answer

clustergram itself uses the same pdist -> linkage -> dendrogram pipeline, as you can see by opening the clustergram code:

dist = pdist(data, pdistArgs{:});
Z = linkage(dist, linkageArgs);

...

[lineH, T, Perm] = dendrogram(Z,0,dendroArgs{:},'Orientation', dendroLoc);

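So the only difference can be in the data preprocessing or in the parameters passed to pdist, linkage, and dendrogram. Note, for instance, that clustergram calls dendrogram with 0 as the second argument, which draws every leaf, whereas a plain dendrogram(Z) collapses the tree to 30 leaf nodes by default. I suggest you go over the clustergram parameters one by one. You can also read through the code, although it is pretty long.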
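
As a starting point for that comparison, here is a minimal sketch of a "manual" pipeline that tries to mirror the calls quoted above. It rests on two assumptions that are not confirmed in this thread and should be checked against the documentation for your release: that clustergram standardizes along rows by default (the question only tried the whole matrix and the columns), and that it may reorder the leaves with optimal leaf ordering. The toy data and variable names are hypothetical.

```matlab
% Sketch only; requires the Statistics and Machine Learning Toolbox.
rng(0);                        % reproducible toy data (hypothetical)
data = rand(40, 5);            % 40 observations (rows), 5 variables (columns)

% ASSUMPTION: clustergram standardizes each ROW by default.
% zscore(X, 0, 2) gives every row mean 0 and standard deviation 1.
zdata = zscore(data, 0, 2);

% Same defaults the question mentions: Euclidean distance, average linkage.
dist = pdist(zdata, 'euclidean');
Z = linkage(dist, 'average');

% ASSUMPTION: clustergram may apply optimal leaf ordering. This changes only
% the left-to-right order of the leaves, not the tree topology or heights.
order = optimalleaforder(Z, dist);

% P = 0 draws all leaves, as in the clustergram call quoted above;
% without it, dendrogram collapses the tree to 30 leaf nodes.
figure;
[lineH, T, perm] = dendrogram(Z, 0, 'Reorder', order, 'Orientation', 'left');
```

If the tree still differs from the clustergram output, comparing the linkage matrix Z from this sketch for each standardization variant (none, columns, rows) should show which preprocessing step is responsible, since Z does not depend on plotting options.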