Solved – Normalizing data before applying MDS with strain criterion

The features of my dataset are as follows:
• BI-RADS assessment: 1 to 5 (ordinal)
• Age: patient's age in years (integer) ranges from 18 to 96
• Shape: mass shape: round=1 oval=2 lobular=3 irregular=4 (nominal)
• Margin: mass margin: circumscribed=1 microlobulated=2 obscured=3 ill-defined=4 spiculated=5 (nominal)
• Density: mass density high=1 iso=2 low=3 fat-containing=4 (ordinal)

When I run MDS with the "strain" criterion on this dataset without normalizing it first, I get the following result:
[Figure: MDS (strain) result on the unnormalized data]
However, if I normalize the data, the result is as follows:
[Figure: MDS (strain) result on the normalized data]

The second result is quite similar to the results I got with the other criteria, and also with PCA, even though I did not normalize the data for those either.

So, my question is: Why does normalizing data make difference for "strain" criterion?

Thanks in advance…

Best Answer

Suppose you don't normalize your data. You could have a situation like this:

feature   user1   user2   user3   user4   user5   user6   user7   user8   user9
1          0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2            18      96      21      45      93      19      21      19      90
3          0.08    0.12    0.19    0.70    0.12    0.01    0.00    0.09    0.04
4          1019     217      53    1082    1010       2      30       0     100

So it is clear that some features influence the results much more than others.
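
To make this concrete, here is a minimal sketch in Python with NumPy (my choice of language; the question does not say which software was used). It takes the values from the example table, with rows as features 1 to 4 and columns as user1 to user9, and measures how much each feature contributes to the sum of squared Euclidean distances over all pairs of users. The helper names `feature_share` and `zscore` are only illustrative, and z-scoring is just one possible way to normalize.

```python
import numpy as np

# Rows are features 1-4, columns are user1..user9 (values from the example table).
X = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    [18, 96, 21, 45, 93, 19, 21, 19, 90],
    [0.08, 0.12, 0.19, 0.70, 0.12, 0.01, 0.00, 0.09, 0.04],
    [1019, 217, 53, 1082, 1010, 2, 30, 0, 100],
], dtype=float)


def feature_share(X):
    """Fraction of the total squared pairwise distance owed to each feature."""
    # diffs[f, i, j] = X[f, i] - X[f, j] for every pair of users (i, j)
    diffs = X[:, :, None] - X[:, None, :]
    per_feature = (diffs ** 2).sum(axis=(1, 2))
    return per_feature / per_feature.sum()


def zscore(X):
    """Standardize each feature (row) to zero mean and unit variance."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # a constant feature simply stays at zero
    return (X - mean) / std


share_raw = feature_share(X)           # before normalization
share_norm = feature_share(zscore(X))  # after normalization

print("raw:       ", np.round(share_raw, 4))
print("normalized:", np.round(share_norm, 4))
```

On these numbers, feature 4 accounts for more than 99% of the summed squared distances before scaling, while after z-scoring the three non-constant features contribute equally (feature 1 is constant, so it contributes nothing either way).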

Remember that MDS is a way to map the differences between elements in n dimensions onto differences in 2 dimensions, for every pair of elements. I simply think that in the first case the strain function is affected by the lack of normalization. When you normalize the data, the algorithm is able to capture the real differences among the points, because every feature is taken into account on a comparable scale.
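
As a rough illustration, the sketch below runs classical (Torgerson) MDS, which is the closed-form solution that minimizes strain, on the example table with and without z-scoring. I am assuming this corresponds to the "strain" criterion in the asker's software, which the question does not name. The function names `classical_mds` and `zscore` are hypothetical helpers written for this example, not library calls.

```python
import numpy as np

# Same example table as above: rows are features 1-4, columns are user1..user9.
X = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    [18, 96, 21, 45, 93, 19, 21, 19, 90],
    [0.08, 0.12, 0.19, 0.70, 0.12, 0.01, 0.00, 0.09, 0.04],
    [1019, 217, 53, 1082, 1010, 2, 30, 0, 100],
], dtype=float)


def zscore(X):
    """Standardize each feature (row); constant features are left at zero."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0
    return (X - mean) / std


def classical_mds(points, n_components=2):
    """Classical MDS of `points` (n_samples, n_features) from Euclidean distances."""
    # Squared Euclidean distance between every pair of samples
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    n = sq_dist.shape[0]
    # Double centering turns squared distances into an inner-product matrix B
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ sq_dist @ J
    # The leading eigenpairs of B give the embedding coordinates
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)


Y_raw = classical_mds(X.T)           # users embedded from unnormalized features
Y_norm = classical_mds(zscore(X).T)  # users embedded from z-scored features

# How strongly the first MDS axis tracks feature 4 in the unnormalized case
corr_raw = abs(np.corrcoef(Y_raw[:, 0], X[3])[0, 1])
print("abs. correlation of first raw MDS axis with feature 4:", round(corr_raw, 4))
```

Without normalization, the first MDS axis is almost a copy of feature 4, so the picture mostly shows that one feature. With z-scored data the layout reflects all non-constant features.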
