Solved – In multidimensional scaling, how can one determine dimensionality of a solution given a stress value


In multidimensional scaling, how can one determine dimensionality of a solution given a stress value? From what I understand, stress value is inversely related to the number of dimensions of a MDS solution, and that higher stress value indicates that there is a lot of error (i.e. badness-of-fit) in the current model, indicating a solution with more dimensions. Are the randomly generated coordinates, number of variables, and number of categories in a variable related?

Best Answer

In multidimensional scaling, how can one determine dimensionality of a solution given a stress value?

Given only a stress value, it is not possible to determine the dimensionality of the dataset. At best, you can judge whether the value is low or high (and even that judgment is somewhat problematic, in my view).

From what I understand, stress value is inversely related to the number of dimensions of a MDS solution,

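Correct, in the sense that for the same data and model the lowest achievable stress does not increase as dimensions are added.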

and that higher stress value indicates that there is a lot of error (i.e. badness-of-fit) in the current model,

correct
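For reference, the quantity usually meant by "stress" in this context is Kruskal's stress-1. The notation below is introduced here for illustration and is not taken from the question: d_ij(X) is the distance between points i and j in the fitted configuration X, and \hat{d}_ij is the corresponding disparity (the observed dissimilarity, possibly transformed by the chosen MDS model).

```latex
\[
\text{stress-1}(X) \;=\;
\sqrt{\frac{\sum_{i<j}\bigl(d_{ij}(X) - \hat{d}_{ij}\bigr)^{2}}
           {\sum_{i<j} d_{ij}(X)^{2}}}
\]
```

A value of 0 means the map reproduces the disparities exactly; larger values mean a worse fit.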

indicating a solution with more dimensions.

That conclusion is not quite accurate. Think of stress as a function in which the number of dimensions is only one of the inputs. Other significant factors are the MDS model you are using, the initial configuration of points in the MDS map, and even the order of rows and columns in the dissimilarity matrix. You will therefore get different stress values in, say, a 2-dimensional space just by changing the initial configuration of the points (although that change is small compared with the one caused by changing the number of dimensions).
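The dependence on the starting configuration can be checked with a small experiment. What follows is only a sketch under stated assumptions: the data are made up, the helper names (`pairwise`, `stress1`, `smacof`) are illustrative, and the fit is a bare-bones metric MDS using Guttman transforms written with the Python standard library, not the implementation of any particular package.

```python
import math
import random

def pairwise(X):
    """Euclidean distance matrix between the rows of X."""
    return [[math.dist(a, b) for b in X] for a in X]

def stress1(delta, X):
    """Kruskal's stress-1, using the dissimilarities directly as disparities."""
    d, n = pairwise(X), len(X)
    pairs = [(i, j) for i in range(n) for j in range(i)]
    num = sum((d[i][j] - delta[i][j]) ** 2 for i, j in pairs)
    den = sum(d[i][j] ** 2 for i, j in pairs)
    return math.sqrt(num / den)

def smacof(delta, ndim, rng, iters=200):
    """Metric MDS via repeated Guttman transforms, started from a random map."""
    n = len(delta)
    X = [[rng.uniform(-1, 1) for _ in range(ndim)] for _ in range(n)]
    for _ in range(iters):
        d = pairwise(X)
        B = [[-delta[i][j] / d[i][j] if i != j and d[i][j] > 0 else 0.0
              for j in range(n)] for i in range(n)]
        for i in range(n):
            B[i][i] = -sum(B[i])  # diagonal makes each row sum to zero
        X = [[sum(B[i][k] * X[k][c] for k in range(n)) / n
              for c in range(ndim)] for i in range(n)]
    return X

# Hypothetical data: 10 points drawn in 4 dimensions, turned into dissimilarities.
data_rng = random.Random(0)
points = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(10)]
delta = pairwise(points)

# Same data, same model, same dimensionality (2); only the random start changes.
stresses = [stress1(delta, smacof(delta, 2, random.Random(seed)))
            for seed in range(5)]
for seed, s in enumerate(stresses):
    print(f"start {seed}: stress-1 = {s:.4f}")
```

If the printed values differ, the optimizer has ended in different local minima; if they agree, the starts happened to converge to the same solution. This is why MDS software commonly runs several starts and keeps the best one.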

Now, if you want to find the most appropriate number of dimensions from the stress value, there is a straightforward approach: compute the stress for solutions in 2, 3, 4, ..., n-1 dimensions, where n is the original number of dimensions of the data. This is the pragmatic way of showing the inverse relation between the number of dimensions and stress.

The result of these computations is easier to read as a scree plot of stress against the number of dimensions. Cox and Cox (2001) give an example of such a plot.

Now we can choose the number of dimensions from that relation. It is a trade-off: more dimensions give lower stress (a more accurate map) but less dimension reduction (a map that is harder to visualize and interpret).
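The procedure of fitting one solution per candidate dimensionality and comparing the stress values can be sketched as follows. Everything here is an assumption made for illustration: the data are synthetic (12 points that truly live in 3 dimensions), the range of 1 to 5 dimensions is arbitrary, and the fit is a minimal standard-library metric MDS with hypothetical helper names, not a reference implementation.

```python
import math
import random

def pairwise(X):
    """Euclidean distance matrix between the rows of X."""
    return [[math.dist(a, b) for b in X] for a in X]

def stress1(delta, X):
    """Kruskal's stress-1, using the dissimilarities directly as disparities."""
    d, n = pairwise(X), len(X)
    pairs = [(i, j) for i in range(n) for j in range(i)]
    num = sum((d[i][j] - delta[i][j]) ** 2 for i, j in pairs)
    den = sum(d[i][j] ** 2 for i, j in pairs)
    return math.sqrt(num / den)

def smacof(delta, ndim, rng, iters=200):
    """Metric MDS via repeated Guttman transforms, started from a random map."""
    n = len(delta)
    X = [[rng.uniform(-1, 1) for _ in range(ndim)] for _ in range(n)]
    for _ in range(iters):
        d = pairwise(X)
        B = [[-delta[i][j] / d[i][j] if i != j and d[i][j] > 0 else 0.0
              for j in range(n)] for i in range(n)]
        for i in range(n):
            B[i][i] = -sum(B[i])  # diagonal makes each row sum to zero
        X = [[sum(B[i][k] * X[k][c] for k in range(n)) / n
              for c in range(ndim)] for i in range(n)]
    return X

# Hypothetical data: 12 points that truly live in 3 dimensions.
data_rng = random.Random(1)
points = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(12)]
delta = pairwise(points)

# For each candidate dimensionality, keep the best of a few random starts.
scree = {}
for ndim in range(1, 6):
    fits = [smacof(delta, ndim, random.Random(seed)) for seed in range(3)]
    scree[ndim] = min(stress1(delta, X) for X in fits)

# Text version of a scree plot: one bar per dimensionality.
for ndim, s in scree.items():
    print(f"{ndim} dimensions: stress-1 = {s:.4f} " + "#" * round(40 * s))
```

On data like these, the bars should shrink quickly up to the true dimensionality and then flatten. The point where the curve flattens (the "elbow") is the usual candidate for the number of dimensions to keep.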

Besides, the appropriate number of dimensions is not decided by the stress value alone. Your goal also matters: if you want a 2D map, you choose 2 dimensions and then try to minimize the stress as much as possible.

If, however, you are asking how much stress is too much, that is another story. One way to evaluate the magnitude of your stress is to compare it with the average stress of different possible configurations of your dataset (have a look at "Multidimensional Scaling in R: SMACOF" by Patrick Mair).
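One possible reading of this comparison is a random baseline: fit the same model, in the same number of dimensions, to random dissimilarity matrices of the same size, and see whether the observed stress is clearly below the average of those fits. The sketch below assumes that reading. It is not a reproduction of anything in the smacof package, and the data, the helper names, and the choice of 20 replications are all made up for illustration.

```python
import math
import random

def pairwise(X):
    """Euclidean distance matrix between the rows of X."""
    return [[math.dist(a, b) for b in X] for a in X]

def stress1(delta, X):
    """Kruskal's stress-1, using the dissimilarities directly as disparities."""
    d, n = pairwise(X), len(X)
    pairs = [(i, j) for i in range(n) for j in range(i)]
    num = sum((d[i][j] - delta[i][j]) ** 2 for i, j in pairs)
    den = sum(d[i][j] ** 2 for i, j in pairs)
    return math.sqrt(num / den)

def smacof(delta, ndim, rng, iters=200):
    """Metric MDS via repeated Guttman transforms, started from a random map."""
    n = len(delta)
    X = [[rng.uniform(-1, 1) for _ in range(ndim)] for _ in range(n)]
    for _ in range(iters):
        d = pairwise(X)
        B = [[-delta[i][j] / d[i][j] if i != j and d[i][j] > 0 else 0.0
              for j in range(n)] for i in range(n)]
        for i in range(n):
            B[i][i] = -sum(B[i])  # diagonal makes each row sum to zero
        X = [[sum(B[i][k] * X[k][c] for k in range(n)) / n
              for c in range(ndim)] for i in range(n)]
    return X

def fit_stress(delta, ndim, seed):
    """Stress-1 of one MDS fit started from the given seed."""
    return stress1(delta, smacof(delta, ndim, random.Random(seed)))

def random_dissimilarities(n, rng):
    """Symmetric matrix of uniform random dissimilarities, zero diagonal."""
    delta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            delta[i][j] = delta[j][i] = rng.uniform(0, 1)
    return delta

n, ndim = 10, 2
rng = random.Random(2)

# Hypothetical "real" data: mostly two-dimensional, with a weak third axis.
points = [[rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 0.3)]
          for _ in range(n)]
observed = fit_stress(pairwise(points), ndim, seed=0)

# Baseline: the same fit applied to structureless random dissimilarities.
baseline = [fit_stress(random_dissimilarities(n, rng), ndim, seed=r)
            for r in range(20)]
mean_baseline = sum(baseline) / len(baseline)

print(f"observed stress-1:      {observed:.4f}")
print(f"mean random stress-1:   {mean_baseline:.4f}")
```

An observed stress well below the random average suggests the map captures real structure. An observed stress close to the random average suggests the map is not much better than one fitted to noise.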

Are the randomly generated coordinates, number of variables, and number of categories in a variable related?

Sorry, but I don't understand this part of your question.