Solved – Help me understand nMDS algorithm

Tags: multidimensional-scaling, nonparametric

I have been reading Zuur, Ieno and Smith (2007) Analyzing ecological data, and on page 262 they try to explain how the nMDS (non-metric multidimensional scaling) algorithm works. As my background is in biology and not math or statistics per se, I'm having a hard time understanding a few points, and I'd like to ask you to elaborate on them. I'm reproducing the entire algorithm list for clarity, and I hope I'm not breaking any laws by doing so.

  1. Choose a measure of association and calculate the distance matrix D.
  2. Specify m, the number of axes.
  3. Construct a starting configuration E. This can be done using PCoA.
  4. Regress the configuration on D: $D_{ij} = \alpha + \beta E_{ij} + \epsilon_{ij}$.
  5. Measure the relationship between the m-dimensional configuration and the real distances by fitting a non-parametric (monotonic) regression curve in the Shepard diagram. A monotonic regression is constrained to be non-decreasing. If a parametric regression line is used instead, we obtain PCoA.
  6. The discrepancy from the fitted curve is called STRESS.
  7. Using non-linear optimization routines, obtain a new estimate of E, then return to step 4 and repeat until convergence.
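The whole loop can be sketched in a few dozen lines of Python. This is a toy implementation under several assumptions of my own: a random starting configuration instead of PCoA (step 3), a hand-rolled pool-adjacent-violators routine for the monotone regression (step 5), and a plain gradient step for the non-linear optimization (step 7); real software is considerably more careful about all three.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    out = []  # list of [block_mean, block_size]
    for v in np.asarray(y, dtype=float):
        out.append([v, 1])
        # merge backwards while the last two blocks violate monotonicity
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, n2 = out.pop()
            m1, n1 = out.pop()
            out.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([[m] * n for m, n in out])

def nmds(D, m=2, n_iter=300, lr=0.05, seed=0):
    """Toy nMDS: iterate steps 4-7 with gradient steps on the configuration E."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    E = 0.1 * rng.normal(size=(n, m))     # step 3: random start (the book uses PCoA)
    iu = np.triu_indices(n, k=1)
    order = np.argsort(D[iu])             # rank order of the observed dissimilarities
    stress = np.inf
    for _ in range(n_iter):
        diff = E[:, None, :] - E[None, :, :]
        d = np.sqrt((diff ** 2).sum(-1))  # configuration distances d_ij
        dv = d[iu]
        # steps 4-5: monotone regression of d on the ranks of D -> fitted values
        dr_flat = np.empty_like(dv)
        dr_flat[order] = pava(dv[order])
        # step 6: Kruskal-style stress (discrepancy from the fitted curve)
        stress = np.sqrt(((dv - dr_flat) ** 2).sum() / (dv ** 2).sum())
        # step 7: move each point to shrink |d - fitted| (spring analogy)
        dr = np.zeros_like(d)
        dr[iu] = dr_flat
        dr += dr.T
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(d > 0, (d - dr) / d, 0.0)
        E -= lr * (ratio[:, :, None] * diff).sum(axis=1)
    return E, stress
```

Feeding it a distance matrix computed from actual points in the plane, the stress should shrink over iterations; perfect rank agreement between the configuration distances and the dissimilarities corresponds to stress 0.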

Questions:
In step 4, we regress the configuration on D. Where do we use the estimated parameters $\alpha$, $\beta$ and $\epsilon$? Are these used to measure the distance from the regression (Shepard diagram) in this new configuration?

In regard to step 7, can you talk a little about non-linear optimization routines? My internet search came up pretty much empty in terms of a layman's explanation. I'm interested in knowing what this routine tries to achieve (in nMDS). And I guess the next question depends on knowing these routines: what represents convergence? What converges to where?


Best Answer

A few opening remarks. In nMDS you have a matrix of dissimilarities $D_{ij}$ (not necessarily distances; for instance, this can be the percentage of people who said in some poll that objects $i$ and $j$ are not similar). What you want to obtain is a set of points ($E = [X_i]$) representing the objects in an $m$-dimensional space; having it, you also have the matrix of distances between the objects in that space, $d_{ij}$.
nMDS tries to guess an $E$ such that $d_{ij}$ has the same rank order as $D_{ij}$; it is like connecting each pair of objects with a spring that is stronger the less dissimilar the pair is, and then releasing the whole configuration -- after relaxation, the objects that were connected by stronger springs end up closer together.
Point 4 is something like fitting a regression. You have some approximation of the object positions, $E^a$, and therefore also approximate distances $d^a_{ij}$. Now you can regress $d^a_{ij}$~$f(D_{ij})$ and use the fit to compute the distances you would get if $D$ were represented perfectly: $d^r_{ij} = f(D_{ij})$.
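In code terms, this regression with a monotone $f$ looks like the following (a minimal sketch with made-up numbers; `monotone_fit` is a hand-rolled pool-adjacent-violators routine, not a library function):

```python
import numpy as np

def monotone_fit(y):
    """Least-squares non-decreasing fit (pool adjacent violators)."""
    out = []  # list of [block_mean, block_size]
    for v in np.asarray(y, dtype=float):
        out.append([v, 1])
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, n2 = out.pop()
            m1, n1 = out.pop()
            out.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([[m] * n for m, n in out])

# Observed dissimilarities D_ij and current configuration distances d^a_ij,
# listed pairwise (toy numbers, for illustration only):
D  = np.array([0.1, 0.3, 0.4, 0.7])   # already sorted by rank
da = np.array([0.2, 0.5, 0.4, 0.9])   # not monotone in D: 0.5 > 0.4

dr = monotone_fit(da)                 # fitted curve values d^r_ij
stress = np.sqrt(((da - dr) ** 2).sum() / (da ** 2).sum())
```

The two violating values 0.5 and 0.4 get pooled to their mean 0.45, and the stress measures how much pooling was needed; if $d^a$ were already monotone in $D$, the fit would be exact and the stress zero.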
Still, because you cannot compute $E$ directly from $d^r$ (this is where the non-linear optimization comes in), you must somehow perturb $E$ so that the distances approach $d^r$. The standard method here mimics the physical analogy with springs: move the objects connected by the most extended springs (those with the largest $|d^a_{ij}-d^r_{ij}|$) toward each other, so that the potential energy of the system (the STRESS) decreases the most.
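For concreteness, the "potential energy" has a standard closed form. A common choice is Kruskal's stress-1 (the book may use a slight variant):

$$\mathrm{STRESS} = \sqrt{\frac{\sum_{i<j}\left(d^a_{ij} - d^r_{ij}\right)^2}{\sum_{i<j}\left(d^a_{ij}\right)^2}}$$

It is zero exactly when the configuration distances are already monotone in the observed dissimilarities, and it is the quantity the optimization routine in step 7 drives downhill; convergence means this number (and the configuration $E$) stops changing appreciably between iterations.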
