Solved – Stress values of an nMDS analysis and p-values of ordisurf()

I have run an nMDS analysis in vegan and have a few questions about the stress values of the analysis, as well as the p-values reported when using the ordisurf() function (also in vegan).

First of all, what is a good stress value? I know that getting closer to zero is ideal. The value changes depending on how many dimensions I add. For example, at k=2 my stress value is around 0.12, and at k=3 it goes down to 0.08. I've looked around the web and have found different opinions. Some say 0.05 is a good cutoff while others say anything below 0.15 is considered "great". Any ideas?

Next, I've also started using the ordisurf function to include surfaces for my environmental data. As I understand it, ordisurf essentially fits a generalized additive model using the environmental variable as the response and the nMDS axes as predictors. There is a p-value associated with this model. However, I am wondering whether to take these p-values with a grain of salt. For example, if I have a p-value of 0.07, does that automatically rule out using that model/surface?

Best Answer

Regarding stress, I don't think there is a single good value to recommend, as it depends on things like the number of observations. I'd say both of the solutions you found are good fits, the second somewhat more so. However, with the second you have the added complexity of displaying and interpreting a 3-d configuration of points. I'd plot the 2-d configuration and the pairs of dimensions of the 3-d configuration and see if there is any qualitative difference that would suggest retaining the 3-d solution.
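To make that comparison concrete, here is a minimal sketch. It uses the dune data shipped with vegan as a stand-in for your community matrix; the seed, trymax = 50, and the object names ord2 and ord3 are arbitrary choices for illustration, not recommendations.

```r
library(vegan)

data(dune)  # example community matrix from vegan; substitute your own data

set.seed(42)  # metaMDS() uses random starts, so fix the seed for repeatability
ord2 <- metaMDS(dune, k = 2, trymax = 50, trace = FALSE)
ord3 <- metaMDS(dune, k = 3, trymax = 50, trace = FALSE)

# Stress of each solution; adding a dimension will typically lower it
c(k2 = ord2$stress, k3 = ord3$stress)

# Shepard diagram for the 2-d solution: ordination distance vs dissimilarity
stressplot(ord2)

# 2-d configuration alongside each pair of axes from the 3-d configuration
op <- par(mfrow = c(2, 2))
plot(ord2, display = "sites")
title(main = "k = 2")
for (ax in list(c(1, 2), c(1, 3), c(2, 3))) {
  plot(ord3, choices = ax, display = "sites")
  title(main = paste("k = 3, axes", ax[1], "and", ax[2]))
}
par(op)
```

If the groupings and gradients you see in the k = 3 panels look much like those in the k = 2 panel, the extra dimension is probably not buying you much beyond the lower stress.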

As with any p-value beyond linear models (GLMs, GAMs, etc.), these p-values are approximate and rely on asymptotic behaviour (i.e. as sample sizes get large). This is more so for GAMs, because the p-values have to be computed whilst accounting for the selection of the smoothness parameter(s) that control the wiggliness of the fitted spline/surface. The p-value reported makes this correction and is based on the surprising frequentist coverage properties of the Bayesian credible interval for the estimated spline. As a result, you can think of the p-value as coming from a test for a zero function (so a flat surface at zero in the case of ordisurf()), and as an approximate summary of that test.

If you want to deep dive into this, see the references in ?summary.gam regarding p-values, notably one of Simon Wood's two 2013 papers on p-values (not the one on random effects). Otherwise, just take the comments in ?summary.gam at face value.
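
For reference, a minimal sketch of where that p-value comes from in practice. It again uses vegan's dune and dune.env data as a stand-in, with A1 picked arbitrarily as the environmental variable; the object names are only for illustration.

```r
library(vegan)

data(dune)
data(dune.env)  # A1 (soil A1 horizon thickness) is a numeric variable here

set.seed(42)
ord <- metaMDS(dune, k = 2, trymax = 50, trace = FALSE)

# Fit the surface: A1 is the response, a smooth of the two nMDS axes the predictor
surf <- ordisurf(ord ~ A1, data = dune.env, plot = FALSE)

# The fitted object inherits from mgcv's "gam" class, so summary() uses summary.gam()
sm <- summary(surf)
sm$s.table                # edf, Ref.df, F and the approximate p-value of the smooth
sm$s.table[, "p-value"]   # the p-value discussed above
sm$dev.expl               # proportion of deviance explained by the surface
```

Reading the p-value next to the effective degrees of freedom and the deviance explained gives a fuller picture of the surface than the p-value alone.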
