NMDS – How to Deal with Stress Over 0.2 in NMDS for Large Datasets

ecology, multidimensional scaling, r

I am analysing a large dataset (2000 rows by 250 columns) of species presence at several locations over the last 20 years. I ran an NMDS to identify differences between the two main types of forest.
The function eventually converges, but with a stress of 0.25. Nearly every source I have found says that a stress above the commonly accepted limit of 0.2 indicates a poor representation.
I have also read that, for datasets this large, the 0.2 threshold might not be a good measure of goodness of fit. So my questions are:

  • Are my results still usable? adonis() gives a p-value of less than 0.05 and, in general, everything seems to indicate that the two sites differ in species composition.
  • Is there another way to measure the fit of the model?
  • Any other alternatives?

The code:

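library(vegan)

# `nmds` is the result of an earlier metaMDS() run, reused here as the starting configuration
metaMDS(bird.matrix, distance = "bray", k = 3, maxit = 999,
    trymax = 10000, wascores = TRUE, noshare = 0.1,
    previous.best = nmds)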

Best Answer

If I understand correctly, you have two types of forest, and each of the 2000 rows in your dataset belongs to one of those types.

In principle, it is appropriate to use adonis() to test whether the two types of forest differ, and with a p-value of less than 0.05 you can be quite confident that they do. The problem is that you have a lot of data (2000 rows), which makes it easy for adonis() to detect differences even if they are too small to matter to you. The two types of forest are probably not one hundred percent identical, so given enough data the test will almost always find that they differ. That is why, in general, one should be cautious when applying significance tests to large datasets.
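To illustrate the point about sample size, here is a minimal sketch on simulated presence/absence data (not your bird.matrix). The names `sim` and `forest.type` and all simulation settings are assumptions made up for the example. It uses adonis2(), the current vegan interface to the same test, and reads the R2 column next to the p-value, so you can see how much of the variation the grouping explains rather than only whether the test is significant.

```r
library(vegan)

set.seed(1)
n <- 200   # simulated sites per forest type (hypothetical)
p <- 30    # simulated species (hypothetical)

# Presence probabilities that differ only slightly between the two types
prob_a <- runif(p, 0.2, 0.6)
prob_b <- pmin(pmax(prob_a + rnorm(p, 0, 0.03), 0.01), 0.99)

# matrix() fills column by column, so column j is drawn with probability prob[j]
sim <- rbind(
  matrix(rbinom(n * p, 1, rep(prob_a, each = n)), nrow = n),
  matrix(rbinom(n * p, 1, rep(prob_b, each = n)), nrow = n)
)
forest.type <- factor(rep(c("A", "B"), each = n))

# Bray-Curtis is undefined for empty rows, so drop them if any occur
keep <- rowSums(sim) > 0
sim <- sim[keep, ]
forest.type <- forest.type[keep]

d <- vegdist(sim, method = "bray")
fit <- adonis2(d ~ forest.type,
               data = data.frame(forest.type = forest.type),
               permutations = 199)

r2   <- fit$R2[1]         # share of total variation attributed to forest type
pval <- fit$`Pr(>F)`[1]   # permutation p-value
print(c(R2 = r2, p = pval))
```

If the p-value is small while R2 is also very small, the test has found a real but minor difference, which is exactly the situation described above.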

NMDS, on the other hand, is more of a visualization tool that gives you a feeling for the relative positioning of the samples. This is often quite helpful for building intuition, e.g. to see how well the data of the two types of forest are separated, but it doesn't give you "concrete evidence". Note that it is a map from your original 250-dimensional space to a low-dimensional one (three dimensions with k = 3 in your call). This inevitably loses a lot of information, and it is difficult or impossible to figure out what exactly is lost.
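One rough way to get a feeling for how much is lost is to compare the stress obtained with different numbers of dimensions. The sketch below does this on simulated data; `sim` and the simulation settings are assumptions for illustration, and `trymax = 20` is kept small only so the example runs quickly.

```r
library(vegan)

set.seed(42)
# Simulated presence/absence matrix standing in for the real data (hypothetical)
sim <- matrix(rbinom(60 * 20, 1, 0.4), nrow = 60)
sim <- sim[rowSums(sim) > 0, ]   # Bray-Curtis is undefined for empty rows

d <- vegdist(sim, method = "bray")

# Fit NMDS with k = 2 and k = 3 and collect the final stress of each solution
stress_by_k <- sapply(2:3, function(k) {
  metaMDS(d, k = k, trymax = 20, trace = FALSE)$stress
})
names(stress_by_k) <- paste0("k=", 2:3)
print(stress_by_k)
```

On a fitted object, stressplot() draws the Shepard diagram (ordination distances against the original dissimilarities), which shows the same information graphically.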

Your three questions:

  • Using adonis() is, in principle, correct, but, as explained above, it might be "too good" to be of use. And it is totally fine to use NMDS as a visualization aid.
  • Stress is the standard measure for NMDS. Even if you found some other measure, there is no reason why it should be better than the stress.
  • As far as alternatives are concerned, the above problem is present with all significance tests. In your case, you would have to define how much two types of forest may differ and still be considered "sufficiently equal". This is probably difficult. One approach might be to find several different types of forest, some of which you know are definitely different, and then compare the difference between those with the difference between the two you are interested in. There are also other visualization tools, such as MDS, t-SNE, or UMAP, but these place some requirements on the data, and given that you are using NMDS, I guess those are not satisfied.
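The comparison against known-different forest types from the last point could be sketched as follows. This is only an illustration on simulated data: groups "A" and "B" play the role of your two forest types, "C" stands for a type you know to be different, and all names and settings are assumptions. It uses vegan's meandist() to get the mean Bray-Curtis dissimilarity within and between groups.

```r
library(vegan)

set.seed(7)
n <- 40   # simulated sites per group (hypothetical)
p <- 25   # simulated species (hypothetical)

base <- runif(p, 0.2, 0.6)
probs <- list(
  A = base,
  B = pmin(pmax(base + rnorm(p, 0, 0.05), 0.01), 0.99),  # close to A
  C = runif(p, 0.2, 0.6)                                 # unrelated profile
)

# One block of n sites per group; column j is drawn with probability pr[j]
sim <- do.call(rbind, lapply(probs, function(pr) {
  matrix(rbinom(n * p, 1, rep(pr, each = n)), nrow = n)
}))
group <- factor(rep(names(probs), each = n))

keep <- rowSums(sim) > 0   # Bray-Curtis is undefined for empty rows
d <- vegdist(sim[keep, ], method = "bray")

# Diagonal: mean within-group dissimilarity; off-diagonal: mean between groups
md <- meandist(d, group[keep])
print(md)

ab <- md["A", "B"]   # the pair of interest
ac <- md["A", "C"]   # reference pair known to differ
```

If the A-B value is close to the within-group values on the diagonal and clearly below A-C, the two types of interest differ little relative to the reference, whatever the p-value says.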