Solved – interpreting NMDS ordinations that show both samples and species

correspondence-analysis, descriptive statistics, interpretation, multidimensional scaling, r

I am using the vegan package in R to plot non-metric multidimensional scaling (NMDS) ordinations. I am using this package because of its compatibility with common ecological distance measures. When you plot the metaMDS() ordination, it plots both the samples (as black dots) and the species (as red dots). My question is: How do you interpret this simultaneous view of species and sample points?

My understanding of NMDS:

You start with a matrix of distances between all pairs of points in multi-dimensional space.
The algorithm places your points in a lower-dimensional (say 2D) space.
The algorithm moves your points around in 2D space so that the distances between points in 2D space have the same rank order as the distances between points in the multi-dimensional space.

BUT there are 2 possible distance matrices you can make with your rows=samples cols=species data:

distances between samples based on species composition (i.e. distances in species space)
distances between species based on co-occurrence in samples (i.e. distances in sample space)

Is metaMDS() calculating BOTH possible distance matrices automatically?

Is the ordination plot an overlay of two sets of arbitrary axes from separate ordinations?

How do you interpret co-localization of species and samples in the ordination plot?

note: I did not include example data because you can see the plots I'm talking about in the package documentation example.

Best Answer

The NMDS that vegan performs is the common-or-garden form of NMDS. If metaMDS() is passed the original data, the species points (shown in the plot) can be positioned at the weighted average of the site scores (the sample points in the plot) on the NMDS dimensions retained/drawn. The weights are the abundances of the species.
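To make the weighted-averaging idea concrete, here is a minimal base-R sketch. The site scores and abundances are made up for illustration (they are not from any vegan dataset), and the code only mimics the principle; in vegan the corresponding helper is wascores(), and metaMDS() by default also expands the resulting scores (expand = TRUE), so its numbers will generally differ from a plain weighted average.

```r
# Hypothetical site scores on two NMDS axes (3 sites)
sites <- matrix(c(-1, 0,
                   0, 1,
                   1, 0),
                ncol = 2, byrow = TRUE,
                dimnames = list(c("s1", "s2", "s3"), c("NMDS1", "NMDS2")))

# Hypothetical abundances (rows = sites, columns = species)
abund <- matrix(c(10, 0,
                   0, 5,
                   0, 5),
                ncol = 2, byrow = TRUE,
                dimnames = list(rownames(sites), c("spA", "spB")))

# Species score = abundance-weighted average of the site scores
species <- t(abund) %*% sites / colSums(abund)
species
```

Here spA occurs only at s1, so it lands exactly on s1; spB is split evenly between s2 and s3, so it lands halfway between them.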

This is one way to think of how species points are positioned in a correspondence analysis (CA) biplot: species scores sit at the weighted average of the site scores, and site scores sit at the weighted average of the species scores. One way to solve CA was discovered simply by iterating those two steps from some initial starting conditions until the scores stopped changing.
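The iteration described here (often called reciprocal averaging) can be sketched in base R. This is an illustrative sketch, not the code any particular package uses: the starting scores, the iteration cap, and the tolerance are arbitrary choices, and the table is the hair/eye colour table from R's built-in HairEyeColor data. The shrinkage factor it converges to should match the first eigenvalue from an SVD-based CA.

```r
# Hair x Eye contingency table (summed over Sex), from R's built-in datasets
tab <- apply(HairEyeColor, c(1, 2), sum)
rs <- rowSums(tab); cs <- colSums(tab); n <- sum(tab)

v <- seq_along(cs)                       # arbitrary starting column scores
v <- v - sum(cs * v) / n                 # centre using column weights
v <- v / sqrt(sum(cs * v^2) / n)         # scale to unit weighted variance

for (iter in 1:100) {
  u <- as.vector(tab %*% v) / rs         # row scores: weighted averages of column scores
  v_new <- as.vector(t(tab) %*% u) / cs  # column scores: weighted averages of row scores
  v_new <- v_new - sum(cs * v_new) / n   # re-centre to avoid the trivial constant solution
  lambda <- sqrt(sum(cs * v_new^2) / n)  # shrinkage over one round trip
  v_new <- v_new / lambda                # rescale so the scores do not collapse to zero
  if (max(abs(v_new - v)) < 1e-12) break
  v <- v_new
}

# Compare with the first squared singular value of the standardized residuals
expected <- outer(rs, cs) / n
S <- (tab - expected) / sqrt(expected * n)
c(reciprocal_averaging = lambda, svd = svd(S)$d[1]^2)
```

The centring step matters: without it the iteration drifts to the trivial solution in which every row and column gets the same score.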

You'll notice that if you supply a dissimilarity matrix to metaMDS(), it will not draw the species points, because it does not have access to the species abundances (to use as weights).

Really, these species points are an afterthought, a way to help interpret the plot. You interpret the site scores (points) as you would in any other NMDS: distances between points approximate the rank order of the dissimilarities between samples. The species just add a little extra information; think of each species point as the "optimum" of that species in the NMDS space.
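The rank-order reading of the site points can be checked roughly as follows. This sketch deliberately uses MASS::isoMDS() instead of vegan's metaMDS(), so that it depends only on packages that ship with R; the abundance matrix is a hypothetical gradient invented for illustration, and Bray-Curtis is computed by hand rather than with vegan's vegdist().

```r
library(MASS)  # isoMDS(); MASS ships with standard R installations

# Hypothetical data: 10 sites along a gradient, 4 species with bell-shaped responses
site_pos <- 1:10
optima   <- c(2, 4, 6, 8)
abund <- round(20 * exp(-outer(site_pos, optima, "-")^2 / 8))
dimnames(abund) <- list(paste0("site", site_pos), paste0("sp", seq_along(optima)))

# Bray-Curtis dissimilarity: sum |x_i - x_k| / sum (x_i + x_k)
tot  <- rowSums(abund)
bray <- as.dist(as.matrix(dist(abund, method = "manhattan")) / outer(tot, tot, "+"))

fit <- isoMDS(bray, k = 2, trace = FALSE)

# Rank agreement between original dissimilarities and ordination distances
rank_agreement <- cor(as.vector(bray), as.vector(dist(fit$points)),
                      method = "spearman")
rank_agreement
fit$stress  # Kruskal stress, which isoMDS() reports in percent
```

A Spearman correlation close to 1 means the 2D distances preserve the ordering of the original dissimilarities well, which is all NMDS tries to achieve; the axis values themselves carry no meaning.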

Related Solutions

Correspondence Analysis – Interpreting 2D Correspondence Analysis Plots

First, there are different ways to construct so-called biplots in the case of correspondence analysis. In all cases, the basic idea is to find a way to show the best 2D approximation of the "distances" between row cells and column cells. In other words, we seek a hierarchy (we also speak of "ordination") of the relationships between rows and columns of a contingency table.

Very briefly, CA decomposes the chi-square statistic associated with the two-way table into orthogonal factors that maximize the separation between row and column scores (i.e. the frequencies computed from the table of profiles). Here, you see that there is some connection with PCA, but the measure of variance (or the metric) retained in CA is the $\chi^2$, which only depends on column profiles. (As it tends to give more importance to modalities that have large marginal values, we can also re-weight the initial data, but this is another story.)

Here is a more detailed answer. The implementation that is proposed in the corresp() function (in MASS) follows from a view of CA as an SVD decomposition of dummy coded matrices representing the rows and columns (such that $R^tC=N$, with $N$ the total sample). This is in line with canonical correlation analysis. In contrast, the French school of data analysis considers CA as a variant of PCA, where you seek the directions that maximize the "inertia" in the data cloud. This is done by diagonalizing the inertia matrix computed from the centered and scaled (by marginal frequencies) two-way table, and expressing row and column profiles in this new coordinate system.

If you consider a table with $i=1,\dots,I$ rows, and $j=1,\dots,J$ columns, each row is weighted by its corresponding marginal sum which yields a series of conditional frequencies associated to each row: $f_{j|i}=n_{ij}/n_{i\cdot}$. The marginal column is called the mean profile (for rows). This gives us a vector of coordinates, also called a profile (by row). For the column, we have $f_{i|j}=n_{ij}/n_{\cdot j}$. In both cases, we will consider the $I$ row profiles (associated to their weight $f_{i\cdot}$) as individuals in the column space, and the $J$ column profiles (associated to their weight $f_{\cdot j}$) as individuals in the row space. The metric used to compute the proximity between any two individuals is the $\chi^2$ distance. For instance, between two rows $i$ and $i'$, we have

$$ d^2_{\chi^2}(i,i')=\sum_{j=1}^J\frac{n}{n_{\cdot j}}\left(\frac{n_{ij}}{n_{i\cdot}}-\frac{n_{i'j}}{n_{i'\cdot}} \right)^2 $$
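As a small illustration of this formula, the sketch below computes the $\chi^2$ distance between two row profiles of the hair/eye colour table (R's built-in HairEyeColor data, summed over Sex). The helper name chi2_dist is made up for this example. It also shows that the same distances come out of an ordinary Euclidean dist() once each column of the profile matrix is divided by the square root of its column mass $n_{\cdot j}/n$.

```r
tab <- apply(HairEyeColor, c(1, 2), sum)  # Hair x Eye counts
n <- sum(tab)

row_prof <- prop.table(tab, 1)            # row profiles n_ij / n_i.
col_mass <- colSums(tab) / n              # column masses n_.j / n

# Chi-square distance between rows i and k (hypothetical helper)
chi2_dist <- function(i, k) {
  sqrt(sum((row_prof[i, ] - row_prof[k, ])^2 / col_mass))
}
chi2_dist("Black", "Blond")

# Equivalent: Euclidean distance on profiles rescaled by 1 / sqrt(column mass)
d_all <- dist(sweep(row_prof, 2, sqrt(col_mass), "/"))
as.matrix(d_all)["Black", "Blond"]
```

Dividing by the column mass is what gives rare columns more influence than they would have under plain Euclidean distance.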

You may also see the link with the $\chi^2$ statistic by noting that it is simply the distance between observed and expected counts, where expected counts (under $H_0$, independence of the two variables) are computed as $n_{i\cdot}\times n_{\cdot j}/n$ for each cell $(i,j)$. If the two variables were to be independent, the row profiles would be all equal, and identical to the corresponding marginal profile. In other words, when there is independence, your contingency table is entirely determined by its margins.

If you carry out a PCA on the row profiles (viewed as individuals), replacing the Euclidean distance by the $\chi^2$ distance, then you get your CA. The first principal axis is the line that is the closest to all points, and the corresponding eigenvalue is the inertia explained by this dimension. You can do the same with the column profiles. It can be shown that there is a symmetry between the two approaches, and more specifically that the principal components (PCs) for the column profiles are associated with the same eigenvalues as the PCs for the row profiles. What is shown on a biplot is the coordinates of the individuals in this new coordinate system, although the individuals are represented in a separate factorial space. Provided each individual/modality is well represented in its factorial space (you can look at the $\cos^2$ of the modality with the 1st principal axis, which is a measure of the correlation/association), you can even interpret the proximity between elements $i$ and $j$ of your contingency table (as can be done by looking at the residuals of your $\chi^2$ test of independence, e.g. chisq.test(tab)$observed - chisq.test(tab)$expected).

The total inertia of your CA (= the sum of eigenvalues) is the $\chi^2$ statistic divided by $n$ (which is Pearson's $\phi^2$).
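This identity is easy to verify numerically. The sketch below, again on the hair/eye colour table, builds the expected counts under independence, computes the $\chi^2$ statistic by hand, and compares $\chi^2/n$ with the sum of squared singular values of the standardized residual matrix (one common way of obtaining the CA eigenvalues; the packages listed in this answer may organize the computation differently).

```r
tab <- apply(HairEyeColor, c(1, 2), sum)  # Hair x Eye counts
n <- sum(tab)

# Expected counts under independence: n_i. * n_.j / n
expected <- outer(rowSums(tab), colSums(tab)) / n
chi2 <- sum((tab - expected)^2 / expected)

# Standardized residuals divided by sqrt(n); their squared singular values
# are the principal inertias (eigenvalues) of the CA
S <- (tab - expected) / sqrt(expected * n)
inertia <- sum(svd(S)$d^2)

c(chi2_over_n = chi2 / n, total_inertia = inertia)
all.equal(chi2, unname(chisq.test(tab)$statistic))
```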

Actually, there are several packages that may provide you with enhanced CAs compared to the function available in the MASS package: ade4, FactoMineR, anacor, and ca.

The last of these is the one that was used for your particular illustration, and a paper was published in the Journal of Statistical Software that explains most of its functionalities: Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package.

So, your example on eye/hair colors can be reproduced in many ways:

data(HairEyeColor)
tab <- apply(HairEyeColor, c(1, 2), sum) # aggregate on gender
tab

library(MASS)
plot(corresp(tab, nf=2))
corresp(tab, nf=2)

library(ca)
plot(ca(tab))
summary(ca(tab, nd=2))

library(FactoMineR)
CA(tab)
CA(tab, graph=FALSE)$eig  # == summary(ca(tab))$scree[,"values"]
CA(tab, graph=FALSE)$row$contrib

library(ade4)
scatter(dudi.coa(tab, scannf=FALSE, nf=2))

In all cases, what we read in the resulting biplot is basically (I limit my interpretation to the 1st axis which explained most of the inertia):

the first axis highlights the clear opposition between light and dark hair color, and between blue and brown eyes;
people with blond hair tend to also have blue eyes, and people with black hair tend to have brown eyes.

There are many additional resources on data analysis from the bioinformatics lab in Lyon, France. They are mostly in French, but I think that would not be too much of a problem for you. The following two handouts should be interesting as a first start:

Finally, when you consider a full disjunctive (dummy) coding of $k$ variables, you get multiple correspondence analysis.

Solved – NMDS: why is the r-squared for a factor variable so low

Let me take a crack. I believe factorfit finds average ordination scores for each level of your factor (and places an arrowhead), and performs a regression (with goodness of fit)... Although it appears that you have separation due to your factor in nMDS space, the R^2 is likely low simply due to high variance in ordination score around an expected regression score. This could simply be due to other variables that you include in your nMDS ordination, and if you only included [species] important to the distinction between habitat, your R^2 would of course go up. Remember that the axes in nMDS are not linear combinations of descriptors... Thus points are arranged by similarity in k-dimensions, and not according to scores of any specific variable.
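To see how spread around the group centroids pulls the fit down, here is a small base-R sketch. It assumes the definition given in vegan's envfit documentation for factors, r² = 1 − ss_w/ss_t (within-group over total sum of squares); the helper factor_r2 and all scores are hypothetical, and this is not vegan's own code.

```r
# Hypothetical helper: r^2 of a factor on ordination scores, 1 - SS_within / SS_total
factor_r2 <- function(scores, groups) {
  centroids <- apply(scores, 2, function(x) tapply(x, groups, mean))
  fitted <- centroids[as.character(groups), , drop = FALSE]
  ss_within <- sum((scores - fitted)^2)
  ss_total  <- sum(scale(scores, scale = FALSE)^2)
  1 - ss_within / ss_total
}

habitat <- factor(rep(c("a", "b"), each = 3))
centres <- rbind(a = c(-1.5, 0), b = c(1.5, 0))   # well-separated group centroids
offsets <- matrix(c(-0.5,  0.5,
                     0.5, -0.5,
                     0.0,  0.0), ncol = 2, byrow = TRUE)[rep(1:3, 2), ]

tight <- centres[as.character(habitat), ] + offsets      # little scatter
loose <- centres[as.character(habitat), ] + 3 * offsets  # same centroids, more scatter

c(tight = factor_r2(tight, habitat), loose = factor_r2(loose, habitat))
```

Both configurations have identical centroids, so the groups look equally "separated" on average, yet the looser one has a much lower r² simply because more of the total variation lies within groups.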