Redundancy Analysis – Understanding Principal Components in RDA Using the Vegan Package in R

multivariate analysisrvegan

I ran an RDA through the vegan package in R using water quality (dependent variable) and land use (independent variable) data.

In my test I got the following results:

enter image description here

The total inertia was 9, and the model explained approximately 41.66% (Constrained = 0.4166 = ) of the variance between the quality and land use data.

The model still provides the main axes (RDA1, RDA2, RDA3, RDA4) and 6 more Principal Componentes (PC1 to PC6) as shown in the figure.

I would like to know what the meaning of these components? how to interpret them? I understand that the components absorbed the unexplained variance (Unconstrained =0.5834)

Best Answer

I understand that you ask about PCs of unconstrained axes, and I answer only to this question. If I misunderstood your question, you may stop reading.

Usually people just ignore unconstrained PCs. There have been some ideas how to interpet them, but these often work poorly and are difficult to perform. However, here some ideas.

In partial RDA, you first remove (partial out) effects of some variables, and after that you apply your constraints. That is, you perform ordinary RDA with data where you have removed the (linear) effects of some variables. This after-conditions (after partialling out) RDA is the same as trying to interpret the residual axes (PCs) after using these variables in RDA. What you can do is to try to see if there is any structure in the data that is left unexplained in your model. In principle, the residuals should be random, that is, there is no structure. However, this is often difficult to check as the residual data are noisy, and people just ignore those axes. For instance, in your example, the first PC1 has eigenvalue that is almost as large as the first constrained RDA1 axis, and much higher than the second constrained RDA2. This suggests that there is some structure in the data that you cannot explain with your environmental variables. This is a different thing than the total unexplained variation which should be high in ecological data (and no reason to worry about this), but it is unstructured.

The residual axes (PCs) are based on correlated or similar responses of species residuals after your RDA. If the residuals are uncorrelated, you still can have residual axes, but they are more or less random noise. The practical problem is that even with correlated responses, residuals have a lot of noise and they are difficult to analyse.

Normally we assume that the correlated residual responses are caused by some (or one) of the following reasons:

  1. Unknown environmental variables that were not included in your explanatory model. For this you should try to find such variables or else you are doomed to speculation and this is not often well regarded when you submit your ms.
  2. Positive or negative species associations. In principle the unknown environmental variables also manifest as species associations, but ecologists often want to think that these associations are due to ecological interactions. If this is so, nobody knows. (Bayesian species distribution models use residual latent variables like that: they try to find correlated structures in responses and interpret these by some means, often as biological interactions, but they could well be unknown explanatory variables.)
  3. Wrong explanatory model. Missing explanatory variables can also be included here, but also things like wrong scaling of environmental variables (e.g., arithmetic instead of log scale) or wrong functional model (like using linear responses, like in RDA, instead of curved unimodal, such as polynomial functions), or wrong error model. You cannot disregard this. However, people want to think that if there is a residual structure, it is caused either by (1) or (2) and want to ignore this option (3).

For point (1) all you can do is to find credible environmental variables or other interpretation.

Point (2) is trickier. Usually if you look at full residuals without their PCA decomposition, the responses are more or less uncorrelated, but correlated species responses appear on first PCs with highest eigenvalues. Just for fun, I have written a function to inspect (linear) species associations in an extreme-vegan package called natto:

devtools::install_github("jarioksa/natto")
library(natto) # resassoc for residual species associations
library(vegan) # rda & data
library(corrplot) # correlation plot of residual associations
data(dune, dune.env)
mod <- rda(dune ~ A1 + Management + Moisture, dune.env)
## shows species associations in first three PCs
corrplot(resassoc(mod, 3), order = "AOE")

Point (3) is trickiest of all, except that we know that RDA implies a linear response of species to environmental variables, and we know that this wrong. What we do not know is how much of this wrong model later appears as correlated species responses that seem to call for points (1) or (2). Package vegan has some tools to inspect residuals (ordiresids) but the results are not easily interpreted. Certainly you can assume that your model is wrong, and this speaks either for improving your model or ignoring the inspection of residual PCs. However, the PCs are there in case you want to study them.

Related Question