I wouldn't place much stock in "rules of thumb" such as this. It depends on so many things: the number of variables, the number of sites, the dissimilarity measure used, and so on. Note also that the vector-fitting approach is inherently linear, and we have no reason to presume that the relationship between the variable and the NMDS configuration is linear.
The key point is that the correlation is small-to-modest but significant. You probably do want to check the linearity assumption, though.
In the vegan package for R we have the function ordisurf() for this. It fits a 2-d surface to an NMDS solution using a GAM, via the function gam() in package mgcv. It essentially fits the model
$$g(\mu_i) = \eta_i = \beta_0 + f(\text{nmds}_{1i}, \text{nmds}_{2i})$$
where $f$ is a 2-d smooth function of the 1st and 2nd NMDS axes, $g$ is the link function, $\mu_i$ is the expectation of the response, and $\eta_i$ is the linear predictor. The error distribution is a member of the exponential family of distributions. The surface $f$ can be isotropic, in which case we use smooths formed by s(), which employs thin plate splines by default. Anisotropic surfaces can be fitted too, in which case we use tensor product smooths formed by te().
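To make the distinction concrete, here is a minimal mgcv sketch of the two kinds of surface. This is only an illustration, not part of ordisurf itself; the names nmds1, nmds2 and y are made-up stand-ins for two ordination axes and an environmental variable.
require("mgcv")
set.seed(1)
## toy stand-ins for two NMDS axes and an environmental variable
nmds1 <- runif(50)
nmds2 <- runif(50)
y <- sin(2 * pi * nmds1) + nmds2^2 + rnorm(50, sd = 0.2)
## isotropic surface: a single thin plate spline smooth of both axes
m.iso <- gam(y ~ s(nmds1, nmds2), method = "REML")
## anisotropic surface: a tensor product smooth, with separate wiggliness per axis
m.te <- gam(y ~ te(nmds1, nmds2), method = "REML")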
The complexity of the smooth is chosen during fitting, by default (in the development versions) via a REML criterion; GCV-based smoothness selection is also available.
Here is an R example using one of the in-built data sets provided with vegan
require("vegan")
data(dune)
data(dune.env)
## fit NMDS using Bray-Curtis dissimilarity (default)
set.seed(12)
sol <- metaMDS(dune)
## NMDS plot
plot(sol)
## Fit and add the 2d surface
sol.s <- ordisurf(sol ~ A1, data = dune.env, method = "REML",
select = TRUE)
## look at the fitted model
summary(sol.s)
This produces
> summary(sol.s)
Family: gaussian
Link function: identity
Formula:
y ~ s(x1, x2, k = knots[1], bs = bs[1])
<environment: 0x2fb78a0>
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.8500 0.4105 11.81 9.65e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x1,x2) 1.591 9 0.863 0.0203 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.29 Deviance explained = 35%
REML score = 41.587 Scale est. = 3.3706 n = 20
and the corresponding plot: the NMDS ordination with the fitted 2-d surface for A1 overlaid as contours.
In this case a linear vector fit seems reasonable for this variable. Read ?ordisurf for details on the arguments used, especially what select = TRUE does.
The percentage explained depends on the order in which the variables are entered.
If you specify a particular order, you can compute this trivially in R (e.g. via the update and anova functions, see below), but a different order of entry would potentially yield very different answers.
[One possibility might be to average across all orders or something, but it would get unwieldy and might not be answering a particularly useful question.]
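If you really did want to average over orderings, a rough sketch of the idea is below. This is my illustration, not something from the answer itself; it assumes a data frame dat holding dv and iv1 to iv5 (the data used to fit fit), and uses permn() from the combinat package to enumerate the orderings.
require("combinat")                     # for permn(); any way of listing permutations will do
ivs <- paste0("iv", 1:5)
perms <- permn(ivs)                     # all 5! = 120 orders of entry
## incremental sum of squares for each variable under each ordering
ss <- sapply(perms, function(ord) {
    a <- anova(lm(reformulate(ord, response = "dv"), data = dat))
    setNames(a[ivs, "Sum Sq"], ivs)     # put the rows back in iv1..iv5 order
})
avg <- rowMeans(ss)                     # average contribution over all orderings
rss <- deviance(lm(reformulate(ivs, response = "dv"), data = dat))  # residual SS, the same for every order
round(100 * avg / (sum(avg) + rss), 2)  # percentage of variance explained
If memory serves, the relaimpo package implements this sort of averaged decomposition directly (the "lmg" measure), which avoids refitting all the models by hand.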
--
As Stat points out, with a single model, if you want the contribution of one variable at a time, you can just use anova() to produce the incremental (sequential) sums of squares table. This would follow on from your code:
anova(fit)
Analysis of Variance Table
Response: dv
Df Sum Sq Mean Sq F value Pr(>F)
iv1 1 0.033989 0.033989 0.7762 0.4281
iv2 1 0.022435 0.022435 0.5123 0.5137
iv3 1 0.003048 0.003048 0.0696 0.8050
iv4 1 0.115143 0.115143 2.6294 0.1802
iv5 1 0.000220 0.000220 0.0050 0.9469
Residuals 4 0.175166 0.043791
--
So there we have the incremental sums of squares; how do we get the proportion of variance explained?
Quite simply: scale them by 1 divided by their total. (Replace the 1 with 100 for percentage variance explained.)
Here I've displayed it as an added column to the anova table:
af <- anova(fit)
afss <- af$"Sum Sq"
print(cbind(af,PctExp=afss/sum(afss)*100))
Df Sum Sq Mean Sq F value Pr(>F) PctExp
iv1 1 0.0339887640 0.0339887640 0.77615140 0.4280748 9.71107544
iv2 1 0.0224346357 0.0224346357 0.51230677 0.5137026 6.40989591
iv3 1 0.0030477233 0.0030477233 0.06959637 0.8049589 0.87077807
iv4 1 0.1151432643 0.1151432643 2.62935731 0.1802223 32.89807550
iv5 1 0.0002199726 0.0002199726 0.00502319 0.9468997 0.06284931
Residuals 4 0.1751656402 0.0437914100 NA NA 50.04732577
--
If you decide you want several particular orders of entry, you can do something even more general like this (which also allows you to enter or remove groups of variables at a time if you wish):
m5 = fit
m4 = update(m5, ~ . - iv5)
m3 = update(m4, ~ . - iv4)
m2 = update(m3, ~ . - iv3)
m1 = update(m2, ~ . - iv2)
m0 = update(m1, ~ . - iv1)
anova(m0,m1,m2,m3,m4,m5)
Analysis of Variance Table
Model 1: dv ~ 1
Model 2: dv ~ iv1
Model 3: dv ~ iv1 + iv2
Model 4: dv ~ iv1 + iv2 + iv3
Model 5: dv ~ iv1 + iv2 + iv3 + iv4
Model 6: dv ~ iv1 + iv2 + iv3 + iv4 + iv5
Res.Df RSS Df Sum of Sq F Pr(>F)
1 9 0.35000
2 8 0.31601 1 0.033989 0.7762 0.4281
3 7 0.29358 1 0.022435 0.5123 0.5137
4 6 0.29053 1 0.003048 0.0696 0.8050
5 5 0.17539 1 0.115143 2.6294 0.1802
6 4 0.17517 1 0.000220 0.0050 0.9469
(Such an approach might also be automated, e.g. via loops and the use of get. You can add and remove variables in multiple orders if needed.)
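For what it's worth, a rough sketch of that automation might look something like the following. This is my own illustration rather than code from the answer, and it again assumes the data frame dat with columns dv and iv1 to iv5.
ivs <- paste0("iv", 1:5)
assign("m0", lm(dv ~ 1, data = dat))        # intercept-only model
for (i in seq_along(ivs)) {
    prev <- get(paste0("m", i - 1))         # model containing the first i - 1 variables
    assign(paste0("m", i),
           update(prev, as.formula(paste("~ . +", ivs[i]))))
}
## same sequential table as the manual m0, ..., m5 comparison above
do.call(anova, unname(mget(paste0("m", 0:length(ivs)))))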
... and then scale to percentages as before.
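If you want those as percentages directly from the sequential table, something along these lines would do it (again my addition, not part of the answer):
seq.af <- anova(m0, m1, m2, m3, m4, m5)
ss <- c(seq.af$"Sum of Sq"[-1], deviance(m5))  # incremental SS for iv1..iv5 plus the residual SS
round(100 * ss / sum(ss), 2)                   # percentage of variance explained at each step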
(NB. The fact that I explain how to do these things should not necessarily be taken as advocacy of everything I explain.)
Best Answer
You've actually stumbled on a pretty interesting idea that gets explained in linear regression classes. The answer to your question ("what is the share....") is that it depends on which model you're using.
Because your variables are highly correlated, they explain much of the same variance in the data you're trying to model. That means that while both variables are useful in predicting $y$ on their own, when you put them both into the model at the same time they are "explaining the same variance", and so only one of them may appear to be "significant".
As an extreme example, imagine entering the same predictor twice and fitting $y = \beta_{0}x_{1}+\beta_{1}x_{1}+\epsilon$ (this is unrealistic, since the two terms are perfectly collinear and you would run into a slew of numerical issues, but it illustrates the idea), then looking to see which of the two terms is useful in predicting $y$. Of course, in the separate models $y = \beta_{0}x_{1}+\epsilon$ and $y = \beta_{1}x_{1}+\epsilon$ you would find the predictor useful in each. But in the combined model, since both terms explain exactly the same variance in $y$, you could set either $\beta_{0}$ or $\beta_{1}$ to 0 and predict $y$ just as well.
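To make this concrete, here is a tiny simulation along those lines; it is my own illustration, not part of the original answer. Two nearly identical predictors each look strongly significant on their own, but once both are in the model the individual t-tests weaken dramatically because they are competing to explain the same variance.
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)           # x2 is almost a copy of x1
y <- 2 * x1 + rnorm(n)
summary(lm(y ~ x1))$coefficients        # x1 is highly significant on its own
summary(lm(y ~ x2))$coefficients        # so is x2 on its own
summary(lm(y ~ x1 + x2))$coefficients   # together they share the variance; individual t-values shrink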