I have run a multiple regression in which the model as a whole is significant and explains about 13% of the variance. However, I need to find the amount of variance explained by each significant predictor. How can I do this using R?
Here's some sample data and code:
D = data.frame(
dv = c( 0.75, 1.00, 1.00, 0.75, 0.50, 0.75, 1.00, 1.00, 0.75, 0.50 ),
iv1 = c( 0.75, 1.00, 1.00, 0.75, 0.75, 1.00, 0.50, 0.50, 0.75, 0.25 ),
iv2 = c( 0.882, 0.867, 0.900, 0.333, 0.875, 0.500, 0.882, 0.875, 0.778, 0.867 ),
iv3 = c( 1.000, 0.067, 1.000, 0.933, 0.875, 0.500, 0.588, 0.875, 1.000, 0.467 ),
iv4 = c( 0.889, 1.000, 0.905, 0.938, 0.833, 0.882, 0.444, 0.588, 0.895, 0.812 ),
iv5 = c( 18, 16, 21, 16, 18, 17, 18, 17, 19, 16 ) )
fit = lm( dv ~ iv1 + iv2 + iv3 + iv4 + iv5, data=D )
summary( fit )
Here's the output with my actual data:
Call:
lm(formula = dv ~ iv1 + iv2 + iv3 + iv4 + iv5, data = D)
Residuals:
Min 1Q Median 3Q Max
-0.6881 -0.1185 0.0516 0.1359 0.3690
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.77364 0.10603 7.30 8.5e-13 ***
iv1 0.29267 0.03091 9.47 < 2e-16 ***
iv2 0.06354 0.02456 2.59 0.0099 **
iv3 0.00553 0.02637 0.21 0.8340
iv4 -0.02642 0.06505 -0.41 0.6847
iv5 -0.00941 0.00501 -1.88 0.0607 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.18 on 665 degrees of freedom
Multiple R-squared: 0.13, Adjusted R-squared: 0.123
F-statistic: 19.8 on 5 and 665 DF, p-value: <2e-16
This question has been answered here, but the accepted answer only addresses uncorrelated predictors. There is an additional response that addresses correlated predictors, but it only offers a general hint rather than a specific solution. My predictors are correlated, so I would like to know what to do in that case.
Best Answer
The percentage of variance explained by each predictor depends on the order in which the predictors are entered into the model.
If you specify a particular order, you can compute this trivially in R (e.g. via the update and anova functions, see below), but a different order of entry would yield potentially very different answers. [One possibility might be to average across all orders or something, but it would get unwieldy and might not be answering a particularly useful question.]
--
As Stat points out, with a single model, if you're after one variable at a time, you can just use 'anova' to produce the incremental sums of squares table. This would follow on from your code:
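For example, following straight on from the fit object in your sample code:

anova( fit )   # sequential (Type I) anova table: each 'Sum Sq' entry is the
               # additional sum of squares explained by that predictor, given
               # the predictors entered above it

The Sum Sq column then holds the incremental sums of squares, one row per predictor plus a Residuals row.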
--
So there we have the incremental variance explained; how do we get the proportion?
Pretty trivially: divide each sum of squares by their total (multiply by 100 if you want percentage of variance explained).
Here I've displayed it as an added column to the anova table:
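A minimal sketch of that (the column name PctExp is just a label I've chosen; use whatever you like):

af   <- anova( fit )
afss <- af$"Sum Sq"                                    # incremental sums of squares
print( cbind( af, PctExp = afss / sum(afss) * 100 ) )  # percent of variance explained

The Residuals row of PctExp gives the percentage of variance the model leaves unexplained.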
--
If you decide you want several particular orders of entry, you can do something even more general like this (which also allows you to enter or remove groups of variables at a time if you wish):
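For instance, here is a sketch of one arbitrary order of entry (iv2 first, then iv1, then iv3 and iv4 as a group, then iv5); rearrange the steps to match whichever order you care about:

m0 <- lm( dv ~ 1, data=D )              # intercept-only baseline
m1 <- update( m0, . ~ . + iv2 )         # enter iv2 first
m2 <- update( m1, . ~ . + iv1 )         # then iv1
m3 <- update( m2, . ~ . + iv3 + iv4 )   # then iv3 and iv4 together, as a group
m4 <- update( m3, . ~ . + iv5 )         # then iv5
anova( m0, m1, m2, m3, m4 )             # incremental sums of squares for this order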
(Such an approach might also be automated, e.g. via loops and the use of get. You can add and remove variables in multiple orders if needed.) ... and then scale to percentages as before.
(NB. The fact that I explain how to do these things should not necessarily be taken as advocacy of everything I explain.)