Multiple Regression R – Calculate Variance Explained by Each Predictor in R

rregressionvariance

I have run a multiple regression in which the model as a whole is significant and explains about 13% of the variance. However, I need to find the amount of variance explained by each significant predictor. How can I do this using R?

Here's some sample data and code:

D = data.frame(
    dv = c( 0.75, 1.00, 1.00, 0.75, 0.50, 0.75, 1.00, 1.00, 0.75, 0.50 ),
    iv1 = c( 0.75, 1.00, 1.00, 0.75, 0.75, 1.00, 0.50, 0.50, 0.75, 0.25 ),
    iv2 = c( 0.882, 0.867, 0.900, 0.333, 0.875, 0.500, 0.882, 0.875, 0.778, 0.867 ),
    iv3 = c( 1.000, 0.067, 1.000, 0.933, 0.875, 0.500, 0.588, 0.875, 1.000, 0.467 ),
    iv4 = c( 0.889, 1.000, 0.905, 0.938, 0.833, 0.882, 0.444, 0.588, 0.895, 0.812 ),
    iv5 = c( 18, 16, 21, 16, 18, 17, 18, 17, 19, 16 ) )
fit = lm( dv ~ iv1 + iv2 + iv3 + iv4 + iv5, data=D )
summary( fit )

Here's the output with my actual data:

Call: lm(formula = posttestScore ~ pretestScore + probCategorySame + 
    probDataRelated + practiceAccuracy + practiceNumTrials, data = D)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.6881 -0.1185  0.0516  0.1359  0.3690 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
 (Intercept)        0.77364    0.10603    7.30  8.5e-13 ***
 iv1                0.29267    0.03091    9.47  < 2e-16 ***
 iv2                0.06354    0.02456    2.59   0.0099 **
 iv3                0.00553    0.02637    0.21   0.8340
 iv4               -0.02642    0.06505   -0.41   0.6847
 iv5               -0.00941    0.00501   -1.88   0.0607 .  
--- Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.18 on 665 degrees of freedom
 Multiple R-squared:  0.13,      Adjusted R-squared:  0.123
 F-statistic: 19.8 on 5 and 665 DF,  p-value: <2e-16

This question has been answered here, but the accepted answer only addresses uncorrelated predictors, and while there is an additional response that addresses correlated predictors, it only provides a general hint, not a specific solution. I would like to know what to do if my predictors are correlated.

Best Answer

The percentage explained depends on the order entered.

If you specify a particular order, you can compute this trivially in R (e.g. via the update and anova functions, see below), but a different order of entry would yield potentially very different answers.

[One possibility might be to average across all orders or something, but it would get unwieldy and might not be answering a particularly useful question.]

--

As Stat points out, with a single model, if you're after one variable at a time, you can just use 'anova' to produce the incremental sums of squares table. This would follow on from your code:

 anova(fit)
Analysis of Variance Table

Response: dv
          Df   Sum Sq  Mean Sq F value Pr(>F)
iv1        1 0.033989 0.033989  0.7762 0.4281
iv2        1 0.022435 0.022435  0.5123 0.5137
iv3        1 0.003048 0.003048  0.0696 0.8050
iv4        1 0.115143 0.115143  2.6294 0.1802
iv5        1 0.000220 0.000220  0.0050 0.9469
Residuals  4 0.175166 0.043791        

--

So there we have the incremental variance explained; how do we get the proportion?

Pretty trivially, scale them by 1 divided by their sum. (Replace the 1 with 100 for percentage variance explained.)

Here I've displayed it as an added column to the anova table:

 af <- anova(fit)
 afss <- af$"Sum Sq"
 print(cbind(af,PctExp=afss/sum(afss)*100))
          Df       Sum Sq      Mean Sq    F value    Pr(>F)      PctExp
iv1        1 0.0339887640 0.0339887640 0.77615140 0.4280748  9.71107544
iv2        1 0.0224346357 0.0224346357 0.51230677 0.5137026  6.40989591
iv3        1 0.0030477233 0.0030477233 0.06959637 0.8049589  0.87077807
iv4        1 0.1151432643 0.1151432643 2.62935731 0.1802223 32.89807550
iv5        1 0.0002199726 0.0002199726 0.00502319 0.9468997  0.06284931
Residuals  4 0.1751656402 0.0437914100         NA        NA 50.04732577

--

If you decide you want several particular orders of entry, you can do something even more general like this (which also allows you to enter or remove groups of variables at a time if you wish):

 m5 = fit
 m4 = update(m5, ~ . - iv5)
 m3 = update(m4, ~ . - iv4)
 m2 = update(m3, ~ . - iv3)
 m1 = update(m2, ~ . - iv2)
 m0 = update(m1, ~ . - iv1)

 anova(m0,m1,m2,m3,m4,m5)
Analysis of Variance Table

Model 1: dv ~ 1
Model 2: dv ~ iv1
Model 3: dv ~ iv1 + iv2
Model 4: dv ~ iv1 + iv2 + iv3
Model 5: dv ~ iv1 + iv2 + iv3 + iv4
Model 6: dv ~ iv1 + iv2 + iv3 + iv4 + iv5
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1      9 0.35000                           
2      8 0.31601  1  0.033989 0.7762 0.4281
3      7 0.29358  1  0.022435 0.5123 0.5137
4      6 0.29053  1  0.003048 0.0696 0.8050
5      5 0.17539  1  0.115143 2.6294 0.1802
6      4 0.17517  1  0.000220 0.0050 0.9469

(Such an approach might also be automated, e.g. via loops and the use of get. You can add and remove variables in multiple orders if needed)

... and then scale to percentages as before.

(NB. The fact that I explain how to do these things should not necessarily be taken as advocacy of everything I explain.)

Related Question