Solved – Can’t understand this multiple R-squared value in R lm()

Tags: lm, r, r-squared

> library(data.table)
> dt = data.table(x = rnorm(100))
> dt[, y := 1+0.2*x + rnorm(100)]
> 
> fit = lm(y~x, data = dt)
> summary(fit)

Call:
lm(formula = y ~ x, data = dt)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.19493 -0.75218 -0.03459  0.64181  2.38214 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.8561     0.1036   8.266 6.86e-13 ***
x             0.3025     0.1056   2.865   0.0051 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.022 on 98 degrees of freedom
Multiple R-squared:  0.07731,   Adjusted R-squared:  0.06789 
F-statistic: 8.211 on 1 and 98 DF,  p-value: 0.005095

> dt[, 1 - sum(fit$residuals^2) / sum(y^2)]
[1] 0.419261

Shouldn't R-squared be 0.419? Or at least close to it, not 0.07?

Best Answer

Your formula

1 - sum(fit$residuals^2) / sum(y^2)

is wrong. It should be

$$ R^2 = 1 - \frac{\sum_i (y_i-\hat{y}_i)^2}{\sum_i (y_i-\bar{y})^2} $$

where $\hat y_i$ is the prediction for observation $i$ and $\bar y$ is the mean of $y$. See the Wikipedia entry on $R^2$ to learn more.
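A minimal check in R, reusing fit and dt from your session (the exact value depends on your random draw, so it won't equal 0.07731 exactly on a rerun):

rss <- sum(fit$residuals^2)          # residual sum of squares
tss <- sum((dt$y - mean(dt$y))^2)    # total sum of squares, centered at the mean
1 - rss / tss                        # matches summary(fit)$r.squared

Your denominator sum(y^2) measures variation around 0 rather than around $\bar y$; since $\sum_i y_i^2 = \sum_i (y_i-\bar y)^2 + n\bar y^2$, it is larger, which shrinks the ratio and inflates the result to 0.419.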

Notice that this generalizes easily: $\bar y$ is exactly the prediction from an intercept-only regression, so the numerator is the residual sum of squares from your model and the denominator is the residual sum of squares from the null model (the most basic model we can imagine).
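A quick sketch of that equivalence in R (again reusing fit and dt from above):

fit0 <- lm(y ~ 1, data = dt)                 # intercept-only (null) model
all.equal(unname(fitted(fit0)),              # its fitted values are all
          rep(mean(dt$y), nrow(dt)))         # equal to the mean of y
1 - sum(resid(fit)^2) / sum(resid(fit0)^2)   # same as summary(fit)$r.squared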