Regression – Interpreting High $R^2$ and High $p$-Value in Simple Linear Regression

Tags: linear-model, p-value, r-squared, regression

Let's assume that we have a simple linear regression:
$\hat{y} = bx + \text{intercept}$.

Is it possible to have a high p-value and a high $R^2$ (or a low p-value and a low $R^2$)? I've been looking for examples of this. When a linear regression has multiple parameters, I've seen examples where the p-values for some parameters are low while the overall $R^2$ is low as well, but I was wondering whether this is possible for a linear regression with a single predictor.

Best Answer

Yes, it is possible. The $R^2$ and the $t$ statistic of the slope (used to compute its p-value) are related exactly by:

$|t| = \sqrt{\frac{R^2}{1 - R^2}\,(n - 2)}$

Therefore, you can have a high $R^2$ with a high p-value (a low $|t|$) if you have a small sample.

For instance, take $n = 3$. For this sample size to give you a (two-sided) p-value less than 10%, you would need an $R^2$ greater than about 97.5% (since with $n - 2 = 1$ degree of freedom the critical value is $t_{0.95,\,1} \approx 6.31$) -- anything less than that would give you a "non-significant" p-value.
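As a quick check, that threshold follows directly from the formula above (a base-R sketch; only qt() and arithmetic are involved):

n <- 3
t_crit <- qt(0.95, df = n - 2)       # two-sided 10% critical value, ~6.31
t_crit^2 / (t_crit^2 + (n - 2))      # minimum R^2 needed, ~0.976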

As a concrete example, the simulation below produces an $R^2$ close to 0.5 with a p-value of $0.516$.

set.seed(10)
n <- 3                          # deliberately tiny sample
x <- rnorm(n, 0, 1)
y <- 1 + x + rnorm(n, 0, 1)     # true model: intercept 1, slope 1
summary(m1 <- lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
       1        2        3 
-0.36552  0.42802 -0.06251 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.7756     0.4261    1.82    0.320
x             0.5065     0.5333    0.95    0.516

Residual standard error: 0.5663 on 1 degrees of freedom
Multiple R-squared:  0.4743,    Adjusted R-squared:  -0.05148 
F-statistic: 0.9021 on 1 and 1 DF,  p-value: 0.5164
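As a check, plugging the reported $R^2$ into the formula above gives $|t| = \sqrt{0.4743/(1 - 0.4743)} \approx 0.95$, matching the $t$ value for $x$ (and its square, $0.90$, matches the $F$ statistic).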

For the opposite case (a low p-value with a low $R^2$), you can obtain that trivially by setting up a regression in which $x$ has low explanatory power and letting $n \to \infty$; the p-value can then be made as small as you want.
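For instance, a minimal sketch of this idea (the slope of 0.03 and the sample size here are arbitrary choices):

set.seed(10)
n <- 1e5                       # very large sample
x <- rnorm(n)
y <- 0.03 * x + rnorm(n)       # x explains only ~0.1% of the variance of y
summary(lm(y ~ x))             # slope p-value is essentially zero, yet R^2 is ~0.001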
