Solved – Does $r$-squared have a $p$-value?

p-value, r-squared, statistical significance

I seem to have confused myself trying to understand whether an $r$-squared value also has a $p$-value.

As I understand it, in linear correlation with a set of data points, $r$ can take a value ranging from $-1$ to $1$, and this value, whatever it is, can have a $p$-value which shows whether $r$ is significantly different from $0$ (i.e., whether there is a linear correlation between the two variables).

Moving on to linear regression, a line can be fitted to the data, described by the equation $Y = a + bX$. $a$ and $b$ (the intercept and slope) also have $p$-values showing whether they are significantly different from $0$.

Assuming I have understood everything correctly so far, are the $p$-value for $r$ and the $p$-value for $b$ just the same thing? Is it then correct to say that it is not $r$-squared that has a $p$-value, but rather $r$ or $b$?

Best Answer

In addition to the numerous (correct) comments by other users pointing out that the $p$-value for $r^2$ is identical to the $p$-value for the global $F$ test, note that you can also get the $p$-value associated with $r^2$ "directly" using the fact that $r^2$ under the null hypothesis is distributed as $\textrm{Beta}(\frac{v_n}{2},\frac{v_d}{2})$, where $v_n$ and $v_d$ are the numerator and denominator degrees of freedom, respectively, for the associated $F$-statistic.
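To see why those two $p$-values coincide, note that the global $F$ statistic is a strictly increasing function of $r^2$: $$ F = \frac{r^2/v_n}{(1-r^2)/v_d} $$ so a test based on $r^2$ and the global $F$ test reject for exactly the same samples.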

The third bullet point in the "Derived from other distributions" subsection of the Wikipedia entry on the beta distribution tells us that:

If $X \sim \chi^2(\alpha)$ and $Y \sim \chi^2(\beta)$ are independent, then $\frac{X}{X+Y} \sim \textrm{Beta}(\frac{\alpha}{2}, \frac{\beta}{2})$.
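If you want to convince yourself of that fact numerically, here is a quick simulation sketch (using 1 and 18 degrees of freedom to match the regression example further down):

# Simulate independent chi-squares and check that X/(X+Y) behaves like Beta(1/2, 9)
set.seed(123)
X = rchisq(1e5, df = 1)
Y = rchisq(1e5, df = 18)
ratio = X / (X + Y)
round(quantile(ratio, c(.25, .5, .75, .95)), 3)   # empirical quantiles of the ratio
round(qbeta(c(.25, .5, .75, .95), 1/2, 18/2), 3)  # theoretical Beta(1/2, 9) quantiles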

Well, we can write $r^2$ in that $\frac{X}{X+Y}$ form.

Let $SS_Y$ be the total sum of squares for a variable $Y$, $SS_E$ be the sum of squared errors for a regression of $Y$ on some other variables, and $SS_R$ be the "sum of squares reduced," that is, $SS_R=SS_Y-SS_E$. Then $$ r^2=1-\frac{SS_E}{SS_Y}=\frac{SS_Y-SS_E}{SS_Y}=\frac{SS_R}{SS_R+SS_E} $$ Under the null hypothesis, $SS_R$ and $SS_E$ (each divided by the error variance, which cancels in the ratio) are distributed as $\chi^2$ with $v_n$ and $v_d$ degrees of freedom, respectively, and the two chi-squares are independent (this follows from Cochran's theorem). Therefore, $$ r^2 \sim \textrm{Beta}(\frac{v_n}{2},\frac{v_d}{2}) $$
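To make the sum-of-squares notation concrete, here is a small illustrative check (with made-up data) that $\frac{SS_R}{SS_R+SS_E}$ reproduces the $r^2$ reported by lm:

# Check that SS_R / (SS_R + SS_E) matches the R-squared from lm
set.seed(42)
x = runif(30)
y = 2 + 3*x + rnorm(30)
fit = lm(y ~ x)
SS_Y = sum((y - mean(y))^2)    # total sum of squares
SS_E = sum(residuals(fit)^2)   # sum of squared errors
SS_R = SS_Y - SS_E             # "sum of squares reduced"
SS_R / (SS_R + SS_E)
summary(fit)$r.squared         # same value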

Demonstration in R (borrowing code from @gung):

set.seed(111)
x = runif(20)
y = 5 + rnorm(20)  # y does not depend on x, so the true correlation is 0
cor.test(x,y)

# Pearson's product-moment correlation
# 
# data:  x and y
# t = 1.151, df = 18, p-value = 0.2648
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
#  -0.2043606  0.6312210
# sample estimates:
#       cor 
# 0.2618393 

summary(lm(y~x))

# Call:
#   lm(formula = y ~ x)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -1.6399 -0.6246  0.1968  0.5168  2.0355 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)   4.6077     0.4534  10.163 6.96e-09 ***
# x             1.1121     0.9662   1.151    0.265    
# ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 1.061 on 18 degrees of freedom
# Multiple R-squared:  0.06856,  Adjusted R-squared:  0.01681 
# F-statistic: 1.325 on 1 and 18 DF,  p-value: 0.2648

1 - pbeta(0.06856, 1/2, 18/2)  # P(r^2 > 0.06856) under the null, with Beta(v_n/2, v_d/2) = Beta(1/2, 9)

# [1] 0.2647731
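
For completeness, converting that same $r^2$ to the global $F$ statistic and using the $F$ distribution gives the same $p$-value, which is the equivalence noted at the start (the $F$ statistic here is computed from the rounded $r^2$, so it should agree with the beta-based value above):

# Same p-value via the F distribution: F = (r^2 / v_n) / ((1 - r^2) / v_d)
r2 = 0.06856
F_stat = (r2 / 1) / ((1 - r2) / 18)
pf(F_stat, 1, 18, lower.tail = FALSE)

# [1] 0.2647731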