Solved – the relationship between R-squared and p-value in a regression

modeling, p-value, r-squared, regression

tl;dr – for OLS regression, does a higher R-squared also imply a higher p-value? I'm asking specifically for a single explanatory variable (Y = a + bX + e), but I'd also be interested to know the answer for n explanatory variables (Y = a + b1X1 + … + bnXn + e).

Context – I'm performing OLS regression on a range of variables and am trying to find the best explanatory functional form by producing a table of the R-squared values from regressing the response (dependent) variable on the linear, logarithmic, etc., transformations of each explanatory (independent) variable. This looks a bit like:

| Variable name | linear form | ln(variable) | exp(variable) | …etc |
|---------------|-------------|--------------|---------------|------|
| Variable 1    | R-squared   | R-squared    | R-squared     | …    |
| …etc…         |             |              |               |      |

I'm wondering if R-squared is appropriate or if p-values would be better. Presumably there is some relationship, since a more significant relationship would suggest higher explanatory power, but I'm not sure whether that holds in any rigorous way.
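
For concreteness, here is a rough sketch of how such a table might be assembled (Python with numpy/pandas; the variable names, data, and set of transformations below are placeholders of my own, not from the question):

```python
# Minimal sketch: R-squared of a simple OLS fit of the response on several
# transformations of each explanatory variable.  Illustrative data only.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: a response y and two positive explanatory variables.
df = pd.DataFrame({
    "y":  rng.normal(10, 2, 200),
    "x1": rng.uniform(1, 5, 200),
    "x2": rng.lognormal(0, 0.5, 200),
})

# Candidate functional forms for each explanatory variable.
transforms = {
    "linear": lambda x: x,
    "ln":     np.log,        # requires strictly positive values
    "exp":    np.exp,
}

def r_squared(x, y):
    """R-squared of the simple OLS fit y = a + b*x + e."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

rows = {}
for var in ["x1", "x2"]:
    rows[var] = {name: r_squared(f(df[var].to_numpy()), df["y"].to_numpy())
                 for name, f in transforms.items()}

print(pd.DataFrame(rows).T)   # one row per variable, one column per functional form
```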

Best Answer

The answer is no, there is no such regular relationship between $R^2$ and the overall regression p-value, because $R^2$ depends as much on the variance of the independent variables as it does on the variance of the residuals (to which it is inversely proportional), and you are free to change the variance of the independent variables by arbitrary amounts.
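
A quick numerical illustration of that dependence (my own simulation, not part of the original answer): the same slope and the same noise level throughout, with only the spread of the regressor changed.

```python
# Sketch: with the slope and the residual noise held fixed, R-squared is driven
# almost entirely by how spread out the regressor is.  Illustrative values only.
import numpy as np

rng = np.random.default_rng(1)
n, slope, noise_sd = 500, 1.0, 1.0

def fit_r_squared(x, y):
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

for x_sd in (0.1, 1.0, 10.0):
    x = rng.normal(0, x_sd, n)
    y = slope * x + rng.normal(0, noise_sd, n)   # identical model and noise each time
    print(f"sd(x) = {x_sd:5.1f}  ->  R^2 = {fit_r_squared(x, y):.3f}")

# Roughly, R^2 ~ slope^2*Var(x) / (slope^2*Var(x) + noise_sd^2): holding the slope
# and noise level fixed, widening the spread of x pushes R^2 toward 1 and
# narrowing it pushes R^2 toward 0, while the residual variance stays the same.
```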

As an example, consider any set of multivariate data $(x_{i1}, x_{i2}, \ldots, x_{ip}, y_i)$ with $i$ indexing the cases, and suppose that the set of values of the first independent variable, $\{x_{i1}\}$, has a unique maximum $x^*$ separated from the second-highest value by a positive amount $\epsilon$. Apply a non-linear transformation of the first variable that sends all values less than $x^* - \epsilon/2$ into the range $[0,1]$ and sends $x^*$ itself to some large value $M \gg 1$. For any such $M$ this can be done by a suitable (scaled) Box-Cox transformation $x \to a\left((x - x_0)^\lambda - 1\right)/\lambda$, for instance, so we're not talking about anything strange or "pathological." Then, as $M$ grows arbitrarily large, $R^2$ approaches $1$ as closely as you please, regardless of how bad the fit is, because the variance of the residuals will be bounded while the variance of the first independent variable is asymptotically proportional to $M^2$.
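
Here is a sketch of that kind of transformation (a simplified stand-in for the scaled Box-Cox; the function name and parameterization are my own): it keeps every value below the gap inside $[0,1]$ while sending the unique maximum to an arbitrarily large value.

```python
# Sketch of a monotone power transform in the spirit described above: the bulk of
# the values is mapped into [0, 1], and the unique maximum is pushed as high as
# you like by increasing the exponent.  Illustrative data and naming only.
import numpy as np

def stretch_max(x, lam):
    """Rescale so the second-highest value maps to 1, then raise to the power
    `lam`.  Larger `lam` pushes the maximum higher while all other values
    remain inside [0, 1]."""
    x = np.asarray(x, dtype=float)
    second_highest = np.sort(x)[-2]
    u = (x - x.min()) / (second_highest - x.min())   # bulk in [0, 1], maximum just above 1
    return u ** lam

x = np.array([0.3, 1.1, 2.4, 3.0, 3.2, 3.5])   # unique maximum 3.5, gap down to 3.2
for lam in (1, 20, 100):
    t = stretch_max(x, lam)
    print(f"lam = {lam:3d}:  max -> {t.max():10.2f},  others stay in [0, 1]: {t[:-1].round(3)}")
```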


You should instead be using goodness-of-fit tests (among other techniques) to select an appropriate model in your exploration: you ought to be concerned about the linearity of the fit and the homoscedasticity of the residuals. And don't take any p-values from the resulting regression on trust: they will end up being almost meaningless after you have gone through this exercise, because their interpretation assumes that the choice of how to express the independent variables did not depend on the values of the dependent variable at all, which is very much not the case here.
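
For reference, a minimal sketch of such checks (statsmodels on simulated data; these are standard diagnostics, not a recipe prescribed by the answer):

```python
# Sketch: basic specification checks on an OLS fit — a RESET-style linearity
# check and a Breusch-Pagan test for heteroscedasticity.  Simulated data only.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 300)
y = 2 + 0.5 * x**2 + rng.normal(0, 1, 300)     # deliberately non-linear in x

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Linearity: add the squared fitted values as an extra regressor (RESET-style);
# a small p-value for that term suggests the linear functional form is inadequate.
X_aug = np.column_stack([X, fit.fittedvalues**2])
reset_p = sm.OLS(y, X_aug).fit().pvalues[-1]
print(f"RESET-style linearity check p-value: {reset_p:.3g}")

# Homoscedasticity: Breusch-Pagan test of the residuals against the regressors;
# a small p-value suggests the residual variance is not constant.
bp_stat, bp_p, _, _ = het_breuschpagan(fit.resid, X)
print(f"Breusch-Pagan p-value: {bp_p:.3g}")
```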