Solved – way to calculate R-squared in OLS without computing the coefficients

least-squares, r-squared, regression, white-test

The background of my question is that for, e.g., the White heteroskedasticity test or the Breusch-Godfrey (LM) autocorrelation test, we are generally only interested in the R-squared of the "auxiliary" regression. However, the only way of computing that R-squared that I am aware of involves first computing the coefficients and fitted values. This can consume a lot of time when the number of regressors is large, because the matrix that needs to be inverted is correspondingly large (in the case of the White test, the squared residuals are regressed on the independent variables, their squares and their cross-products, so the number of regressors grows quadratically with the number of independent variables).
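
For concreteness, here is a rough sketch (with purely illustrative data and names; e2 merely stands in for the squared OLS residuals) of the auxiliary design the White test requires; only the R-squared of this auxiliary fit enters the test statistic $nR^2$:

    # Sketch of the White-test auxiliary design matrix (illustrative only)
    set.seed(1)
    n <- 100; k <- 5
    X  <- matrix(rnorm(n * k), n, k)      # original regressors
    e2 <- rnorm(n)^2                      # stand-in for the squared OLS residuals

    # levels, squares, and all pairwise cross-products of the regressors
    crossprods <- combn(k, 2, function(idx) X[, idx[1]] * X[, idx[2]])
    Z <- cbind(X, X^2, crossprods)
    ncol(Z)                               # already 20 columns for k = 5

    aux <- lm(e2 ~ Z)                     # the expensive auxiliary regression
    n * summary(aux)$r.squared            # White statistic: n times R-squared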

Is there an "alternative way" to calculate (or perhaps approximate) R-squared?

(I know that the problem could be avoided by using different tests – e.g. Breusch-Pagan instead of White for heteroskedasticity, or Durbin-Watson instead of Breusch-Godfrey. However, I am interested in this question both for the fun of it and because these alternatives can be inferior to the tests mentioned at the beginning.)

Best Answer

No, given a multiple regression, there is no way to compute R-squared while avoiding the bulk of the other computations. You can certainly avoid computing the coefficients themselves, but the main work of the computation still needs to be done.

Note however that no matrix is ever inverted during a linear regression if the computation is done properly. There are many answers on this site that explain this; see, for example, Residual Sum of squares in Weighted regression.
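
As an illustration (the data and names below are made up), the textbook normal-equations formula explicitly inverts $X^\top X$, whereas a QR-based solve such as qr.coef() obtains the same coefficients without ever forming an inverse:

    # Sketch: same OLS coefficients with and without an explicit matrix inverse
    set.seed(1)
    n <- 50
    X <- cbind(1, rnorm(n), rnorm(n))              # design matrix with intercept
    y <- rnorm(n)

    beta_inv <- solve(t(X) %*% X) %*% t(X) %*% y   # explicit inverse (avoid)
    beta_qr  <- qr.coef(qr(X), y)                  # QR-based solve, no inverse
    all.equal(drop(beta_inv), unname(beta_qr))     # should be TRUE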

Here is what might be the minimum possible computation to get R-squared. You have to somehow orthogonalize $y$ with respect to the regression covariates, and the QR decomposition is the most commonly used method of doing that. Let's assume we have a $y$ vector of 10 observations:

    > y <- rnorm(10)

and an $X$ matrix with 2 predictors:

    > x1 <- rnorm(10)
    > x2 <- rnorm(10)

The quickest way to get R-squared is as follows. First, mean-correct each variable:

    > y.c <- y-mean(y)
    > x1.c <- x1-mean(x1)
    > x2.c <- x2-mean(x2)

Then compute a QR matrix decomposition for $X$ and $y$ together:

    > QR <- qr( cbind(x1.c, x2.c, y.c) )

Then R-squared is one minus the proportion of the total sum of squares that remains unexplained:

    > Rsquared <- 1 - QR$qr[3,3]^2 / sum(y.c^2)
    > Rsquared
          y.c
    0.3266491
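
To see why this works: the QR decomposition orthogonalizes the last column, y.c, against x1.c and x2.c, so the diagonal entry QR$qr[3,3] is, up to sign, the length of the part of y.c left over after that projection, i.e. the residual norm. Hence $R_{33}^2$ is the residual sum of squares and $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS} = 1 - R_{33}^2 / \sum_i (y_i - \bar{y})^2$.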

We can confirm that this is correct:

    > fit <- lm(y ~ x1+x2)
    > summary(fit)
    
    Call:
    lm(formula = y ~ x1 + x2)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -2.44213 -0.47947  0.08121  0.89085  1.54395 
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   0.5032     0.4761   1.057    0.326
    x1            0.5330     0.4118   1.294    0.237
    x2           -0.6153     0.4215  -1.460    0.188
    
    Residual standard error: 1.323 on 7 degrees of freedom
    Multiple R-squared:  0.3266,    Adjusted R-squared:  0.1343 
    F-statistic: 1.698 on 2 and 7 DF,  p-value: 0.2505

By the way, if you don't want to bother de-meaning the x-variables, then you can compute the QR decomposition using the entire design matrix including the intercept column (the denominator is still the centered total sum of squares, so y.c is reused):

    > QR <- qr( cbind(1, x1, x2, y) )
    > Rsquared <- 1 - QR$qr[4,4]^2 / sum(y.c^2)

This gives the same result because the QR decomposition orthogonalizes each column in succession with respect to the previous columns and de-meaning simply orthogonalizes all the columns with respect to the constant vector.
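
To package this into something reusable, here is a small sketch (the function name rsq_qr is mine, not from any package; it assumes the regressors are not collinear and that there are more observations than columns):

    # Sketch: R-squared from a single QR decomposition, without the coefficients
    rsq_qr <- function(y, X) {
      y.c <- y - mean(y)                               # centered response
      QR  <- qr(cbind(scale(X, scale = FALSE), y.c))   # centered X columns, then y
      p   <- ncol(QR$qr)
      1 - QR$qr[p, p]^2 / sum(y.c^2)                   # 1 - RSS/TSS
    }

    rsq_qr(y, cbind(x1, x2))   # matches the Multiple R-squared from summary(fit)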
