The background of my question is that for tests such as the White heteroskedasticity test or the Breusch-Godfrey (LM) autocorrelation test, we are generally only interested in the R-squared of the "auxiliary" regression. However, the only way of computing that R-squared that I am aware of involves deriving the coefficients and all the other regression output. This can consume a lot of time when there are many regressors, because the matrix that needs to be inverted grows accordingly. In the case of the White test, the squared residuals are regressed on the independent variables, their squares, and their cross-products, so the number of regressors is a quadratic function of the number of independent variables.
Is there an "alternative way" to calculate (or perhaps approximate) R-squared?
(I know that the problem could be avoided by using different tests – e.g. Breusch-Pagan instead of White for heteroskedasticity, or Durbin-Watson instead of Breusch-Godfrey. However, I am interested in this question both for the fun of it and because these alternative tests can be inferior to the ones mentioned at the beginning.)
Best Answer
No, given a multiple regression, there is no way to compute R-squared while avoiding the bulk of the other computations. You can certainly avoid computing the coefficients themselves, but the main work of the computation still needs to be done.
Note, however, that no matrix is ever inverted during a linear regression if the computation is done properly. There are many answers on this site that explain this; see, for example, Residual Sum of squares in Weighted regression.
Here is what might be the minimum possible computation to get R-squared. You have to somehow orthogonalize $y$ with respect to the regression covariates, and the QR decomposition is the most widely used method of doing that. Let's assume we have a $y$ vector of 10 observations:
and an $X$ matrix with 2 predictors:
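As a stand-in, here is a minimal sketch in Python/NumPy; the numbers are made up purely for illustration, and any values would serve:

```python
import numpy as np

# Made-up illustrative data: 10 observations of a response y
# and a design matrix X with 2 predictors (values are arbitrary).
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0])
X = np.column_stack([np.arange(1.0, 11.0),      # first predictor
                     np.arange(1.0, 11.0)**2])  # second predictor
```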
The quickest way to get R-squared would be like this. First, mean-correct $y$ and the columns of $X$:
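In NumPy, with made-up data standing in for $y$ and $X$, that step might look like:

```python
import numpy as np

# Made-up illustrative data: 10 observations, 2 predictors.
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0])
X = np.column_stack([np.arange(1.0, 11.0), np.arange(1.0, 11.0)**2])

# Subtract the mean from y and from each column of X.
yc = y - y.mean()
Xc = X - X.mean(axis=0)
```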
Then compute a QR matrix decomposition for $X$ and $y$ together:
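A sketch of this joint decomposition in NumPy (again with made-up data; `numpy.linalg.qr` with `mode="r"` returns only the upper-triangular factor, which is all we need):

```python
import numpy as np

# Made-up illustrative data, mean-corrected as in the previous step.
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0])
X = np.column_stack([np.arange(1.0, 11.0), np.arange(1.0, 11.0)**2])
yc = y - y.mean()
Xc = X - X.mean(axis=0)

# QR-decompose the centered X and y jointly, with y as the last column.
R = np.linalg.qr(np.column_stack([Xc, yc]), mode="r")
```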
Then R-squared is one minus the proportion of the sum of squares that still remains:
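Putting the pieces together in NumPy (made-up data): because $y$ is the last column of the decomposed matrix, the last diagonal entry of $R$ is the component of the centered $y$ orthogonal to the covariates, so its square is the residual sum of squares:

```python
import numpy as np

# Made-up illustrative data: 10 observations, 2 predictors.
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0])
X = np.column_stack([np.arange(1.0, 11.0), np.arange(1.0, 11.0)**2])

# Mean-correct, then QR-decompose [Xc | yc] jointly.
yc = y - y.mean()
Xc = X - X.mean(axis=0)
R = np.linalg.qr(np.column_stack([Xc, yc]), mode="r")

# R[-1, -1]**2 is the residual sum of squares, and the squared norm
# of yc is the total (centered) sum of squares.
r2 = 1 - R[-1, -1]**2 / np.sum(yc**2)
```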
We can confirm that this is correct:
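One way to confirm it, sketched here with the same kind of made-up data, is to run the full least-squares fit and compute R-squared the conventional way:

```python
import numpy as np

# Made-up illustrative data: 10 observations, 2 predictors.
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0])
X = np.column_stack([np.arange(1.0, 11.0), np.arange(1.0, 11.0)**2])

# Full least-squares fit with an intercept, the conventional route.
Xd = np.column_stack([np.ones(len(y)), X])
beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
resid = y - Xd @ beta
r2_ols = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
```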
By the way, if you don't want to bother de-meaning the x-variables, then you can compute the QR decomposition using the entire design matrix including the intercept:
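A NumPy sketch of that variant (made-up data again); the only change to the R-squared formula is that the total sum of squares must now exclude the part explained by the intercept alone:

```python
import numpy as np

# Made-up illustrative data: 10 observations, 2 predictors.
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0])
X = np.column_stack([np.arange(1.0, 11.0), np.arange(1.0, 11.0)**2])

# QR of the full design matrix [1 | X | y], with no de-meaning.
M = np.column_stack([np.ones(len(y)), X, y])
R = np.linalg.qr(M, mode="r")

# R[-1, -1]**2 is again the residual sum of squares.  The total sum
# of squares must exclude what the intercept alone explains: that is
# sum(R[1:, -1]**2), the centered sum of squares of y.
r2 = 1 - R[-1, -1]**2 / np.sum(R[1:, -1]**2)
```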
This gives the same result because the QR decomposition orthogonalizes each column in succession with respect to the previous columns and de-meaning simply orthogonalizes all the columns with respect to the constant vector.