The answer to both 1 and 2 is no, but care is needed in interpreting the existence theorem.
Variance of Ridge Estimator
Let $\hat{\beta^*}$ be the ridge estimate under penalty $k$, and let $\beta$ be the true parameter for the model $Y = X \beta + \epsilon$. Let $\lambda_1, \dotsc, \lambda_p$ be the eigenvalues of $X^T X$.
From Hoerl & Kennard equations 4.2-4.5, the risk (the expected squared $L^2$ norm of the estimation error) is
$$
\begin{align*}
E \left( \left[ \hat{\beta^*} - \beta \right]^T \left[ \hat{\beta^*} - \beta \right] \right)& = \sigma^2 \sum_{j=1}^p \lambda_j/ \left( \lambda_j +k \right)^2 + k^2 \beta^T \left( X^T X + k \mathbf{I}_p \right)^{-2} \beta \\
& = \gamma_1 (k) + \gamma_2(k) \\
& = R(k)
\end{align*}
$$
where, as far as I can tell, $\left( X^T X + k \mathbf{I}_p \right)^{-2} = \left( X^T X + k \mathbf{I}_p \right)^{-1} \left( X^T X + k \mathbf{I}_p \right)^{-1}.$ They remark that $\gamma_1$ is the total variance of the estimator, $E\left[(\hat{\beta^*} - E\hat{\beta^*})^T(\hat{\beta^*} - E\hat{\beta^*})\right]$, while $\gamma_2$ is the squared length of the bias, $(E\hat{\beta^*} - \beta)^T(E\hat{\beta^*} - \beta)$.
Supposing $X^T X = \mathbf{I}_p$, we have
$$R(k) = \frac{p \sigma^2 + k^2 \beta^T \beta}{(1+k)^2}.$$
The derivative of the risk with respect to $k$ is
$$R^\prime (k) = 2\,\frac{k(1+k)\beta^T \beta - (p\sigma^2 + k^2 \beta^T \beta)}{(1+k)^3}.$$
Since $\lim_{k \rightarrow 0^+} R^\prime (k) = -2p \sigma^2 < 0$, we conclude that there is some $k^*>0$ such that $R(k^*)<R(0)$.
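In fact, simplifying the numerator makes the existence claim explicit:
$$
R^\prime(k) = \frac{2\left(k\,\beta^T\beta - p\sigma^2\right)}{(1+k)^3},
$$
which is negative for $k < p\sigma^2/\beta^T\beta$ and positive beyond it, so in this orthogonal case the risk is minimized at $k^* = p\sigma^2/\beta^T\beta$, with $R(k^*) = p\sigma^2/(1+k^*) < p\sigma^2 = R(0)$.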
The authors remark that orthogonality is the best that you can hope for in terms of the risk at $k=0$, and that as the condition number of $X^T X$ increases, $\lim_{k \rightarrow 0^+} R^\prime (k)$ approaches $- \infty$.
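As a quick numerical illustration of both points (a sketch only; the coefficients, noise variance, and eigenvalues below are made up, and $X^T X$ is taken to be diagonal so that the risk formula reduces to sums over the eigenvalues):

```r
# Risk R(k) = gamma1(k) + gamma2(k) for a diagonal X'X with eigenvalues `lambda`,
# in which case beta' (X'X + kI)^{-2} beta = sum(beta^2 / (lambda + k)^2).
ridge_risk <- function(k, lambda, beta, sigma2) {
  gamma1 <- sigma2 * sum(lambda / (lambda + k)^2)  # total variance
  gamma2 <- k^2 * sum(beta^2 / (lambda + k)^2)     # squared bias
  gamma1 + gamma2
}

beta   <- c(2, -1, 0.5)        # hypothetical true coefficients
sigma2 <- 1
ks     <- seq(0, 2, by = 0.001)

# Orthogonal design (X'X = I): the risk dips below R(0) = p * sigma^2 = 3
r_orth <- sapply(ks, ridge_risk, lambda = rep(1, 3), beta = beta, sigma2 = sigma2)
ks[which.min(r_orth)]          # about p * sigma^2 / (beta' beta) = 3 / 5.25
min(r_orth) < r_orth[1]        # TRUE: some k > 0 beats k = 0

# Ill-conditioned design: one tiny eigenvalue makes R(0) huge and the initial
# slope steeply negative, so the gain from a small k > 0 is much larger
r_ill <- sapply(ks, ridge_risk, lambda = c(1, 1, 1e-3), beta = beta, sigma2 = sigma2)
c(r_ill[1], min(r_ill))
```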
Comment
There appears to be a paradox here, in that if $p=1$ and $X$ is a constant column, then we are just estimating the mean of a sequence of Normal$(\beta, \sigma^2)$ variables, and we know that the vanilla unbiased estimate is admissible in this case. This is resolved by noticing that the above reasoning merely shows that a risk-minimizing value of $k$ exists for fixed $\beta^T \beta$. But for any fixed $k$, we can make the risk explode by making $\beta^T \beta$ large, so this argument alone does not show that a fixed-$k$ ridge estimate dominates least squares, and hence it does not contradict the admissibility of the latter.
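To make the last point concrete in the orthogonal setting: for any fixed $k > 0$,
$$
R(k) = \frac{p\sigma^2 + k^2\,\beta^T\beta}{(1+k)^2} \;\ge\; \frac{k^2}{(1+k)^2}\,\beta^T\beta \;\longrightarrow\; \infty
\quad\text{as } \beta^T\beta \to \infty,
$$
while the OLS risk stays at $p\sigma^2$, so no single choice of $k$ improves on OLS uniformly in $\beta$.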
Why is ridge regression usually recommended only in the case of correlated predictors?
H&K's risk derivation shows that if we think that $\beta ^T \beta$ is small, and if the design $X^T X$ is nearly singular, then we can achieve large reductions in the risk of the estimate. I think ridge regression isn't used ubiquitously because the OLS estimate is a safe default and its invariance and unbiasedness properties are attractive. When it fails, it fails honestly--your covariance matrix explodes. There is also perhaps a philosophical/inferential point: if your design is nearly singular and you have observational data, then the interpretation of $\beta$ as giving changes in $E Y$ for unit changes in $X$ is suspect--the large covariance matrix is a symptom of that.
But if your goal is solely prediction, the inferential concerns no longer hold, and you have a strong argument for using some sort of shrinkage estimator.
The inverse of a block (or partitioned) matrix is given by
$$
\left[ \begin{array}{cc} M_{11} & M_{12} \\ M_{21} & M_{22} \end{array} \right] ^{-1} = \left[ \begin{array}{cc} K_1^{-1} & -M_{11}^{-1} M_{12}K_2^{-1} \\ -K_2^{-1} M_{21} M_{11}^{-1} & K_2^{-1} \end{array} \right],
$$
where $K_1 = M_{11} - M_{12} M_{22}^{-1} M_{21}$ and $K_2 = M_{22} - M_{21} M_{11}^{-1} M_{12}$.
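As a quick numerical check of this identity (a sketch; the block sizes are arbitrary and the matrix is just a random symmetric positive definite one):

```r
set.seed(1)
n1 <- 3; n2 <- 2
M  <- crossprod(matrix(rnorm((n1 + n2)^2), n1 + n2)) + diag(n1 + n2)  # random SPD matrix
i1 <- 1:n1; i2 <- n1 + 1:n2
M11 <- M[i1, i1]; M12 <- M[i1, i2]; M21 <- M[i2, i1]; M22 <- M[i2, i2]

K1 <- M11 - M12 %*% solve(M22) %*% M21   # Schur complement of M22
K2 <- M22 - M21 %*% solve(M11) %*% M12   # Schur complement of M11
Minv <- rbind(cbind(solve(K1),                          -solve(M11) %*% M12 %*% solve(K2)),
              cbind(-solve(K2) %*% M21 %*% solve(M11),   solve(K2)))
max(abs(Minv - solve(M)))                # ~0, agreeing with direct inversion
```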
When the matrix is block diagonal, this reduces to
$$
\left[ \begin{array}{cc} M_{11} & 0 \\ 0 & M_{22} \end{array} \right] ^{-1} = \left[ \begin{array}{cc} M_{11}^{-1} & 0 \\ 0 & M_{22}^{-1} \end{array} \right].
$$
These identities are in The Matrix Cookbook. The fact that the inverse of a block diagonal matrix has a simple, diagonal form will help you a lot. I don't know of a way to exploit the fact that the matrices are symmetric and positive definite.
To invert your matrix, let $M_{11} = \left[ \begin{array}{ccc} A & 0 & 0 \\ 0 & B & 0 \\ 0 & 0 & C \end{array} \right]$, $M_{12} = M_{21}' = \left[ \begin{array}{c} E \\ F \\ G \end{array} \right]$, and $M_{22} = D$.
Recursively applying the block diagonal inverse formula gives
$$
M_{11}^{-1} = \left[ \begin{array}{ccc} A & 0 & 0 \\ 0 & B & 0 \\ 0 & 0 & C \end{array} \right]^{-1} = \left[ \begin{array}{ccc} A^{-1} & 0 & 0 \\ 0 & B^{-1} & 0 \\ 0 & 0 & C^{-1} \end{array} \right].
$$
Now you can compute $K_1^{-1}$, $M_{11}^{-1}$, and $K_2^{-1}$, and plug them into the first identity for the inverse of a partitioned matrix.
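Here is a sketch of that recipe end to end, with made-up block sizes (the names A through G follow your blocks; the off-diagonal couplings are kept small so the whole matrix stays positive definite):

```r
set.seed(2)
rnd_spd <- function(n) crossprod(matrix(rnorm(n * n), n)) + n * diag(n)  # random SPD block

A <- rnd_spd(2); B <- rnd_spd(2); C <- rnd_spd(3); D <- rnd_spd(1)
E <- matrix(rnorm(2), 2, 1) * 0.1
F <- matrix(rnorm(2), 2, 1) * 0.1
G <- matrix(rnorm(3), 3, 1) * 0.1

# Small helper: assemble a block diagonal matrix from square blocks
block_diag <- function(...) {
  blocks <- list(...)
  n   <- sum(vapply(blocks, nrow, integer(1)))
  out <- matrix(0, n, n)
  off <- 0
  for (blk in blocks) {
    idx <- off + seq_len(nrow(blk))
    out[idx, idx] <- blk
    off <- off + nrow(blk)
  }
  out
}

M11    <- block_diag(A, B, C)
M11inv <- block_diag(solve(A), solve(B), solve(C))   # invert block by block
M12    <- rbind(E, F, G); M21 <- t(M12); M22 <- D

K1 <- M11 - M12 %*% solve(M22) %*% M21               # Schur complement of M22
K2 <- M22 - M21 %*% M11inv %*% M12                   # Schur complement of M11
Minv <- rbind(cbind(solve(K1),                    -M11inv %*% M12 %*% solve(K2)),
              cbind(-solve(K2) %*% M21 %*% M11inv,  solve(K2)))

M <- rbind(cbind(M11, M12), cbind(M21, M22))
max(abs(Minv - solve(M)))                            # ~0
```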
Best Answer
There are several good ways to do this using R. One classical method is to compute the Choleski factor of the covariance matrix:
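A sketch of this approach, assuming the response, design matrix, and covariance matrix are stored in `y`, `X`, and `C`:

```r
R   <- chol(C)                             # Choleski factor: C = R'R, R upper triangular
y_c <- backsolve(R, y, transpose = TRUE)   # solve R' y_c = y by forward substitution
X_c <- backsolve(R, X, transpose = TRUE)   # solve R' X_c = X
fit <- lm.fit(X_c, y_c)                    # ordinary least squares on the whitened data
RSS <- sum(fit$effects[-(1:fit$rank)]^2)   # sum of squared effects beyond the rank
```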
This code requires C to be a strictly positive definite matrix. C has to be positive definite anyway in order to guarantee that the RSS is finite.
If you want to make the computation even more explicit, you could replace the last two lines with this:
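For example, a sketch using the QR decomposition directly (`qr.qty` applies $Q^T$ to a vector without forming $Q$):

```r
qx  <- qr(X_c)                        # QR decomposition of the whitened design
e   <- qr.qty(qx, y_c)[-(1:qx$rank)]  # Q'y_c with the first p entries dropped
RSS <- sum(e^2)                       # residual sum of squares
```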
The above code is performing the following mathematical steps. First, we factorize $$C = R^TR$$ where $R$ is an upper triangular matrix. Then we solve the linear systems $$R^Ty_c=y$$ and $$R^TX_c=X$$ for $y_c$ and $X_c$ using an efficient forward substitution algorithm. Note that the above two steps are far more efficient than inverting $C$.
From this point we can view this as an unweighted regression problem with $y_c$ and $X_c$. Amongst other things, the `lm.fit` function uses the QR decomposition of $X_c$ to find an $n\times(n-p)$ matrix $Q$ such that $Q^TQ=I$ and $Q^TX_c=0$. Here, $p$ is the column rank of $X_c$ (which equals that of $X$). The orthogonal residuals (or effects) can then be computed as $$e=Q^Ty_c$$ and finally the RSS is $e^Te$. Actually the function computed $Q^Ty_c$, where $Q$ was $n\times n$, and stored this vector in `fit$effects`; we then threw away the first $p$ values to get $e$.

You might have been hoping for a simpler mathematical formula, but efficient computation requires that one avoid evaluating mathematical entities such as inverse matrices or ordinary residuals.