Solved – How does R compute r.squared for weighted least squares

generalized-least-squares, linear-model, weighted-regression

I used R for fitting a linear model using weighted least squares (due to heteroskedastic errors):
$\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $E(\boldsymbol{\varepsilon}) = 0,\ Cov(\boldsymbol{\varepsilon}) = \sigma^{2}\boldsymbol{V}$ and $\boldsymbol{V}$ a known, diagonal matrix.

set.seed(1)
x = c(-1, 0, 1, 2, 3)
y_true = 2 * x^2 + 3*x + 1
vars = c(0.5, 1, 1.5, 2, 2.5)
e = rnorm(length(vars), mean=0, sd=sqrt(vars))  # one draw per observation, with variance vars[i]
y = y_true + e

weights = 1 / vars
m = lm(y ~ x + I(x^2), weights=weights)

After fitting, R provides the r.squared value:

summary = summary(m)

# The R^2 value that lm provides
r2 = summary$r.squared  # 0.993019

How exactly is this value computed?

I know that for linear models using ordinary (unweighted) least squares, this value is computed as
$R^{2} = 1 - \frac{(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}})^{T}(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}})}{\boldsymbol{y}^{T}\boldsymbol{y} - n (\bar{\boldsymbol{y}})^{2}}$, where $\hat{\boldsymbol{\beta}} = (\boldsymbol{X}^{T}\boldsymbol{X})^{-1}\boldsymbol{X}^{T} \boldsymbol{y}$ is the OLS estimator and $\bar{\boldsymbol{y}}$ is the sample mean.
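
As a quick sanity check (my own addition, not part of the original question), this formula can be verified numerically on an unweighted fit of the same data; m_ols, X_ols, beta_hat and r2_ols are names introduced here for illustration:

m_ols = lm(y ~ x + I(x^2))                                 # ordinary (unweighted) fit
X_ols = model.matrix(m_ols)
beta_hat = solve(t(X_ols) %*% X_ols) %*% t(X_ols) %*% y    # OLS estimator
res_ols = y - X_ols %*% beta_hat
r2_ols = 1 - (t(res_ols) %*% res_ols) / (t(y) %*% y - length(y) * mean(y)^2)
all.equal(as.numeric(r2_ols), summary(m_ols)$r.squared)    # should be TRUE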

The heteroskedastic model can be transformed to the OLS case by multiplication of both sides with $\boldsymbol{V}^{-1/2}$ yielding

$\tilde{\boldsymbol{y}} = \boldsymbol{V}^{-1/2}\boldsymbol{y} = \boldsymbol{V}^{-1/2}\boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{V}^{-1/2} \boldsymbol{\varepsilon} = \tilde{\boldsymbol{X}}\boldsymbol{\beta} + \tilde{\boldsymbol{\varepsilon}}$.

So, as also described by Willett and Singer (see https://gseacademic.harvard.edu/~willetjo/pdf%20files/Willett_Singer_AS88.pdf),
I see two obvious options for computing $R^{2}$.

  1. Computing it for the transformed model
    $R^{2} = 1 - \frac{(\tilde{\boldsymbol{y}} - \tilde{\boldsymbol{X}}\hat{\boldsymbol{\beta}}_{*})^{T}(\tilde{\boldsymbol{y}} - \tilde{\boldsymbol{X}}\hat{\boldsymbol{\beta}}_{*})}{\tilde{\boldsymbol{y}}^{T}\tilde{\boldsymbol{y}} - n (\bar{\tilde{\boldsymbol{y}}})^{2}}$,
    where $\hat{\boldsymbol{\beta}}_{*} = (\tilde{\boldsymbol{X}}^{T}\tilde{\boldsymbol{X}})^{-1}\tilde{\boldsymbol{X}}^{T}\tilde{\boldsymbol{y}} = (\boldsymbol{X}^{T}\boldsymbol{V}^{-1}\boldsymbol{X})^{-1}\boldsymbol{X}^{T}\boldsymbol{V}^{-1} \boldsymbol{y}$ is the OLS estimator of the transformed model.
X = model.matrix(m)
V = diag(vars)
V_neg_half = sqrt(solve(V))  # V^(-1/2); element-wise sqrt is valid here because V is diagonal
y_tilde = V_neg_half %*% y
X_tilde = V_neg_half %*% X
n = length(y)

# OLS of the transformed data
X_tilde_t_X = t(X_tilde) %*% X_tilde

beta_hat_star = solve(X_tilde_t_X) %*% t(X_tilde) %*% y_tilde

residuals = y_tilde - X_tilde %*% beta_hat_star
numerator = t(residuals) %*% residuals
denominator = t(y_tilde) %*% y_tilde - n * mean(y_tilde)^2
r2.1 = 1 - numerator / denominator  # 0.989646

  2. Using the WLS estimator with the untransformed data as $R^{2} = 1 - \frac{(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}_{*})^{T}(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}_{*})}{\boldsymbol{y}^{T}\boldsymbol{y} - n (\bar{\boldsymbol{y}})^{2}}$
residuals = y - X %*% beta_hat_star
numerator = t(residuals) %*% residuals
denominator = t(y) %*% y - n * mean(y)^2
r2.2 = 1 - numerator / denominator  # 0.9923849

However, neither of these values matches the result that R provides, and I am not sure whether the differences in my (artificial) example are merely due to rounding.

Best Answer

Thanks to a helpful reply on the R-help mailing list, R seems to compute r.squared in the way stated here: https://stats.stackexchange.com/a/375752/159251. The numerator (SSE) is computed in the same way as in option 1. Putting R's code into a formula, the denominator (SST) is computed as $$\text{SST} = \sum\limits_{i=1}^{n}{w_{i}\left(y_{i} - \text{weighted.mean}(\boldsymbol{y}, \boldsymbol{w})\right)^{2}} = \sum\limits_{i=1}^{n}{w_{i}y_{i}^{2}} - 2 \cdot \text{weighted.mean}(\boldsymbol{y}, \boldsymbol{w}) \cdot \sum\limits_{i=1}^{n}{w_{i}y_{i}} + \text{weighted.mean}(\boldsymbol{y}, \boldsymbol{w})^{2} \cdot \sum\limits_{i=1}^{n}{w_{i}}.$$

Now $\text{weighted.mean}(\boldsymbol{y}, \boldsymbol{w}) = \frac{1}{\sum\limits_{k=1}^{n}{w_{k}}}\sum\limits_{j=1}^{n}{w_{j}y_{j}}$ and the computation simplifies to

$$\text{SST} = \sum\limits_{i=1}^{n}{w_{i}y_{i}^{2}} - \frac{1}{\sum\limits_{k=1}^{n}w_{k}} \cdot \left(\sum\limits_{i=1}^{n}{w_{i}y_{i}} \right)^{2}.$$
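
Putting this back into R code (a short sketch based on the formula above, reusing y, weights and m from the question; w, SSE, SST and r2_R are names introduced here), the reported value can be reproduced:

w = weights
res = y - fitted(m)                             # raw residuals y - X beta_hat_star
SSE = sum(w * res^2)                            # same numerator as in option 1
SST = sum(w * (y - weighted.mean(y, w))^2)      # denominator as described above
r2_R = 1 - SSE / SST                            # 0.993019, matches summary$r.squared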

As a comparison, option 1 from above computes the denominator as $$\text{SST} = \tilde{\boldsymbol{y}}^{T}\tilde{\boldsymbol{y}} - n (\bar{\tilde{\boldsymbol{y}}})^{2} = \sum\limits_{i=1}^{n}{w_{i}y_{i}^{2}} - n \cdot \text{mean}(\boldsymbol{V}^{-1/2}\boldsymbol{y})^{2} = \sum\limits_{i=1}^{n}{w_{i}y_{i}^{2}} - n \left( \frac{1}{n} \sum\limits_{j=1}^{n}\sqrt{w_{j}}y_{j} \right)^{2} = \sum\limits_{i=1}^{n}{w_{i}y_{i}^{2}} - \frac{1}{n} \left( \sum\limits_{j=1}^{n}\sqrt{w_{j}}y_{j} \right)^{2}.$$
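
To make the difference concrete, the two denominators can be computed side by side (a small sketch reusing w, y and n from above; SST_R and SST_1 are names introduced here):

SST_R = sum(w * y^2) - sum(w * y)^2 / sum(w)        # denominator used by R
SST_1 = sum(w * y^2) - sum(sqrt(w) * y)^2 / n       # denominator from option 1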

So the differences are that the sum is normalised by the sum of the weights rather than by $n$, and that the weights enter the second term directly rather than as square roots (i.e., squared relative to option 1). The theoretical justification for this is, however, not clear to me.
