Solved – How does R compute r.squared for weighted least squares

generalized-least-squares, linear-model, weighted-regression

I used R for fitting a linear model using weighted least squares (due to heteroskedastic errors):
$\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $E(\boldsymbol{\varepsilon}) = 0,\ Cov(\boldsymbol{\varepsilon}) = \sigma^{2}\boldsymbol{V}$ and $\boldsymbol{V}$ a known, diagonal matrix.

set.seed(1)
x = c(-1, 0, 1, 2, 3)
y_true = 2 * x^2 + 3*x + 1
vars = c(0.5, 1, 1.5, 2, 2.5)
e = rnorm(length(vars), mean=0, sd=sqrt(vars))  # one draw per observation, with variance vars[i]
y = y_true + e

weights = 1 / vars
m = lm(y ~ x + I(x^2), weights=weights)

After fitting, R provides the r.squared value:

summary = summary(m)

# The R^2 value that lm provides
r2 = summary$r.squared  # 0.993019

How exactly is this value computed?

I know that for linear models using ordinary (unweighted) least squares, this value is computed as
$R^{2} = 1 - \frac{(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}})^{T}(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}})}{\boldsymbol{y}^{T}\boldsymbol{y} - n (\bar{\boldsymbol{y}})^{2}}$, where $\hat{\boldsymbol{\beta}} = (\boldsymbol{X}^{T}\boldsymbol{X})^{-1}\boldsymbol{X}^{T} \boldsymbol{y}$ is the OLS estimator and $\bar{\boldsymbol{y}}$ is the sample mean.
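
As a quick sanity check (my own addition, not part of the original question), this formula can be verified numerically on an unweighted fit of the same data; m_ols, X_ols, beta_hat and r2_ols are names introduced here for illustration:

m_ols = lm(y ~ x + I(x^2))                                 # ordinary (unweighted) fit
X_ols = model.matrix(m_ols)
beta_hat = solve(t(X_ols) %*% X_ols) %*% t(X_ols) %*% y    # OLS estimator
res_ols = y - X_ols %*% beta_hat
r2_ols = 1 - (t(res_ols) %*% res_ols) / (t(y) %*% y - length(y) * mean(y)^2)
all.equal(as.numeric(r2_ols), summary(m_ols)$r.squared)    # should be TRUE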

The heteroskedastic model can be transformed to the OLS case by multiplication of both sides with $\boldsymbol{V}^{-1/2}$ yielding

$\tilde{\boldsymbol{y}} = \boldsymbol{V}^{-1/2}\boldsymbol{y} = \boldsymbol{V}^{-1/2}\boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{V}^{-1/2} \boldsymbol{\varepsilon} = \tilde{\boldsymbol{X}}\boldsymbol{\beta} + \tilde{\boldsymbol{\varepsilon}}$.

So, as also described by Willett and Singer (see https://gseacademic.harvard.edu/~willetjo/pdf%20files/Willett_Singer_AS88.pdf),
I see two obvious options for computing $R^{2}$.

  1. Computing it for the transformed model
    $R^{2} = 1 - \frac{(\tilde{\boldsymbol{y}} - \tilde{\boldsymbol{X}}\hat{\boldsymbol{\beta}}_{*})^{T}(\tilde{\boldsymbol{y}} - \tilde{\boldsymbol{X}}\hat{\boldsymbol{\beta}}_{*})}{\tilde{\boldsymbol{y}}^{T}\tilde{\boldsymbol{y}} - n (\bar{\tilde{\boldsymbol{y}}})^{2}}$,
    where $\hat{\boldsymbol{\beta}}_{*} = (\tilde{\boldsymbol{X}}^{T}\tilde{\boldsymbol{X}})^{-1}\tilde{\boldsymbol{X}}^{T}\tilde{\boldsymbol{y}} = (\boldsymbol{X}^{T}\boldsymbol{V}^{-1}\boldsymbol{X})^{-1}\boldsymbol{X}^{T}\boldsymbol{V}^{-1} \boldsymbol{y}$ is the OLS estimator of the transformed model.
X = model.matrix(m)
V = diag(vars)
V_neg_half = sqrt(solve(V))  # V^(-1/2); element-wise sqrt is valid here because V is diagonal
y_tilde = V_neg_half %*% y
X_tilde = V_neg_half %*% X
n = length(y)

# OLS of the transformed data
X_tilde_t_X = t(X_tilde) %*% X_tilde

beta_hat_star = solve(X_tilde_t_X) %*% t(X_tilde) %*% y_tilde

residuals = y_tilde - X_tilde %*% beta_hat_star
numerator = t(residuals) %*% residuals
denominator = t(y_tilde) %*% y_tilde - n * mean(y_tilde)^2
r2.1 = 1 - numerator / denominator  # 0.989646

  2. Using the WLS estimator with the untransformed data as $R^{2} = 1 - \frac{(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}_{*})^{T}(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}_{*})}{\boldsymbol{y}^{T}\boldsymbol{y} - n (\bar{\boldsymbol{y}})^{2}}$
residuals = y - X %*% beta_hat_star
numerator = t(residuals) %*% residuals
denominator = t(y) %*% y - n * mean(y)^2
r2.2 = 1 - numerator / denominator  # 0.9923849

However, neither of these values matches the result that R provides, and I am not sure whether the differences in my (artificial) example are merely due to rounding.

Best Answer

Thanks to a helpful reply on the R-help mailing list, R seems to compute r.squared in the way stated here: https://stats.stackexchange.com/a/375752/159251. The numerator (SSE) is computed in the same way as in option 1. Putting R's code into a formula, the denominator (SST) is computed as $$\text{SST} = \sum\limits_{i=1}^{n}{w_{i}\left(y_{i} - \text{weighted.mean}(\boldsymbol{y}, \boldsymbol{w})\right)^{2}} = \sum\limits_{i=1}^{n}{w_{i}y_{i}^{2}} - 2 \cdot \text{weighted.mean}(\boldsymbol{y}, \boldsymbol{w}) \cdot \sum\limits_{i=1}^{n}{w_{i}y_{i}} + \text{weighted.mean}(\boldsymbol{y}, \boldsymbol{w})^{2} \cdot \sum\limits_{i=1}^{n}{w_{i}}.$$

Now $\text{weighted.mean}(\boldsymbol{y}, \boldsymbol{w}) = \frac{1}{\sum\limits_{k=1}^{n}{w_{k}}}\sum\limits_{j=1}^{n}{w_{j}y_{j}}$ and the computation simplifies to

$$\text{SST} = \sum\limits_{i=1}^{n}{w_{i}y_{i}^{2}} - \frac{1}{\sum\limits_{k=1}^{n}w_{k}} \cdot \left(\sum\limits_{i=1}^{n}{w_{i}y_{i}} \right)^{2}.$$
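
Putting this back into R code (a short sketch based on the formula above, reusing y, weights and m from the question; w, SSE, SST and r2_R are names introduced here), the reported value can be reproduced:

w = weights
res = y - fitted(m)                             # raw residuals y - X beta_hat_star
SSE = sum(w * res^2)                            # same numerator as in option 1
SST = sum(w * (y - weighted.mean(y, w))^2)      # denominator as described above
r2_R = 1 - SSE / SST                            # 0.993019, matches summary$r.squared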

As a comparison, option 1 from above computes the denominator as $$\text{SST} = \tilde{\boldsymbol{y}}^{T}\tilde{\boldsymbol{y}} - n (\bar{\tilde{\boldsymbol{y}}})^{2} = \sum\limits_{i=1}^{n}{w_{i}y_{i}^{2}} - n \cdot \text{mean}(\boldsymbol{V}^{-1/2}\boldsymbol{y})^{2} = \sum\limits_{i=1}^{n}{w_{i}y_{i}^{2}} - n \left( \frac{1}{n} \sum\limits_{j=1}^{n}\sqrt{w_{j}}y_{j} \right)^{2} = \sum\limits_{i=1}^{n}{w_{i}y_{i}^{2}} - \frac{1}{n} \left( \sum\limits_{j=1}^{n}\sqrt{w_{j}}y_{j} \right)^{2}.$$
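
To make the difference concrete, the two denominators can be computed side by side (a small sketch reusing w, y and n from above; SST_R and SST_1 are names introduced here):

SST_R = sum(w * y^2) - sum(w * y)^2 / sum(w)        # denominator used by R
SST_1 = sum(w * y^2) - sum(sqrt(w) * y)^2 / n       # denominator from option 1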

So the differences are that the sum is normalised by the sum of the weights rather than by $n$, and that the weights enter the second term directly rather than as square roots (i.e., squared relative to option 1). The theoretical justification for this is, however, not clear to me.
