Regression – Equality of Two Definitions of the Coefficient of Determination $R^2$

mathematical-statistics, multiple-regression, r-squared, regression

I want to do multiple linear regression as explained on Wikipedia. I am given the following data:
$$
yx=(~(y_1,x_{11},\ldots,x_{1p}),\ldots, (y_n,x_{n1},\ldots,x_{np})~)
$$

of $n$ samples, where for each sample $(y_i,x_{i1},\ldots,x_{ip})$ the variable $y_i\in\mathbb{R}$ (the "dependent variable") depends on $x_{i1},\ldots,x_{ip} \in\mathbb{R}$. We want to fit a hyperplane in $\mathbb{R}^{p+1}$ to these points, as one does in (multiple/multivariate) linear regression and as explained e.g. here on Wikipedia.

How well such a hyperplane describes the points of the sample is measured e.g. by the $R^2$ factor, or "coefficient of determination". This is, as explained here on Wikipedia, defined as
$$
R^2 = 1-\frac{SS_{res}}{SS_{tot}}
$$

where the values $SS_{res}$ and $SS_{tot}$ depend on the data as explained on the linked Wikipedia page. (Despite what the squared notation suggests, this quantity may take values in $(-\infty,1]$. If the least-squares fit includes an intercept, or if $\bar y=\frac{1}{n}\sum y_i=0$, then $SS_{tot}=SS_{reg}+SS_{res}$, so $R^2$ may equally be defined as $R^2=\frac{SS_{reg}}{SS_{tot}}\in[0,1]$.)
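For concreteness, here is how I compute this first definition numerically (a minimal NumPy sketch on made-up data; all variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                                   # made-up sample size and number of regressors
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

# least-squares fit with an intercept column of ones
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta_hat

ss_res = np.sum((y - y_hat) ** 2)              # SS_res
ss_tot = np.sum((y - y.mean()) ** 2)           # SS_tot
r2 = 1 - ss_res / ss_tot
print(r2)
```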

My problem is as follows: There is another definition of $R^2$ – call it temporarily $R_2^2$ – which is given by
$$
R^2_2 = (r_{1y},\ldots, r_{py}) \begin{pmatrix} r_{11}=1 & \cdots & r_{1p}\\
\vdots & \ddots & \vdots\\
r_{p1} & \cdots & r_{pp}=1
\end{pmatrix}^{-1}\begin{pmatrix}r_{1y}\\\vdots\\r_{py}\end{pmatrix}
$$

where the $(p\times p)$-matrix in the middle is called the correlation matrix with
$$
r_{ij} = \frac{s_{ij}}{s_{i}s_{j}}\quad\quad\text{and}\quad\quad r_{iy} = \frac{s_{iy}}{s_{i}s_{y}}
$$

the respective sample correlation coefficients, built from the sample covariances and standard deviations
$$
s_{ij} = \frac{1}{n-1}\sum_{k=1}^n (x_{ki}-\bar{x_{\bullet i}})(x_{kj}-\bar{x_{\bullet j}})\quad\text{and}\quad s_{i} = \sqrt{s_{ii}} \quad\text{and}\quad s_{iy} = \frac{1}{n-1}\sum_{k=1}^n (x_{ki}-\bar{x_{\bullet i}})(y_{k}-\bar{y}) \quad\text{etc.}
$$
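Numerically, on the same made-up data as in the sketch above, I would compute this second quantity as follows (again just an illustration; `R_xx` and `r_xy` are my names for the correlation matrix and the vector $(r_{1y},\ldots,r_{py})$), and it agrees with the value of $R^2$ from the first sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

# sample correlation matrix of the regressors and correlations with y
R_xx = np.corrcoef(X, rowvar=False)                                   # (r_ij), p x p
r_xy = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])    # (r_1y, ..., r_py)

r2_2 = r_xy @ np.linalg.solve(R_xx, r_xy)
print(r2_2)    # same value as r2 in the previous sketch
```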

Are these numbers $R^2$ and $R^2_2$ the same and, if so, why? Thank you! (I would prefer a linear-algebra argument without involving probability theory, distributions, etc. – everything here is empirical and given.)

Best Answer

A good question. I have to confess that I hadn't seen the second expression for $R^2$ before (on the other hand, the better-known result that in simple linear regression $R^2$ is the squared Pearson correlation between the response and the predictor is clearly a corollary of it), but here goes the (long) proof:

To clarify, I am assuming you have $p$ regressors (excluding the intercept term); hence, in matrix form, the model of interest (by convention, a linear model contains an intercept column $e$ of all ones) is: \begin{align*} Y := \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} e & x_1 & x_2 & \cdots & x_p \end{bmatrix}\beta + \varepsilon := \begin{bmatrix} e & \tilde{X}\end{bmatrix}\begin{bmatrix} \beta_0 \\ \gamma \end{bmatrix} + \varepsilon, \tag{1}\label{1} \end{align*} where $x_j = \begin{bmatrix} x_{1j} & \cdots & x_{nj}\end{bmatrix}^\top$, $j = 1, \ldots, p$, $\beta_0 \in \mathbb{R}$ is the coefficient corresponding to $e$, and $\gamma \in \mathbb{R}^p$ is the coefficient vector corresponding to $\tilde{X} = \begin{bmatrix} x_1 & \cdots & x_p\end{bmatrix} \in \mathbb{R}^{n \times p}$.
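(Purely to fix notation for the numerical checks that follow, here is a small NumPy sketch of model $\eqref{1}$ on synthetic data; `X_tilde`, `beta0_hat` and `gamma_hat` are my names for $\tilde{X}$, $\hat{\beta}_0$ and $\hat{\gamma}$.)

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 2
X_tilde = rng.normal(size=(n, p))                  # the matrix called X~ above
Y = 1.0 + X_tilde @ np.array([0.8, -1.2]) + rng.normal(scale=0.3, size=n)

e = np.ones(n)                                     # intercept column of ones
D = np.column_stack([e, X_tilde])                  # design matrix [e, X_tilde] of model (1)
coef, *_ = np.linalg.lstsq(D, Y, rcond=None)       # OLS estimate of (beta_0, gamma)
beta0_hat, gamma_hat = coef[0], coef[1:]
```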

The form of your proposed $R_2^2$ inspires me to consider the following form of the regression model (known as the Standardized Multiple Regression Model; see Applied Linear Statistical Models by Kutner et al., Section 7.5) -- this is how I managed to bring all the Pearson correlations to the party: \begin{align*} Y^* = X^*\beta^* + \varepsilon^*, \tag{2}\label{2} \end{align*} where \begin{align*} & Y^* = \begin{bmatrix} \frac{y_1 - \bar{y}}{\sqrt{n - 1}s_Y} \\ \vdots \\ \frac{y_n - \bar{y}}{\sqrt{n - 1}s_Y} \end{bmatrix} = (\sqrt{n - 1}s_Y)^{-1}(I - n^{-1}ee^\top)Y \in \mathbb{R}^{n \times 1}, \tag{3.1}\label{3.1} \\ & X^* = \begin{bmatrix} \frac{x_1 - \bar{x}_1e}{\sqrt{n - 1}s_1} & \cdots & \frac{x_p - \bar{x}_pe}{\sqrt{n - 1}s_p} \end{bmatrix} = (I - n^{-1}ee^\top)\tilde{X}(\sqrt{n - 1}\Lambda)^{-1} \in \mathbb{R}^{n \times p}, \tag{3.2}\label{3.2} \\ & \bar{y} = \frac{1}{n}\sum_{i = 1}^ny_i, \; s_Y^2 = \frac{1}{n - 1}\sum_{i = 1}^n(y_i - \bar{y})^2, \tag{3.3}\label{3.3} \\ & \bar{x}_j = n^{-1}\sum_{i = 1}^n x_{ij}, \; s_j^2 = \frac{1}{n - 1}\sum_{i = 1}^n(x_{ij} - \bar{x}_j)^2, \; j = 1, \ldots, p, \tag{3.4}\label{3.4} \\ & \Lambda = \operatorname{diag}(s_1, \ldots, s_p) \in \mathbb{R}^{p \times p}. \tag{3.5}\label{3.5} \end{align*} In words, $Y^*$ and $X^*$ are simply standardized versions of the original inputs $Y$ and $\tilde{X}$.
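(A quick numerical illustration of the standardization $\eqref{3.1}$ -- $\eqref{3.5}$, on the same synthetic data as in the previous sketch; the helper `standardize` is mine, not from the reference.)

```python
import numpy as np

def standardize(M):
    """Center each column, then divide by sqrt(n - 1) times its sample standard deviation."""
    n = M.shape[0]
    return (M - M.mean(axis=0)) / (np.sqrt(n - 1) * M.std(axis=0, ddof=1))

rng = np.random.default_rng(1)
n, p = 40, 2
X_tilde = rng.normal(size=(n, p))
Y = 1.0 + X_tilde @ np.array([0.8, -1.2]) + rng.normal(scale=0.3, size=n)

Y_star = standardize(Y[:, None]).ravel()     # Y* of (3.1); has unit Euclidean norm
X_star = standardize(X_tilde)                # X* of (3.2); every column has unit norm
print(np.allclose(Y_star @ Y_star, 1.0))     # True -- the identity used at the end of the proof
```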

With the above notation, on the one hand, it is easy to verify that \begin{align*} r_{YX} := \begin{bmatrix} r_{Y, x_1} \\ \vdots \\ r_{Y, x_p} \end{bmatrix} = X^{*\top}Y^*, \quad r_{XX} := \begin{bmatrix} r_{x_i, x_j} \end{bmatrix} = X^{*\top}X^*, \tag{4}\label{4} \end{align*} hence \begin{align*} \hat{\beta^*} = (X^{*\top}X^*)^{-1}X^{*\top}Y^* = r_{XX}^{-1}r_{YX}. \end{align*}
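(A numerical check of $\eqref{4}$ and of the OLS estimate in model $\eqref{2}$, reusing the synthetic data and the `standardize` helper from the previous sketch.)

```python
import numpy as np

def standardize(M):
    return (M - M.mean(axis=0)) / (np.sqrt(M.shape[0] - 1) * M.std(axis=0, ddof=1))

rng = np.random.default_rng(1)
n, p = 40, 2
X_tilde = rng.normal(size=(n, p))
Y = 1.0 + X_tilde @ np.array([0.8, -1.2]) + rng.normal(scale=0.3, size=n)
Y_star, X_star = standardize(Y[:, None]).ravel(), standardize(X_tilde)

# (4): cross products of the standardized data are exactly the sample correlations
print(np.allclose(X_star.T @ X_star, np.corrcoef(X_tilde, rowvar=False)))              # r_XX
print(np.allclose(X_star.T @ Y_star,
                  [np.corrcoef(X_tilde[:, j], Y)[0, 1] for j in range(p)]))            # r_YX

beta_star_hat = np.linalg.solve(X_star.T @ X_star, X_star.T @ Y_star)  # OLS estimate in (2)
```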

On the other hand, one can express the OLS estimate of $\beta$ in $\eqref{1}$ in terms of the OLS estimate of $\beta^*$ in $\eqref{2}$ as follows (this is easy to verify by substituting the transformation definitions $\eqref{3.1}$ -- $\eqref{3.5}$ into $\eqref{2}$ and comparing with $\eqref{1}$; see also the aforementioned reference for derivation details): \begin{align*} & \hat{\gamma} = s_Y\Lambda^{-1}\hat{\beta^*} = s_Y\Lambda^{-1}r_{XX}^{-1}r_{YX}, \tag{5.1}\label{5.1} \\ & \hat{\beta}_0 = \bar{y} - \begin{bmatrix}\bar{x}_1 & \cdots & \bar{x}_p \end{bmatrix}\hat{\gamma} = n^{-1}e^\top Y - n^{-1}e^\top \tilde{X}\hat{\gamma}. \tag{5.2}\label{5.2} \end{align*}
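(A numerical check of $\eqref{5.1}$ and $\eqref{5.2}$: back-transforming the standardized OLS estimate recovers the OLS estimate of the original model $\eqref{1}$; same synthetic data as above.)

```python
import numpy as np

def standardize(M):
    return (M - M.mean(axis=0)) / (np.sqrt(M.shape[0] - 1) * M.std(axis=0, ddof=1))

rng = np.random.default_rng(1)
n, p = 40, 2
X_tilde = rng.normal(size=(n, p))
Y = 1.0 + X_tilde @ np.array([0.8, -1.2]) + rng.normal(scale=0.3, size=n)

Y_star, X_star = standardize(Y[:, None]).ravel(), standardize(X_tilde)
beta_star_hat = np.linalg.solve(X_star.T @ X_star, X_star.T @ Y_star)

s_Y = Y.std(ddof=1)
Lam_inv = np.diag(1.0 / X_tilde.std(axis=0, ddof=1))              # Lambda^{-1}

gamma_hat = s_Y * Lam_inv @ beta_star_hat                         # (5.1)
beta0_hat = Y.mean() - X_tilde.mean(axis=0) @ gamma_hat           # (5.2)

# agrees with a direct OLS fit of model (1)
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X_tilde]), Y, rcond=None)
print(np.allclose(coef, np.concatenate([[beta0_hat], gamma_hat])))  # True
```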

It then follows by $\eqref{5.1}, \eqref{5.2}, \eqref{3.1}, \eqref{3.2}$ that \begin{align*} & Y - \hat{Y} = Y - e\hat{\beta}_0 - \tilde{X}\hat{\gamma} \\ =& (I - n^{-1}ee^\top)Y - (I - n^{-1}ee^\top)\tilde{X}\hat{\gamma} \\ =& s_Y\sqrt{n - 1}Y^* - s_Y\sqrt{n - 1}X^*r_{XX}^{-1}r_{YX}, \end{align*} whence by $\eqref{4}$ we conclude \begin{align*} & (Y - \hat{Y})^\top(Y - \hat{Y}) \\ =& (n - 1)s_Y^2 Y^{*\top}Y^* - 2(n - 1)s_Y^2Y^{*\top}X^{*}r_{XX}^{-1}r_{YX} + (n - 1)s_Y^2r_{YX}^\top r_{XX}^{-1}X^{*\top}X^*r_{XX}^{-1}r_{YX} \\ =& (n - 1)s_Y^2(Y^{*\top}Y^* - r_{YX}^\top r_{XX}^{-1}r_{YX}). \tag{6}\label{6} \end{align*} It then follows by $\eqref{6}$ and the definition of $R^2$ that \begin{align*} & R^2 = 1 - \frac{(Y - \hat{Y})^\top(Y - \hat{Y})}{Y^\top (I - n^{-1}ee^\top)Y} \\ =& 1 - \frac{(n - 1)s_Y^2(Y^{*\top}Y^* - r_{YX}^\top r_{XX}^{-1}r_{YX})}{(n - 1)s_Y^2 Y^{*\top}Y^*} \\ =& r_{YX}^\top r_{XX}^{-1}r_{YX}. \end{align*} In the last equality, we used the identity $Y^{*\top}Y^* = 1$ (which follows from $\eqref{3.1}$ and $\eqref{3.3}$). This completes the proof.
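(An end-to-end numerical confirmation of the conclusion $R^2 = r_{YX}^\top r_{XX}^{-1} r_{YX}$ on fresh synthetic data; of course this only illustrates, not replaces, the proof above.)

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 4
X_tilde = rng.normal(size=(n, p))
Y = 0.5 + X_tilde @ rng.normal(size=p) + rng.normal(size=n)

# R^2 from the residual / total sums of squares
D = np.column_stack([np.ones(n), X_tilde])
Y_hat = D @ np.linalg.lstsq(D, Y, rcond=None)[0]
r2_sums = 1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)

# R^2 from the correlations: r_YX^T r_XX^{-1} r_YX
R = np.corrcoef(np.column_stack([X_tilde, Y]), rowvar=False)
r_XX, r_YX = R[:p, :p], R[:p, p]
r2_corr = r_YX @ np.linalg.solve(r_XX, r_YX)

print(np.allclose(r2_sums, r2_corr))   # True
```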


Addendum

Per OP's request, below is a more detailed derivation of $\eqref{5.1}$ and $\eqref{5.2}$.

By model $\eqref{1}$, we have \begin{align*} \begin{bmatrix} \hat{\beta}_0 \\ \hat{\gamma} \end{bmatrix} = \begin{bmatrix} n & e^\top\tilde{X} \\ \tilde{X}^\top e & \tilde{X}^\top\tilde{X} \end{bmatrix}^{-1} \begin{bmatrix} e^\top Y \\ \tilde{X}^\top Y \end{bmatrix}. \tag{A.1}\label{A.1} \end{align*} Denote the matrix $\tilde{X}^\top(I - n^{-1}ee^\top)\tilde{X}$ by $C$, which by $\eqref{3.2}$ is equal to $(n - 1)\Lambda X^{*\top}X^*\Lambda$. It follows by the block matrix inversion formula that \begin{align*} \begin{bmatrix} n & e^\top\tilde{X} \\ \tilde{X}^\top e & \tilde{X}^\top\tilde{X} \end{bmatrix}^{-1} = \begin{bmatrix} n^{-1} + n^{-2}e^\top\tilde{X}C^{-1}\tilde{X}^\top e & -n^{-1}e^\top\tilde{X}C^{-1} \\ -n^{-1}C^{-1}\tilde{X}^\top e & C^{-1} \end{bmatrix}. \tag{A.2}\label{A.2} \end{align*}
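(A numerical sanity check of the block inverse $\eqref{A.2}$ on synthetic data; this is only a reassurance about the formula, not part of the derivation.)

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 3
X_tilde = rng.normal(size=(n, p))
e = np.ones(n)

# the block matrix [[n, e^T X~], [X~^T e, X~^T X~]] and the matrix C defined above
M = np.vstack([np.concatenate([[n], e @ X_tilde]),
               np.column_stack([X_tilde.T @ e, X_tilde.T @ X_tilde])])
C = X_tilde.T @ (np.eye(n) - np.outer(e, e) / n) @ X_tilde
C_inv = np.linalg.inv(C)
a = e @ X_tilde                                    # the row vector e^T X~

inv_claimed = np.block([
    [np.array([[1 / n + a @ C_inv @ a / n**2]]), -(a @ C_inv)[None, :] / n],
    [-(C_inv @ a)[:, None] / n,                  C_inv],
])
print(np.allclose(inv_claimed, np.linalg.inv(M)))  # True
```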

Substituting $\eqref{A.2}$ into $\eqref{A.1}$ and using the idempotency of the matrix $I - n^{-1}ee^\top$ (also plugging definitions $\eqref{3.1}$ and $\eqref{3.2}$) then give \begin{align*} & \hat{\gamma} = -n^{-1}C^{-1}\tilde{X}^\top ee^\top Y + C^{-1}\tilde{X}^\top Y \\ =& C^{-1}\tilde{X}^\top(I - n^{-1}ee^\top)Y \\ =& C^{-1}((I - n^{-1}ee^\top)\tilde{X})^\top (I - n^{-1}ee^\top)Y \\ =& (n - 1)^{-1}\Lambda^{-1}(X^{*\top}X^*)^{-1}\Lambda^{-1}(\sqrt{n - 1}X^*\Lambda)^\top \sqrt{n - 1}s_YY^* \\ =& s_Y\Lambda^{-1}(X^{*\top}X^*)^{-1}X^{*\top}Y^* \\ =& s_Y\Lambda^{-1}\hat{\beta^*}. \\[1em] & \hat{\beta}_0 = n^{-1}e^\top Y + n^{-2}e^\top \tilde{X}C^{-1}\tilde{X}^\top ee^\top Y - n^{-1}e^\top\tilde{X}C^{-1}\tilde{X}^\top Y \\ =& n^{-1}e^\top Y - n^{-1}e^\top\tilde{X}(-n^{-1}C^{-1}\tilde{X}^\top ee^\top Y + C^{-1}\tilde{X}^\top Y) \\ =& n^{-1}e^\top Y - n^{-1}e^\top\tilde{X}\hat{\gamma}, \end{align*} which are exactly $\eqref{5.1}$ and $\eqref{5.2}$.
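(Finally, a numerical check that the closed forms for $\hat{\gamma}$ and $\hat{\beta}_0$ obtained in this addendum agree with a direct least-squares fit of model $\eqref{1}$; synthetic data again.)

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 3
X_tilde = rng.normal(size=(n, p))
Y = X_tilde @ np.array([1.0, 0.0, -2.0]) + rng.normal(size=n)
e = np.ones(n)

# closed forms derived above
P = np.eye(n) - np.outer(e, e) / n                    # centering matrix I - ee^T / n
C = X_tilde.T @ P @ X_tilde
gamma_hat = np.linalg.solve(C, X_tilde.T @ P @ Y)     # C^{-1} X~^T (I - ee^T/n) Y
beta0_hat = (e @ Y - e @ X_tilde @ gamma_hat) / n     # n^{-1} e^T Y - n^{-1} e^T X~ gamma_hat

# direct OLS fit of model (1)
coef, *_ = np.linalg.lstsq(np.column_stack([e, X_tilde]), Y, rcond=None)
print(np.allclose(coef, np.concatenate([[beta0_hat], gamma_hat])))  # True
```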