Can the coefficient of determination $R^2$ be more than one? What is its upper bound?

linear model, multiple regression, r-squared, regression

It is well known that if you add additional independent variables to a linear regression, the $R^2$ of the new model is at least as large as that of the previous model, so the old $R^2$ is a lower bound for the new one. I was thinking about the other direction: what can be said about an upper bound for the new $R^2$?

Let's say you run three linear regressions, for $i=1,2,\cdots, N$:

\begin{align}
y_i &= a_1 + b_1x_{i} +e_{i,1} \tag{regression 1} \\
y_i &= a_2 + b_2z_{i} +e_{i,2} \tag{regression 2} \\
y_i &= a + bx_i + cz_i + e_i \tag{regression 3} \\
\end{align}

Assume the coefficients of determination are $R^2_1$, $R^2_2$, and $R^2$, for the 1st, 2nd, and 3rd linear regression, respectively. I want to find an upper bound for $R^2$ in terms of $R^2_1$ and $R^2_2$.

I found that $R^2\leq \frac{R^2_1+R^2_2}{2}$, but I think this is wrong because if $R^2_1=0.3$ and $R^2_2=0.7$, then you have that $R^2\leq 0.5$, but on the other hand $R^2\geq \max(R^2_1,R^2_2)=0.7$.
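As a quick numerical sanity check, here is a small simulation sketch (numpy only; the data-generating model and all numbers are made up for illustration). It fits the three regressions above and confirms that $R^2 \geq \max(R_1^2, R_2^2)$, so a bound at the average of $R_1^2$ and $R_2^2$ must fail whenever the two differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * z + rng.normal(size=n)   # arbitrary illustrative model

def r_squared(y, X):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

R2_1 = r_squared(y, x[:, None])                 # regression 1: y on x
R2_2 = r_squared(y, z[:, None])                 # regression 2: y on z
R2   = r_squared(y, np.column_stack([x, z]))    # regression 3: y on x and z

print(R2_1, R2_2, R2)
print(R2 >= max(R2_1, R2_2))       # True
print(R2 <= (R2_1 + R2_2) / 2)     # typically False
```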

Attempted proof: By definition $R^2 = SSreg/SStotal$. Since $SStotal$ is the same for all three regressions, I will work only with $SSreg$. Summing both sides of the equations of regressions 1 and 2 and comparing with the coefficients of regression 3, we find that $a = \frac{a_1+a_2}{2}$, $b = \frac{b_1}{2}$, $c = \frac{b_2}{2}$. Using the Cauchy-Schwarz inequality we obtain:

\begin{align}
SSreg &= \sum_{i=1}^{N} (\hat{a}+\hat{b}x_i +\hat{c}z_i - \bar{y})^2 \\
&= \sum_{i=1}^{N} \left(\frac{1}{2}(\hat{a}_1+\hat{b}_1x_i - \bar{y})+ \frac{1}{2}(\hat{a}_2+\hat{b}_2z_i-\bar{y})\right)^2 \\
&\leq 2 \left(\sum_{i=1}^{N} \frac{1}{4}(\hat{a}_1+\hat{b}_1x_i - \bar{y})^2+\sum_{i=1}^{N} \frac{1}{4}(\hat{a}_2+\hat{b}_2z_i - \bar{y})^2\right) \\
&= \frac{1}{2} \left(SSreg_1 +SSreg_2\right).
\end{align}
Dividing both sides by $SStotal$ we obtain $R^2 \leq \frac{1}{2} (R^2_1+R^2_2)$.

Any comments on what is wrong with this "proof"? My understanding is that the problem stems from the fact that, when adding up the error terms, there might be some correlation between $e_{i,1}$ and $e_{i,2}$. Any feedback or book/article reference is appreciated.
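One way to probe the "proof" numerically is to check whether the coefficient identities it relies on actually hold for fitted OLS estimates. A rough sketch (numpy; simulated data, all numbers arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = 2.0 + rng.normal(size=n)
z = 0.6 * x + rng.normal(size=n)              # correlated regressors
y = 1.0 + 2.0 * x - 1.0 * z + rng.normal(size=n)

def ols(y, X):
    """OLS coefficients (intercept first) of y on the columns of X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

a1, b1 = ols(y, x[:, None])                   # regression 1
a2, b2 = ols(y, z[:, None])                   # regression 2
a, b, c = ols(y, np.column_stack([x, z]))     # regression 3

# The attempted proof assumes a = (a1 + a2)/2, b = b1/2, c = b2/2:
print(a, (a1 + a2) / 2)   # not equal
print(b, b1 / 2)          # not equal
print(c, b2 / 2)          # not equal
```

In this sketch the fitted coefficients of regression 3 are nowhere near the averages assumed above, which points to that identification, rather than the inequality step, as the weak link.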

Best Answer

The best upper bound is $1$, no matter what the values of $R_1^2$ and $R_2^2$ may be.

The following discussion explains why, in three increasingly detailed ways. The first explanation gives geometric intuition, leading to a simple example. The second one translates that into a procedure to generate specific datasets that give rise to this example. The third one generalizes this procedure to show how any mathematically possible value of $R^2$ can be achieved, given arbitrary values of $R_1^2$ and $R_2^2$.

I adopt a notation in which the independent variables are named $x_1$ and $x_2$ (rather than $x$ and $z$), so that the distinction between the independent and dependent variables remains clear.

(A comment by the alert @f coppens compels me to add that these results change when one or more of the regressions does not include a constant term, because then the relationship between $R^2$ and the correlation coefficients changes. The methods used to obtain these results continue to work. Interested readers may enjoy deriving a more general answer for that situation.)


For the simple regressions (1) and (2), the $R_i^2$ are the squares of the correlation coefficients between $x_i$ and $y$. Relationships among correlation coefficients are just angular relationships among unit vectors in disguise, because the correlation coefficient of two variables $x$ and $y$ (considered as column $n$-vectors) is the dot product of their centered, normalized (unit-length) versions, which in turn is the cosine of the angle between them.
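Here is a one-line check of that identity (a numpy sketch with arbitrary vectors): after centering, the dot product of the unit-length versions reproduces the Pearson correlation.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = rng.normal(size=50)

def unit(v):
    v = v - v.mean()                 # center: orthogonal to the vector of ones
    return v / np.linalg.norm(v)     # rescale to unit length

print(unit(x) @ unit(y))             # cosine of the angle between them
print(np.corrcoef(x, y)[0, 1])       # Pearson correlation: the same number
```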

In these geometric terms, the question asks

How close can a vector $y$ come to the plane generated by $x_1$ and $x_2$, given the angles between $y$ and the $x_i$?

Evidently $y$ can actually lie in that plane: put $y$ at the given angle $\theta_1$ with $x_1$, and then choose $x_2$ within the plane spanned by $x_1$ and $y$ so that it makes the given angle $\theta_2$ with $y$. When that happens, the $R^2$ for regression (3) equals $1$, demonstrating that no upper bound smaller than $1$ can hold in general.


Geometric thinking is no longer considered rigorous, but it leads us to a rigorous example. Start with two orthogonal unit vectors $u$ and $v$, each of which is orthogonal to a vector of ones (so that we can accommodate a constant term in all three regressions). Given $R_1^2$ and $R_2^2$, choose square roots $\rho_1$ and $\rho_2$ with $\rho_i^2 = R_i^2$. To place vectors $x_1$, $y$, and $x_2$ at the required angles, set

$$\begin{aligned} x_1 &= u,\\ y &= \rho_1 u + \sqrt{1-\rho_1^2}\,v,\\ x_2 &= \left(\rho_1\rho_2-\sqrt{1-\rho_1^2}\sqrt{1-\rho_2^2}\right)u + \left(\rho_1\sqrt{1-\rho_2^2}+\rho_2\sqrt{1-\rho_1^2}\right)v.\end{aligned}$$

Since $u\cdot u = v\cdot v = 1$ and $u\cdot v = 0$, you can verify that $x_2\cdot x_2 = 1$ as required,

$$y\cdot x_1 = \rho_1,$$

and

$$\begin{aligned} y\cdot x_2 &= \rho_1\left(\rho_1\rho_2-\sqrt{1-\rho_1^2}\sqrt{1-\rho_2^2}\right) + \sqrt{1-\rho_1^2}\left(\rho_1\sqrt{1-\rho_2^2}+\rho_2\sqrt{1-\rho_1^2}\right) \\ &= \rho_2,\end{aligned}$$

as intended.
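For anyone who would like to check this algebra mechanically, here is a small symbolic sketch (using sympy and assuming $0 < \rho_i < 1$; the variable names are mine). It verifies the three facts just stated: $x_2\cdot x_2 = 1$, $y\cdot x_1 = \rho_1$, and $y\cdot x_2 = \rho_2$.

```python
import sympy as sp

rho1, rho2 = sp.symbols('rho1 rho2', positive=True)
s1 = sp.sqrt(1 - rho1**2)
s2 = sp.sqrt(1 - rho2**2)

# Coordinates of x1, y, x2 in the orthonormal basis (u, v):
x1 = (1, 0)
y  = (rho1, s1)
x2 = (rho1*rho2 - s1*s2, rho1*s2 + rho2*s1)

dot = lambda p, q: p[0]*q[0] + p[1]*q[1]

print(sp.simplify(dot(x2, x2)))   # 1
print(sp.simplify(dot(y, x1)))    # rho1
print(sp.simplify(dot(y, x2)))    # rho2
```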

For a completely concrete example with $n\ge 3$ observations, start with any two $n$-vectors $u_0$ and $v_0$ that are linearly independent of each other and of the $n$-vector $\mathbf{1}=(1,1,\ldots, 1)$. Apply the Gram-Schmidt process to the sequence $\mathbf{1}, u_0, v_0$ to produce an orthonormal basis $\mathbf{1}/\sqrt{n}, u, v$. Use the $u$ and $v$ that result. For instance, for $n=3$ you might start with $u_0 = (1,0,0)$ and $v_0=(0,1,0)$. Gram-Schmidt orthogonalization of these yields $u = (2,-1,-1)/\sqrt{6}$ and $v=(0,1,-1)/\sqrt{2}$. Apply the preceding formulas to these for any given $R_1^2$ and $R_2^2$ you desire. This will result in a dataset consisting of the $3$-vectors $x_1$, $x_2$, and $y$ with the specified values of $R_1^2$, $R_2^2$, and $R^2 = 1$.
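Here is a numerical sketch of this recipe (numpy; variable names are mine). I start from $n = 6$ random vectors rather than the $n = 3$ example above, so that the perfect fit in regression (3) is a consequence of the construction and not of having as many parameters as observations.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
ones = np.ones(n)

# Gram-Schmidt on (1, u0, v0): u and v come out orthonormal and orthogonal to 1.
u0, v0 = rng.normal(size=n), rng.normal(size=n)
u = u0 - (u0 @ ones) / n * ones
u /= np.linalg.norm(u)
v = v0 - (v0 @ ones) / n * ones - (v0 @ u) * u
v /= np.linalg.norm(v)

# Target R_1^2 = 0.3 and R_2^2 = 0.7, with positive square roots.
r1, r2 = np.sqrt(0.3), np.sqrt(0.7)
s1, s2 = np.sqrt(1 - r1**2), np.sqrt(1 - r2**2)

x1 = u
y  = r1 * u + s1 * v
x2 = (r1 * r2 - s1 * s2) * u + (r1 * s2 + r2 * s1) * v

def r_squared(y, X):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

print(r_squared(y, x1[:, None]))                  # ~0.3
print(r_squared(y, x2[:, None]))                  # ~0.7
print(r_squared(y, np.column_stack([x1, x2])))    # ~1.0
```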


A similar approach, starting with mutually orthonormal vectors $u_0, v_0, w_0$ (each again orthogonal to $\mathbf{1}$), can be used to construct examples in which $R^2$ achieves any specified value in the interval $[\max(R_1^2, R_2^2), 1]$. Order the $x_i$ so that $R_1^2 \ge R_2^2$. Writing $y = \alpha u_0 + \beta v_0 + \gamma w_0$, $x_1 = u_0$, and $x_2 = \rho_{12}u_0 + \sqrt{1-\rho_{12}^2}\,v_0$, compute that $\rho_1 = \alpha$ and $\rho_2 = \alpha \rho_{12} + \beta \sqrt{1-\rho_{12}^2}$. From this, and the fact that $\alpha^2+\beta^2+\gamma^2=1$, solve to find

$$\beta = \frac{\rho_2 - \rho_1\rho_{12}}{\sqrt{1-\rho_{12}^2}}$$

and $\gamma = \sqrt{1-\alpha^2 - \beta^2}$. For this square root to exist, $\beta$ must be sufficiently small in size, which can be guaranteed by choosing $\rho_{12}$ (the correlation between the two independent variables $x_1$ and $x_2$) close enough to $\rho_2/\rho_1$; this is possible because the absolute value of that ratio does not exceed $1$, and as $\rho_{12}$ approaches $\rho_2/\rho_1$, $\beta$ approaches zero continuously.
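To see this happen with concrete numbers, here is a sketch of the construction (numpy; $n = 4$ observations and a hand-picked orthonormal triple $u_0, v_0, w_0$, each orthogonal to the ones vector; the particular values are mine). Sweeping $\rho_{12}$ from near $\rho_2/\rho_1$ toward $0$ leaves $R_1^2$ and $R_2^2$ fixed while the $R^2$ of regression (3) climbs from $\max(R_1^2,R_2^2)$ toward $1$.

```python
import numpy as np

# Orthonormal u0, v0, w0, each orthogonal to (1, 1, 1, 1).
u0 = np.array([ 1, -1,  1, -1]) / 2.0
v0 = np.array([ 1,  1, -1, -1]) / 2.0
w0 = np.array([ 1, -1, -1,  1]) / 2.0

r1, r2 = np.sqrt(0.7), np.sqrt(0.3)      # ordered so that R_1^2 >= R_2^2

def r_squared(y, X):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

for rho12 in [0.65, 0.5, 0.3, 0.1, 0.01]:          # from near r2/r1 toward 0
    alpha = r1
    beta  = (r2 - r1 * rho12) / np.sqrt(1 - rho12**2)
    gamma = np.sqrt(1 - alpha**2 - beta**2)
    y  = alpha * u0 + beta * v0 + gamma * w0
    x1 = u0
    x2 = rho12 * u0 + np.sqrt(1 - rho12**2) * v0
    print(rho12,
          r_squared(y, x1[:, None]),               # stays ~0.7
          r_squared(y, x2[:, None]),               # stays ~0.3
          r_squared(y, np.column_stack([x1, x2]))) # rises from ~0.7 toward 1
```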

The cognoscenti will recognize the relationship between the formula for $\beta$ and a certain partial correlation coefficient.