Your approach is correct.
By differentiating with respect to $\beta_2$, we can see that at the optimal value, we must have
$$\hat{\beta}_2 = y_n - \hat{\beta}_1 x_n - \hat{\beta}_0$$
That is, the last term of the objective function must vanish at the optimum.
Hence the problem to solve for $\hat{\beta_0}$ and $\hat{\beta_1}$ is the same as minimizing
$$\sum_{i=1}^{n-1} (y_i-\beta_0-\beta_1 x_i)^2$$
Hence, we know that $\hat{\beta}_1=\hat{\alpha}_1$ and, furthermore, $\hat{\beta}_0=\hat{\alpha}_0$.
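If the $\beta_2$ term enters the model only through the $n$-th observation (i.e., as a dummy variable for $i=n$, which is how I read the objective), a quick R check on simulated data illustrates the equivalence:

set.seed(1); n = 20
x = rnorm(n); y = 1 + 2*x + rnorm(n)
d = as.numeric(seq_len(n) == n)   # dummy: 1 only for the n-th observation
coef(lm(y ~ x + d))               # beta0-hat, beta1-hat, beta2-hat
coef(lm(y[-n] ~ x[-n]))           # alpha0-hat, alpha1-hat: same intercept and slope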
Let $\lambda_i = \lambda,$ for $i = 1,2,3.$
In this special case, $E(\max_i X_i)$ for $X_i \stackrel{iid}{\sim} \mathsf{Exp}(\text{rate} = \lambda)$ can be found as follows:
Consider the $X_i$ to be times to failure of three devices. The time of the first failure is $\min_i(X_i) = X_{(1)} \sim \mathsf{Exp}(3\lambda),$ with $E(X_{(1)}) = 1/(3\lambda).$
Then, by the no-memory property, the additional time until the second failure is
$D_2 = X_{(2)}-X_{(1)} \sim \mathsf{Exp}(2\lambda),$ with $E(D_2) = 1/(2\lambda).$ This is the expected minimum time to failure of the remaining two devices.
Similarly, the additional time to failure $D_3$ of the (single remaining) third device has $E(D_3) = 1/\lambda.$
Thus the total expected time until the third and last failure is $E(\max_i X_i) = E(X_{(3)}) = 1/(3\lambda) + 1/(2\lambda) + 1/\lambda = 11/(6\lambda).$
This method cannot be used for the general case in which the rates are unequal because we don't know which devices will fail first and second.
However, with the condition that $X_1 < X_2 < X_3,$ we do know the order of failure, so the conditional expected time to failure can be found as in the solution attached to the question.
Simulation in R for max and min with $\lambda=2:$
set.seed(728); m=10^6; lam = 2
x1 = rexp(m,lam); x2 = rexp(m,lam); x3 = rexp(m,lam)
v = pmin(x1, x2, x3)
mean(v)
[1] 0.1664693 # approx E(min) = 1/6
w = pmax(x1, x2, x3)
mean(w)
[1] 0.9167773 # approx E(max) = 11/12
1/(3*lam) + 1/(2*lam) + 1/lam
[1] 0.9166667 # 11/12
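As a further check of the spacings argument, reusing the simulated x1, x2, x3 above, the three successive gaps between order statistics should average about $1/(3\lambda),$ $1/(2\lambda),$ and $1/\lambda$ (here 1/6, 1/4, and 1/2):

mid = x1 + x2 + x3 - v - w               # middle order statistic X_(2)
c(mean(v), mean(mid - v), mean(w - mid))
# theory: 1/(3*lam) = 1/6, 1/(2*lam) = 1/4, 1/lam = 1/2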
To make things easier, I will use $X,Y,Z$ in place of $X_1,X_2,X_3$, and assume the intercept is 0 (along with some other assumptions). The ideas should extend to more general cases.
We are given regressions:
(1) $X=aY+U$, with residual $U$.
(2) $U = X-aY = bZ+V$, with residual $V$.
(3) $X = cY+dZ+W$, with residual $W$.
We'd like to show $|b|\le|d|$.
Rewrite (2) as:
(4) $X = aY+bZ+V$
Compare (4) and (3): for a reasonable (least-squares) regression, $(c,d)$ minimizes the residual variance over all coefficient pairs, and $(a,b)$ is one such pair, so we have
(5) $Var(W)\le Var(V)$.
Similarly, compare (1) and (3): since $a$ minimizes $Var(X-\alpha Y)$ over $\alpha$, we have $Var(U) \le Var(X-cY) = Var(dZ+W)$; replacing $U$ with $bZ+V$ from (2) gives
(6) $Var(bZ+V)\le Var(dZ+W)$.
Since for a reasonable regression we have $Cov(Z,V)=Cov(Z,W)=0$ (otherwise there would be correlation left unaccounted for by the coefficients), we can expand the variances in (6) to deduce:
(7) $b^2Var(Z)+Var(V) \le d^2 Var(Z)+Var(W)$
Combining with (5), $b^2 Var(Z) \le d^2 Var(Z) + \big(Var(W)-Var(V)\big) \le d^2 Var(Z),$ and hence $|b| \le |d|,$ as desired.
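A quick numerical illustration in R (simulated data with arbitrarily chosen coefficients; all regressions fit without intercepts, matching the assumption above):

set.seed(42); n = 1000
y = rnorm(n); z = 0.5*y + rnorm(n)
x = 1.0*y + 0.7*z + rnorm(n)
u = resid(lm(x ~ y - 1))              # (1): regress X on Y, keep residual U
b = coef(lm(u ~ z - 1))[["z"]]        # (2): regress U on Z
d = coef(lm(x ~ y + z - 1))[["z"]]    # (3): regress X on Y and Z jointly
c(b = b, d = d)                       # |b| <= |d| should hold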