Solved – Hypothesis test for a linear combination of coefficients $c_0\beta_0 +c_1\beta_1$

hypothesis-testing, regression, self-study

I'm given a simple linear regression model $$y_i=\beta_0+\beta_1x_i+\epsilon_i$$

where $\epsilon_i\sim N(0,\sigma^2)$, and I have to construct a hypothesis test of

$$H_0: c_0\beta_0+c_1\beta_1=h$$

where $c_0,c_1, h$ are given constants.

My attempt:

We know that the estimator $\hat{\beta_0}\sim N\left(\beta_0,\sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)\right)$ and that the estimator $\hat{\beta_1}\sim N\left(\beta_1,\frac{\sigma^2}{S_{xx}}\right)$, so the combination $c_0\hat{\beta_0}+c_1\hat{\beta_1}$ follows a normal distribution (since it is a linear combination of normal random variables) with mean

$$\mu=c_0\beta_0+c_1\beta_1$$

and variance

$$Var[c_0\hat{\beta_0}+c_1\hat{\beta_1}]$$

I don't see any reason why $\hat{\beta_0}$ and $\hat{\beta_1}$ must be uncorrelated, so I can't say that the variance of the sum is the sum of the variances. Then
$$Var[c_0\hat{\beta_0}+c_1\hat{\beta_1}]=Var[c_0\hat{\beta_0}]+Var[c_1\hat{\beta_1}]+2Cov(c_0\hat{\beta_0},c_1\hat{\beta_1})$$

Now

$$Cov(c_0\hat{\beta_0},c_1\hat{\beta_1})=c_0c_1Cov(\bar y-\hat{\beta_1}\bar x,\hat{\beta_1})=c_0c_1\left(Cov(\bar y, \hat{\beta_1})-\bar xCov(\hat{\beta_1},\hat{\beta_1})\right)$$

But $Cov(\bar y, \hat{\beta_1})=0$ and $Cov(\hat{\beta_1},\hat{\beta_1})=Var[\hat{\beta_1}]$ so

$$Var[c_0\hat{\beta_0}+c_1\hat{\beta_1}]=c_0^2\sigma^2\left(\frac{1}{n}+\frac{\bar x^2}{S_{xx}}\right)+c_1^2\frac{\sigma^2}{S_{xx}}-2c_0c_1\bar x \frac{\sigma^2}{S_{xx}}$$

If I haven't made any mistakes, that should be the expression for the variance, but I can't see how to simplify it to obtain a statistic with a $N(0,1)$ distribution.
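To check I haven't made an algebra mistake, here is a quick Monte Carlo sketch in Python (all numbers are arbitrary illustrative choices, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (assumptions for the check, not given in the problem)
n, beta0, beta1, sigma = 30, 2.0, -1.5, 0.8
c0, c1 = 1.0, 3.0
x = rng.uniform(0, 10, size=n)          # fixed design, reused in every replication
Sxx = np.sum((x - x.mean()) ** 2)

# Closed-form variance derived above
V = (c0**2 * sigma**2 * (1/n + x.mean()**2 / Sxx)
     + c1**2 * sigma**2 / Sxx
     - 2 * c0 * c1 * x.mean() * sigma**2 / Sxx)

# Monte Carlo: refit the regression many times, record c0*b0_hat + c1*b1_hat
reps = 100_000
stats = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * y) / Sxx
    b0 = y.mean() - b1 * x.mean()
    stats[r] = c0 * b0 + c1 * b1

# The two printed values should agree closely if the formula is right
print(V, stats.var())
```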

My question, then, is whether there is a simpler way to construct the hypothesis test, so any hint that points me in the right direction would be highly appreciated.

Note: I have seen a similar question asked here, but the answer involves matrix notation, and that is something we haven't covered in class yet.

Best Answer

Straightforward method

If you'd covered a bit more material, you'd be there. Start by assuming the null: $$ h=c_0\beta_0+c_1\beta_1 $$ Then, as you've pointed out, $c_0\hat\beta_0+c_1\hat\beta_1-h\sim\mathcal{N}(0,V)$ where $V$ is some unknown variance, so you need to find $V$. The equation you've started with is also correct: $$ V=\text{Var}[c_0\hat\beta_0+c_1\hat\beta_1-h] = c_0^2\text{Var}[\hat\beta_0]+c_1^2\text{Var}[\hat\beta_1] +2c_0c_1\text{Cov}(\hat\beta_0,\hat\beta_1) $$
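For reference, once $V$ is in hand the rest is standard: estimating $\sigma^2$ by the usual unbiased $\hat\sigma^2=\text{SSE}/(n-2)$, and writing $\hat V$ for $V$ with $\hat\sigma^2$ plugged in, the statistic

$$ T=\frac{c_0\hat\beta_0+c_1\hat\beta_1-h}{\sqrt{\hat V}} $$

follows a $t_{n-2}$ distribution under $H_0$; an exact $N(0,1)$ statistic appears only in the artificial case where $\sigma^2$ is known.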

To complete this straightforward method, one thing you could use is the variance-covariance matrix of the coefficient estimators, which I'm guessing you haven't covered yet (if you have, this is immediate: the covariance you need is just an entry of that matrix). Alternatively, you can derive $\text{Cov}(\hat\beta_0,\hat\beta_1)$ by hand. To do so, note that:

$$ \hat\beta_1 = \frac{\sum_{i=1}^n (x_i-\bar x)\,y_i}{\sum_{i=1}^n (x_i-\bar x)^2} \quad \text{and} \quad \hat\beta_0 = \frac{1}{n}\sum_{i=1}^n y_i -\hat\beta_1 \bar x $$ and figure out where you can go from there given the assumptions you've been handed. One caution: you can't simply assume that $\text{Cov}(\bar y, \hat\beta_1)=0$. Both are estimates computed from the same data, so in general their covariance need not be zero; if you can justify why it is zero here, then you're done. One way to approach this is to think about what would happen if you ran the regression on demeaned data, and go from there to a formal argument.
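If you want to explore that hint numerically before formalizing it, a small Python sketch (made-up data, and a hand-rolled `fit` helper that is purely illustrative) shows what demeaning $x$ does to the two estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)                  # made-up data, for illustration only
y = 2.0 - 1.5 * x + rng.normal(0, 0.8, size=50)

def fit(x, y):
    """OLS estimates (b0_hat, b1_hat) for y = b0 + b1*x + eps."""
    b1 = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

print(fit(x, y))                # fit on the original data
print(fit(x - x.mean(), y))     # demeaned x: the slope is unchanged...
print(y.mean())                 # ...and the intercept is exactly y-bar
```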

In the future you'll also learn that it's much easier to test this kind of restriction with a Wald test or an $F$ test. This method, once you've used the variance-covariance matrix, is algebraically equivalent to a Wald test and asymptotically equivalent to an $F$ test.
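For concreteness, here is a minimal sketch of what that looks like in practice, assuming Python with the statsmodels package and made-up data (the constants $c_0$, $c_1$, $h$ are arbitrary); it also shows the variance-covariance matrix mentioned above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)                    # made-up data, for illustration only
y = 2.0 - 1.5 * x + rng.normal(0, 0.8, size=50)
res = sm.OLS(y, sm.add_constant(x)).fit()

print(res.cov_params())                            # Cov(b0_hat, b1_hat) is the off-diagonal entry

c0, c1, h = 1.0, 3.0, -2.0                         # illustrative constants
R, q = np.array([[c0, c1]]), np.array([h])
print(res.t_test((R, q)))                          # t test of H0: c0*b0 + c1*b1 = h
print(res.wald_test((R, q), use_f=True))           # the same restriction as an F test
```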

Tricky method that may be easier

Right now your model is:

$$ y_i = \beta_0 + \beta_1 x_i + \epsilon_i $$

Under the null (assuming $c_1 \neq 0$), remember that $\beta_1- (h-c_0\beta_0)/c_1=0$. Try manipulating the regression equation, redefining the variables by adding and subtracting equal quantities on both sides, so that you can test this exact restriction as a $t$ test on a slightly different regression formed from the same data and equivalent to your original model. While you're doing this, remember that $h$, $c_0$ and $c_1$ are known constants, not random variables. Without giving you the answer, that's a trick that can make your life easier.

The method you will arrive at if you use this trick right is also algebraically equivalent to a Wald test.